Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat apilinks.json generator #153

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

feat apilinks.json generator #153

wants to merge 1 commit into from

Conversation

flakey5
Copy link
Member

@flakey5 flakey5 commented Nov 29, 2024

Closes #152

Opening this as a draft currently to get feedback on the approach since it's a bit non-trivial relative to the other generators. The apilinks.json file maps things exported by modules to their source locations on Github.

Example:

{
  "SomeClass.prototype.var": "github.com/nodejs/node/tree/<hash>/lib/file.js#L100"
}

This means we need to parse the module's javascript source in addition to its markdown.

What the current approach does is doing:

  • Adds another loading & parsing step for the js source files
    • acorn is used for parsing the source files
    • This is dependent on the Markdown source parsing since it uses the source_link metadata in the docs
  • Exposes the parsed js ast to other generators by adding the ast-js generator
  • api-links generator is based off of the ast-js result

With the current approach, the generator is almost there. Some todos remain though:

  • I think some exported functions aren't making it
  • Git methods are kinda broken
  • Some cleanup & docs
  • Add tests (when we agree on an approach)

Closes #152

Signed-off-by: flakey5 <[email protected]>
@flakey5 flakey5 requested a review from a team as a code owner November 29, 2024 07:59
@flakey5 flakey5 marked this pull request as draft November 29, 2024 07:59
*/
export function getGitRepository(directory) {
try {
const trackingRemote = execSync(`cd ${directory} && git remote`);

Check warning

Code scanning / CodeQL

Unsafe shell command constructed from library input Medium

This string concatenation which depends on
library input
is later used in a
shell command
.

Copilot Autofix AI 24 days ago

To fix the problem, we should avoid using execSync with unsanitized input. Instead, we can use execFileSync from the child_process module, which allows us to pass arguments as an array, avoiding shell interpretation. This change ensures that the input is treated as a literal string and not as a part of the shell command.

  1. Replace execSync with execFileSync in the getGitRepository and getGitTag functions.
  2. Modify the commands to use execFileSync with arguments passed as an array.
Suggested changeset 1
src/utils/git.mjs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/utils/git.mjs b/src/utils/git.mjs
--- a/src/utils/git.mjs
+++ b/src/utils/git.mjs
@@ -2,3 +2,3 @@
 
-import { execSync } from 'child_process';
+import { execFileSync } from 'child_process';
 
@@ -14,6 +14,4 @@
   try {
-    const trackingRemote = execSync(`cd ${directory} && git remote`);
-    const remoteUrl = execSync(
-      `cd ${directory} && git remote get-url ${trackingRemote}`
-    );
+    const trackingRemote = execFileSync('git', ['-C', directory, 'remote']).toString().trim();
+    const remoteUrl = execFileSync('git', ['-C', directory, 'remote', 'get-url', trackingRemote]).toString().trim();
 
@@ -40,7 +38,5 @@
     const hash =
-      execSync(`cd ${directory} && git log -1 --pretty=%H`) || 'main';
+      execFileSync('git', ['-C', directory, 'log', '-1', '--pretty=%H']).toString().trim() || 'main';
     const tag =
-      execSync(`cd ${directory} && git describe --contains ${hash}`).split(
-        '\n'
-      )[0] || hash;
+      execFileSync('git', ['-C', directory, 'describe', '--contains', hash]).toString().split('\n')[0] || hash;
 
EOF
@@ -2,3 +2,3 @@

import { execSync } from 'child_process';
import { execFileSync } from 'child_process';

@@ -14,6 +14,4 @@
try {
const trackingRemote = execSync(`cd ${directory} && git remote`);
const remoteUrl = execSync(
`cd ${directory} && git remote get-url ${trackingRemote}`
);
const trackingRemote = execFileSync('git', ['-C', directory, 'remote']).toString().trim();
const remoteUrl = execFileSync('git', ['-C', directory, 'remote', 'get-url', trackingRemote]).toString().trim();

@@ -40,7 +38,5 @@
const hash =
execSync(`cd ${directory} && git log -1 --pretty=%H`) || 'main';
execFileSync('git', ['-C', directory, 'log', '-1', '--pretty=%H']).toString().trim() || 'main';
const tag =
execSync(`cd ${directory} && git describe --contains ${hash}`).split(
'\n'
)[0] || hash;
execFileSync('git', ['-C', directory, 'describe', '--contains', hash]).toString().split('\n')[0] || hash;

Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
try {
const trackingRemote = execSync(`cd ${directory} && git remote`);
const remoteUrl = execSync(
`cd ${directory} && git remote get-url ${trackingRemote}`

Check warning

Code scanning / CodeQL

Unsafe shell command constructed from library input Medium

This string concatenation which depends on
library input
is later used in a
shell command
.
export function getGitTag(directory) {
try {
const hash =
execSync(`cd ${directory} && git log -1 --pretty=%H`) || 'main';

Check warning

Code scanning / CodeQL

Unsafe shell command constructed from library input Medium

This string concatenation which depends on
library input
is later used in a
shell command
.
const hash =
execSync(`cd ${directory} && git log -1 --pretty=%H`) || 'main';
const tag =
execSync(`cd ${directory} && git describe --contains ${hash}`).split(

Check warning

Code scanning / CodeQL

Unsafe shell command constructed from library input Medium

This string concatenation which depends on
library input
is later used in a
shell command
.
@@ -0,0 +1,52 @@
'use strict';

import { execSync } from 'child_process';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import { execSync } from 'child_process';
import { execSync } from 'node:child_process';

@@ -1,10 +1,12 @@
// @ts-check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// @ts-check

use actualy didn't use type-checking

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(?)

@ovflowd
Copy link
Member

ovflowd commented Dec 19, 2024

@flakey5 do you need review here? I think I forgot to review this PR!

@flakey5
Copy link
Member Author

flakey5 commented Dec 20, 2024

do you need review here

As far as the approach yes please

.filter(path => path !== undefined && path.endsWith('.js'))
);

const parsedJsFiles = await parseJsSources(sourceFiles);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this process needs to be blocking? Do we need to parse all these sources beforehand and not on-demand?

const sourceFiles = loadJsFiles(
parsedApiDocs
.map(apiDoc => apiDoc.source_link_local)
.filter(path => path !== undefined && path.endsWith('.js'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.filter(path => path !== undefined && path.endsWith('.js'))
.filter(path => path?.endsWith('.js'))

const { runGenerators } = createGenerator(parsedApiDocs);
const sourceFiles = loadJsFiles(
parsedApiDocs
.map(apiDoc => apiDoc.source_link_local)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map(apiDoc => apiDoc.source_link_local)
.map(({ source_link_local }) => source_link_local)

*/
const createGenerator = input => {
const createGenerator = (input, parsedJsFiles) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const createGenerator = (input, parsedJsFiles) => {
const createGenerator = (markdownInput, jsInput) => {

const cachedGenerators = { ast: Promise.resolve(input) };
const cachedGenerators = {
ast: Promise.resolve(input),
'ast-js': Promise.resolve(parsedJsFiles),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice


findDefinitions(program, programBasename, nameToLineNumberMap, exports);

if (!baseGithubLink) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is resolved only once. Since it seems that it grabs on the first value, could you do:

const [first, ...programs] = input;

const directory = dirname(first.path);
const repository = getGitRepository(directory);
const tag = getGitTag(directory);
const baseGithubLink = `https://github.com/${repository}/blob/${tag}`;

(Do this outside of the foreach), also this should probably be its own function and unit tested, ie: getGitHubLinkFromProgram

Also, it is GitHub, not Github HAHAHA

@@ -0,0 +1,215 @@
// @ts-check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this comment?

*/
function extractExpression(expression, loc, basename, nameToLineNumberMap) {
/**
* @example `a=b`, lhs=`a` and rhs=`b`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make these inline with just //


exports.identifiers.push(value.name);

if (/^[A-Z]/.test(value.name[0])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain the Regex and store it in a constant


switch (statement.type) {
case 'ExpressionStatement': {
const { expression } = statement;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to deconstruct this?


break;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how common it is to have empty lines between cases. I think it would be valid to have an inline doc

// @see reference to what `VariableDeclaration` is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how expensive these functions are, as it seems that this can quickly get bloated depending on the size of source files.

BTW, could you use https://unifiedjs.com/explore/package/estree-util-visit/ to walk/visit the nodes? This is what I was referring to, instead of plain forEaches and while loops.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might be worth to move "estree" moving logic to a dedicated "estree parser" IDK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have a parser for the Markdown files, having one just for Estree makes sense I guess.

});
};

return { loadFiles, loadJsFiles };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename loadFiles to loadMarkdownFiles

return filePaths.map(async filePath => {
const fileContents = await readFile(filePath, 'utf-8');

return new VFile({ path: filePath, value: fileContents });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned of the memory allocation that this will cause, some of the source files can be humungous. Have you made some comparisons of heap size differences?

@@ -182,7 +184,55 @@ const createParser = () => {
return resolvedApiDocEntries.flat();
};

return { parseApiDocs, parseApiDoc };
/**
* TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why TODO? What's missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about some of these executions here; they can be downright unsafe. Can we use maybe an existing NPM package that allows us to run git apis from JS? Chances we don't even need to run a git command (exec) since technically .git folder has all the data we need.

Copy link
Member

@ovflowd ovflowd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach in general sounds good; Just, some of these utilities are way too complex and big and the git manipulation scripts are possibly dangerous or at least an easy vulnerability hell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

urgent: feat: add apilinks.json generator
3 participants