The `import` parser has false positives and false negatives #79

pygy · 2020-04-24T21:17:11Z

As mentioned earlier, here are a few bugs I found:

import {
  //**/
  foo
} from "./foo.js"

Gives

Error resolving module specifier: ./foo.js ... at .js:4:7

Live here

If you point me to the parser (I suppose it is RegExp-based) I can upgrade it.

porsager · 2020-04-24T21:25:49Z

Ah, I see..

Yeah, they're here:

flems/src/srcdoc/index.js

Lines 14 to 17 in d3acee7

    
           const isModuleRegex = /(^\s*|[});\n]\s*)(import\s*\(?(['"]|(\*[\s\S]+as[\s\S]+)?(?!type)([^"'()\n;]+)[\s\S]*from[\s\S]*['"]|\{)|export\s\s*(['"]|(\*[\s\S]+as[\s\S]+)?(?!type)([^"'()\n;]+)[\s\S]*from[\s\S]*['"]|\{|default|function|class|var|const|let|async[\s\S]+function|async[\s\S]+\())/ 
        
               , topoSortRegex = new RegExp('import\\s*[{}$\\w*,\\s]*\\s*(?: from |)[\'"]\\.?\\/(.*\\.?[a-z]*)[\'"]', 'g') 
        
               , staticImportRegex = new RegExp('(import\\s*[{}$\\w*,\\s]*\\s*(?: from |)[\'"])([\\w@][\\w@/.-]*)([\'"])', 'g') 
        
               , dynamicImportRegex = new RegExp('(import\\([\'"])([\\w@][\\w@/.-]*)([\'"]\\))', 'g')

pygy · 2020-04-24T22:57:45Z

Ok, I'm on it... Why are you using the RegExp constructor rather than literals?

pygy · 2020-04-25T00:11:15Z

Well, if I follow the JS Spec with a hack for identifiers, I end up with this for just detecting imports:
/import(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:"(?:\\[^]|.)*"|'(?:\\[^]|.)*'|`(?:\\[^]|[^])*`)|(?:(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*,(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:\*(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*as(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+|(?:\{(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*\}|\{(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:(?:[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+as(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+|[^,{}\/\s]+)(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*,(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)*(?:[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+as(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+|[^,{}\/\s]+)(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:,(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)?\})(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)|(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+)|\*(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*as(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+|(?:\{(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*\}|\{(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:(?:[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+as(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+|[^,{}\/\s]+)(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*,(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)*(?:[^,{}\/\s]+(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+as(?:(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)+[^,{}\/\s]+|[^,{}\/\s]+)(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:,(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)?\})(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*)from(?:\s|\/\/[^\n]*\n|\/\*[^]*?\*\/)*(?:"(?:\\[^]|.)*"|'(?:\\[^]|.)*'|`(?:\\[^]|[^])*`))/

... which is probably a bit much... Also, I'd need to triple check and make sure it doesn't backtrack pathologically (it probably does). There's probably a smarter way to do it...

pygy · 2024-02-04T11:18:01Z

So, I'm about to implement something similar for the online version of bunchmark, and while working on an improved version of the above, I found a case where Flems modifies code it shouldn't and adds https://unpkg.com/...?module where it shouldn't.

Here is a reduced example.

My WIP parser had variables with import in the name, and the string 'from' which triggered the bug, I've replaced the problematic names with umport for now.

The new parser should be ReDOS resistant, but I still have to test it.

I end up with these ginormous RegExps:

const searcher = /\bimport(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\1\*\/)*(?:"(?:\\.|[^"])*"|'(?:\\.|[^'])*'|(?:\{(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\2\*\/)*(?:(?:(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\3(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\4\*\/)*as(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\5\*\/)*(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\6|(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\7)(?:(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\8\*\/)*,(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\9\*\/)*(?:(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\10(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\11\*\/)*as(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\12\*\/)*(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\13|(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\14))*)?(?:(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\15\*\/)*,)?(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\16\*\/)*\}|\*(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\17\*\/)*as(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\18\*\/)*(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\19|(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\20(?:(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\21\*\/)*,(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\22\*\/)*(?:\{(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\23\*\/)*(?:(?:(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\24(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\25\*\/)*as(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\26\*\/)*(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\27|(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\28)(?:(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\29\*\/)*,(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\30\*\/)*(?:(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\31(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\32\*\/)*as(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\33\*\/)*(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\34|(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\35))*)?(?:(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\36\*\/)*,)?(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\37\*\/)*\}|\*(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\38\*\/)*as(?!\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\39\*\/)*(?=((?:\p{ID_Start}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))(?:\p{ID_Continue}|\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]{1,6}\}))*))\40))?)(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\41\*\/)*from(?:\s|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\42\*\/)*(?<moduleSpecifier>"(?:\\.|[^"])*"|'(?:\\.|[^'])*'))|\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\44\*\/|"(?:\\.|[^"])*"|'(?:\\.|[^'])*'|`(?=((?:\\[^]|(?!`|\$\{)[^])*))\45`|`(?=((?:\\[^]|(?!`|\$\{)[^])*))\46\$\{(?<openTemplate>)/dgu

... and then this const searcher2 = /\/\/.*[\n\r]|\/\*(?=((?:(?!\*\/)[^])*))\1\*\/|"(?:\\.|[^"])*"|'(?:\\.|[^'])*'|`(?=((?:\\[^]|(?!`|\$\{)[^])*))\2`|`(?=((?:\\[^]|(?!`|\$\{)[^])*))\3\$\{(?<openTemplate>)|\}(?=((?:\\[^]|(?!`|\$\{)[^])*))\5\$\{|\}(?=((?:\\[^]|(?!`|\$\{)[^])*))\6`(?<closeTemplate>)/dgu for skipping template literals with embedded expressions.

Edit: adding support for dynamic imports as well.

pygy · 2024-02-04T13:20:00Z

The package is live as [email protected].

https://github.com/pygy/unpkgify (it has tests and seems quite robust).

pygy · 2024-02-04T17:26:15Z

My approach still has regexp-related corner cases (you need a full JS parser to tell them apart from successive divisions).

If a RegExp contains a pattern that looks like a bit of template string, or even simply a brace, it can throw my parser off. I think I'm going to use Babel for this.

porsager added the bug label Mar 29, 2022

pygy changed the title ~~Comments thwart the import parser~~ The import parser has false positives and false negatives Feb 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The `import` parser has false positives and false negatives #79

The `import` parser has false positives and false negatives #79

pygy commented Apr 24, 2020

porsager commented Apr 24, 2020

pygy commented Apr 24, 2020

pygy commented Apr 25, 2020 •

edited

Loading

pygy commented Feb 4, 2024 •

edited

Loading

pygy commented Feb 4, 2024 •

edited

Loading

pygy commented Feb 4, 2024

The import parser has false positives and false negatives #79

The import parser has false positives and false negatives #79

Comments

pygy commented Apr 24, 2020

porsager commented Apr 24, 2020

pygy commented Apr 24, 2020

pygy commented Apr 25, 2020 • edited Loading

pygy commented Feb 4, 2024 • edited Loading

pygy commented Feb 4, 2024 • edited Loading

pygy commented Feb 4, 2024

The `import` parser has false positives and false negatives #79

The `import` parser has false positives and false negatives #79

pygy commented Apr 25, 2020 •

edited

Loading

pygy commented Feb 4, 2024 •

edited

Loading

pygy commented Feb 4, 2024 •

edited

Loading