Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add npm name validation and make lower casing names adaptive #80

Merged
merged 1 commit into from
Dec 5, 2024

Conversation

jdalton
Copy link
Collaborator

@jdalton jdalton commented Nov 15, 2024

Fix for #79.

@matt-phylum
Copy link

This fixes the part where old names are incorrectly lowercased, but in a very complicated way. The PURL spec is wrong here and says to just always lowercase so if this library is going to fix the problem ahead of the spec, like it does for Go, it can't match the spec. However, I don't think this is really right and I don't think it's something that would ever be included in the PURL spec.

NPM package names are case sensitive. @angular/cli and @angular/CLI (does not and will not exist) are not the same package. This new implementation contains code specifically to make them the same PURL without making jquery and jQuery the same. I think it'd be better to just remove this code. I don't think the PURL spec will be updated to say "the name is case sensitive unless there is a namespace in which case it is case insensitive and must be lowercased."

PURL does not normally validate the package name and version, and I think this is a good thing. It's complicated and error prone to incorporate into the PURL spec rules like "the name must not be case-insensitively equal to favicon.ico" or "if there is a namespace then the lengths of the namespace and name together must be no more than 213 characters" and have those rules consistently implemented across PURL libraries. If the ecosystem ever changes the rules, updating the spec and rolling out those changes to all PURL implementations can be a slow process. I guess if the validation was opt-in it wouldn't be so bad, but I'm worried that extensive validation logic in this library beyond what is documented in the PURL spec will create interoperability problems where a valid PURL will be rejected by an application using this library because this library thinks the PURL can never refer to a real package.

@jdalton
Copy link
Collaborator Author

jdalton commented Nov 15, 2024

@matt-phylum I hear you.

The spec does mandate validation of several type's namespaces, names, and versions with the "MUST" and "MUST NOT" language. Some are rather complex and in realms I'm not familiar with which is why I've "TODO"'d them. For example, the cpan purl type spec does toggle behavior when there is or is not a namespace using "MUST" and "MUST NOT" language.

The npm validation logic is rather solid and hasn't changed for nearly 10 years. I think it's fine to incorporate some validation for their packages. The JS implementation is of the npm ecosystem so having extra attention for it seems reasonable.

Toggling on namespace is a way to have our cake and eat it too at the cost of some complexity.
There are 2,802 legacy mixed-case packages with 742 of them overlapping. Instead of toggling on namespace I could just reference the list of 2,802 names.

Updated: Now uses pre-computed list of names to make the exception for else it behaves as spec'd. The code is written in a way that is browser bundle friendly and keeps the data loaded to a minimum.

@jdalton jdalton force-pushed the jdalton/fix-npm branch 3 times, most recently from 56af5fe to 55b9df1 Compare November 15, 2024 21:56
Copy link
Collaborator

@steven-esser steven-esser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, pending other's approval of spec-conformation.

@jdalton
Copy link
Collaborator Author

jdalton commented Dec 2, 2024

@steven-esser at Socket when using a variation of this the new validation uncovered non-compliant names. We have a concept of an unknown type. It may be beneficial to have an option for not throwing and instead returning an unknown type purl.

We have an opportunity to revise with a major version bump the constructor to provide an options object. The ergonomics of the args passed to the constructor also urk me a bit as it currently goes new PackageURL(requiredType, optionalNamespace, requiredName, optionalVersion, optionalQualifiers, optionalSubpath). I think the required params should be first and the rest potential options.

@steven-esser
Copy link
Collaborator

@jdalton I have no problem with supporting that feature, it's almost a necessity to be more flexible on the parsing side when using this in the wild. An option would save the couple lines of exception handling.

Would support the change to options object as well. Might be an annoyance to some to have to fix their tests/builds, but it's incredibly ugly and non-conforming to the majority of JS/TS packages.

@matt-phylum
Copy link

Is an unknown type useful? If pkg:npm/ALL_CAPS parses as pkg:unknown/ALL_CAPS that seems like it would cause more problems than it solves.

The way optional package type validation is handled for the Rust purl crate (not to be confused with the packageurl crate), is that the package type is a generic type parameter and you can use either GenericPurl<PackageType> which applies package type rules like normalization and validation, or GenericPurl<String> which has no knowledge of package types and can represent any PURL of any package type without applying any package type rules (so the PURL is syntactically valid, but may not be correct for the specified package type). This also means you can use GenericPurl<MyPackageType> to apply different rules than the ones the library comes with. Javascript doesn't have generic type parameters but you could probably get something similar to const MyPackageURL = PackageURL.withTypeProvider(myTypeProvider); const myPurl = new MyPackageURL(…); if you wanted to use the same pattern.

@jdalton
Copy link
Collaborator Author

jdalton commented Dec 5, 2024

@matt-phylum
An unknown would be something like pkg:unknown/unknown as name and type are required. Folks then filter out .type === 'unknown if they like. Thanks for the info on how Rust purl does it!

@steven-esser
I think this could merge as is and I can follow-up with a options PR.
This will end up being a major bump v3 🎉

@matt-phylum
Copy link

I don't understand what pkg:unknown/unknown is good for. I was thinking you'd do pkg:unknown/whatever-invalid-name@whatever-invalid-version?whatever-invalid-qualifier=whatever-invalid-value so that the code knows the PURL is invalid but can still interrogate it, but then the package type would be lost and it would need to be recovered somehow if it were required. If the returned value is equivalent to pkg:unknown/unknown, then the caller receives less information than if an exception were thrown, and now the caller always needs to check for pkg:unknown/unknown or they're going to get unexpected results like pkg:npm/CAPS -> pkg:unknown/unknown from this library and then either they use an invalid PURL or they get an unexpected error later when some other implementation rejects pkg:unknown/unknown for having an unsupported package type. If the idea is to not throw an exception, it'd probably be better to return null. If the idea is to allow the caller to use invalid PURLs, it'd probably be better to allow disabling validation. If the idea is to allow the caller to see what information was parsed, it'd probably be better to add that information to the exception.

@jdalton
Copy link
Collaborator Author

jdalton commented Dec 5, 2024

@matt-phylum Sometimes you only want to handle valid purls in those cases the pkg:unknown/unknown is a substitution that is easily detectable and filterable. If you want more information then you'd likely try-catch and error and log to taste. Returning null is less great as now you have to do purl?.type checks for values. Returning an unknown but otherwise touchable and intractable purl allows for easy detection and filtering without null gotchas.

@jdalton jdalton merged commit d843a59 into master Dec 5, 2024
1 check passed
@jdalton jdalton deleted the jdalton/fix-npm branch December 5, 2024 21:55
@matt-phylum
Copy link

From experience, I expect that users won't detect or filter invalid PURLs until they've started seeing bugs related to unexpected pkg:unknown/unknown values.

@jdalton
Copy link
Collaborator Author

jdalton commented Dec 6, 2024

@matt-phylum That's fine. That's why it's an option and not the default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants