WIP: Repology Updater #54
base: main
What it will do:

Repository Level:
1. Maintain a list of Repology repository prefixes that are relevant.
2. Generate a list of source package directory URLs from (1).
3. Fetch the list of packages and keep it locally.

Package Level:
1. Fetch the list of Repology identifiers for a product.
2. Use these identifiers to fetch the relevant Repology project.
3. For each product, filter the list of packages by a list of repository prefixes.
4. For each such repository, use the list of packages generated above to generate a comprehensive list of PURLs.
5. Finally, save the list of PURLs to disk.

The final version should deliver a clear and comprehensive list of PURLs for a given product, where each PURL represents the latest version of a package available on a specific distribution channel (not necessarily a Linux distro).

These PURLs can then be used to augment scan results, by generating feeds for scanning products. The use case could be:
1. Use type/namespace/name to check if the product is in our database.
2. Compare the version against our list from above to see if it is the latest version available on that channel. Give a warning if not.
3. If it is the latest version, check whether the latest version is considered supported. Additionally, use the channel's support status (such as Debian support dates, repository information) to provide a clear guarantee of support.

Depending on the results from 1, 2, and 3, return a vulnerability rating. Most of the scanning part can perhaps be done by existing scanners, so we are looking to bootstrap this by generating a "feed" instead.

Feed Details:
1. A vulnerability feed typically contains information about known vulnerabilities in various products, using package name, channel, and version ranges.
2. We can generate such a feed from our PURLs and the EOL API. Each unsupported release cycle can be used to craft a "pseudo"-vulnerability that triggers when an unsupported version is detected.
3. The feed will need a lot of exceptions for supported packages on various channels, which is why we need to do Repology scraping.
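The three use-case checks above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `KNOWN` table, the `KnownPackage` fields, the rating strings, and the version numbers are all hypothetical and not part of this PR, and `parse_purl` handles only the simple `pkg:type/namespace/name@version` shape rather than the full PURL specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class KnownPackage:
    """One entry derived from the saved PURL list (fields are hypothetical)."""
    latest_version: str  # latest version seen on this channel
    supported: bool      # whether that release cycle is still supported


# Hypothetical lookup table keyed by (type, namespace, name).
# The versions below are made up for illustration, not real data.
KNOWN = {
    ("deb", "debian", "zookeeper"): KnownPackage("1.2.3-1", True),
    ("deb", "debian", "oldthing"): KnownPackage("0.9-1", False),
}


def parse_purl(purl: str) -> Tuple[str, str, str, Optional[str]]:
    """Split pkg:type/namespace/name@version; qualifiers and subpath are dropped."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl}")
    body = purl[4:].split("#", 1)[0].split("?", 1)[0]
    path, _, version = body.partition("@")
    parts = path.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected type/namespace/name in: {purl}")
    ptype, namespace, name = parts
    return ptype, namespace, name, version or None


def rate(purl: str) -> str:
    """Return a coarse rating for a scanned PURL."""
    ptype, namespace, name, version = parse_purl(purl)
    known = KNOWN.get((ptype, namespace, name))
    if known is None:
        return "unknown"      # check 1: product is not in our database
    if version != known.latest_version:
        return "outdated"     # check 2: not the latest version on this channel
    if not known.supported:
        return "unsupported"  # check 3: latest version, but no longer supported
    return "ok"
```

A feed generator would presumably emit one "pseudo"-vulnerability per key whose rating can be `unsupported`, with exceptions for channels that still support the package.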
Found out that this was a lot more work than I'd expected, due to my flawed understanding of what Repology tracks. Repology tracks source packages where it can, to reduce effort and make tracking easier. This works because Repology is more interested in "what version of a package is available in a repository" than in "all the various ways this package can be installed".

We're interested in the latter (we want a SBOM -> package -> PURL -> product lookup). For that, we need an exhaustive list of all packages that are built from a source package. One source package producing several binary packages happens in many cases, most prominently in Debian and RPM based distros.

For example, https://repology.org/api/v1/project/zookeeper has a single entry for Debian bookworm. That entry links it to the Debian source package, which itself gets built into 10 separate binary packages, and those are the ones we actually want to track.

Generating this mapping is what I'm working on currently. It involves parsing the package files across all distros, and took some effort. Got it working for DEB distros.
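For DEB distros, the source-to-binary mapping can be sketched from a `Packages` index, where each stanza names a binary package and optionally a `Source:` field (which defaults to the binary name when absent and may carry a version in parentheses). The code below is a minimal sketch, not the code in this branch: the sample stanzas are hand-written with made-up versions, and the function names, `namespace`, and `distro` defaults are assumptions. Real versions may also need percent-encoding before going into a PURL.

```python
import re
from collections import defaultdict

# Hand-written sample in the shape of a Debian Packages index (not real data).
SAMPLE_PACKAGES = """\
Package: zookeeper
Version: 1.2.3-1
Architecture: all

Package: libzookeeper-java
Source: zookeeper
Version: 1.2.3-1
Architecture: all

Package: zookeeperd
Source: zookeeper (1.2.3-1)
Version: 1.2.3-1
Architecture: all
"""


def parse_stanzas(text):
    """Yield one dict of fields per blank-line-separated stanza."""
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        key = None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += "\n" + line.strip()  # continuation line
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        yield fields


def binaries_by_source(text):
    """Map each source package name to its (binary name, version) pairs."""
    mapping = defaultdict(list)
    for fields in parse_stanzas(text):
        binary = fields["Package"]
        # Drop an optional "(version)" suffix; default to the binary name.
        source = fields.get("Source", binary).split(" ", 1)[0]
        mapping[source].append((binary, fields["Version"]))
    return dict(mapping)


def to_purls(source, mapping, namespace="debian", distro="bookworm"):
    """Build one PURL per binary package built from the given source package."""
    return [
        f"pkg:deb/{namespace}/{name}@{version}?distro={distro}"
        for name, version in mapping.get(source, [])
    ]
```

With this shape, the single Repology entry for a source package expands into one PURL per binary package, which is the list the SBOM lookup needs.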
Doing some investigation into MongoDB as an example: https://repology.org/api/v1/project/mongodb. For Ubuntu, the packages installed are from repo.mongodb.org.
But Repology has no knowledge of this package existing in this repo. Would resolving this be as simple as adding a new repository to Repology and then finding the binaries installed from the repo package?
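One way to see which repositories a Repology project covers is to list the `repo` values in the project response and filter them by prefix. The sketch below assumes the response is a list of objects with a `repo` key; the prefix list and the sample entries are hand-written for illustration and are not real data for any project.

```python
# Hypothetical prefix list; the PR does not say which prefixes are relevant.
RELEVANT_PREFIXES = ("debian_", "ubuntu_")

# Hand-written sample in the assumed shape of an /api/v1/project/<name> response.
SAMPLE_ENTRIES = [
    {"repo": "debian_12", "srcname": "example", "version": "1.0-1"},
    {"repo": "homebrew", "visiblename": "example", "version": "1.1"},
    {"repo": "ubuntu_22_04", "srcname": "example", "version": "0.9-2"},
]


def filter_by_repo_prefix(entries, prefixes=RELEVANT_PREFIXES):
    """Keep only entries whose repo name starts with a relevant prefix."""
    wanted = tuple(prefixes)
    return [e for e in entries if e.get("repo", "").startswith(wanted)]


def repos_covered(entries):
    """Return the sorted set of repository names present in the response."""
    return sorted({e["repo"] for e in entries if "repo" in e})
```

A repository such as repo.mongodb.org that never appears in `repos_covered(...)` would have to be handled outside Repology, for example by a separate scan or a static list.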
Since this is a small list, we could easily add static PURLs for all of these. We could scan the repo as well, but that only makes sense for larger, more significant repositories.
@captn3m0 do you have WIP commits on this branch you could push? Might be able to work in parallel here. I can tackle searching packages in other distros.
Status: the Repository Level steps are entirely TODO; some of the Package Level steps are done.