DependencyTrack hangs when uploading a large SBOM to a project a second time #1905

Closed
JayAtFujifilm opened this issue Aug 23, 2022 · 22 comments · Fixed by #3357
Labels: defect (Something isn't working), pending release

Comments

@JayAtFujifilm

JayAtFujifilm commented Aug 23, 2022

Current Behavior:

When we create a new project and upload a large SBOM (~7.5MB) there is no problem. However, if we then upload the SBOM again to the same project, the DepTrack API server hangs and never finishes processing. Inspection of the logfile via Docker indicates a stack overflow error (java.lang.StackOverflowError), as shown in the attached logfile (DependencyTrackLog.txt).

This happens even if we wait a long time (several days) to upload the second SBOM.

For smaller SBOMs we never have this problem.

Steps to Reproduce:

Using the DependencyTrack API:

  1. Create a new project
  2. Upload a large SBOM (~7.5MB)
  3. Verify that processing completes and the analysis is shown in the DepTrack UI.
  4. Upload the SBOM once again to the previously created project.
  5. DepTrack never finishes processing.

Expected Behavior:

Should be able to upload even large SBOMs multiple times to the same project.

Environment:

  • Dependency-Track Version: 4.5.0
  • Distribution: Docker
  • BOM Format & Version: CycloneDX 1.4 in JSON
  • Database Server: PostgreSQL
  • Browser: Various

Additional Details:

Unfortunately, we cannot send you the actual SBOM for proprietary reasons.

@nscuro
Member

nscuro commented Aug 23, 2022

Wow, that's certainly a first. Ever seen anything like this happening before, @stevespringett?

2022-08-08 11:20:15,312 ERROR [LoggableUncaughtExceptionHandler] An unknown error occurred in an asynchronous event or notification thread
java.lang.StackOverflowError: null
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at org.datanucleus.state.StateManagerImpl.replaceStateManager(StateManagerImpl.java:2096)
	at org.datanucleus.state.StateManagerImpl.initialiseForDetached(StateManagerImpl.java:644)
	at org.datanucleus.state.StateManagerImpl.initialiseForDetached(StateManagerImpl.java:126)
	at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:4932)
	at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:126)
	at org.datanucleus.ExecutionContextImpl.detachObjectCopy(ExecutionContextImpl.java:2741)
	at org.datanucleus.store.fieldmanager.DetachFieldManager.processPersistableCopy(DetachFieldManager.java:76)
	at org.datanucleus.store.fieldmanager.DetachFieldManager.processField(DetachFieldManager.java:154)
	at org.datanucleus.store.fieldmanager.DetachFieldManager.internalFetchObjectField(DetachFieldManager.java:121)
	at org.datanucleus.store.fieldmanager.AbstractFetchDepthFieldManager.fetchObjectField(AbstractFetchDepthFieldManager.java:105)
	at org.datanucleus.state.StateManagerImpl.replacingObjectField(StateManagerImpl.java:1995)
	at org.dependencytrack.model.Component.dnReplaceField(Component.java)
	at org.dependencytrack.model.Component.dnReplaceFields(Component.java)
	at org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4320)
	at org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4345)
	at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:4941)
	at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:126)
	at org.datanucleus.ExecutionContextImpl.detachObjectCopy(ExecutionContextImpl.java:2741)
	at org.datanucleus.api.jdo.JDOPersistenceManager.jdoDetachCopy(JDOPersistenceManager.java:1121)
	at org.datanucleus.api.jdo.JDOPersistenceManager.detachCopy(JDOPersistenceManager.java:1150)
	at org.dependencytrack.persistence.ComponentQueryManager.createComponent(ComponentQueryManager.java:321)
	at org.dependencytrack.persistence.QueryManager.createComponent(QueryManager.java:452)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:171)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)

@JayAtFujifilm, would it be possible to provide the SBOM to us so we can try to reproduce the issue?

@stevespringett
Member

7.5MB isn't that large. The BOM I use for all my performance testing is 22MB and contains just over 9K components. See attached Bloated BOMs.zip.

I think there's something else going on here: either memory or host configuration, or perhaps a field in the BOM itself that contains an unexpectedly large amount of data.

@syalioune
Contributor

Hello guys,

Looking at the code at BomUploadProcessingTask.java:178 below, this is most likely a recursion issue (consistent with the StackOverflowError) caused by a deep parent-child component hierarchy:

private void processComponent(final QueryManager qm, final Bom bom, Component component,
                              final List<Component> flattenedComponents) {
    component.setInternal(InternalComponentIdentificationUtil.isInternalComponent(component, qm));
    // ... (rest of the method body elided in the original comment)
    if (component.getChildren() != null) {
        for (final Component child : component.getChildren()) {
            // Recursive call; this is line 178 in the stack trace.
            processComponent(qm, bom, child, flattenedComponents);
        }
    }
}

It would be interesting to know the maximum parent-child nesting depth in the problematic SBOM.
PS: The default stack size on OpenJDK 17 is 1024k, which is already quite big.
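
For illustration only, here is a minimal sketch of how a component hierarchy can be walked without recursion and without looping forever on a cycle. It is a general technique, not the change that was eventually made in Dependency-Track; `IterativeFlattener` and `Node` are hypothetical names, and `Node` is a simplified stand-in for the real `Component` model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class IterativeFlattener {

    // Hypothetical stand-in for a component; not Dependency-Track's model class.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Walks the hierarchy with an explicit stack instead of recursion and
    // visits each object at most once, so a cycle cannot loop forever.
    static List<Node> flatten(Node root) {
        List<Node> flattened = new ArrayList<>();
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (!seen.add(current)) {
                continue; // already processed: skipping it breaks the cycle
            }
            flattened.add(current);
            for (Node child : current.children) {
                pending.push(child);
            }
        }
        return flattened;
    }

    public static void main(String[] args) {
        Node parent = new Node("parent");
        Node child = new Node("child");
        parent.children.add(child);
        child.children.add(parent); // deliberate cycle
        System.out.println(flatten(parent).size() + " components visited");
    }
}
```

The identity-based `seen` set is a deliberate choice here: it tracks object instances rather than relying on `equals`, so two distinct components that happen to look alike are both still visited.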

@syalioune
Contributor

Yup,
I just replicated it with the attached SBOM (bom.txt), which nests component bcprov-jdk15on 14 levels deep.
It's a bit strange: the first upload went OK, and the issue appears on the second upload.

dependency-track-dtrack-apiserver-1  | 2022-08-23 21:02:17,219 ERROR [LoggableUncaughtExceptionHandler] An unknown error occurred in an asynchronous event or notification thread
dependency-track-dtrack-apiserver-1  | java.lang.StackOverflowError: null
dependency-track-dtrack-apiserver-1  |  at java.base/java.security.AccessController.doPrivileged(Native Method)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.replaceStateManager(StateManagerImpl.java:2096)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.initialiseForDetached(StateManagerImpl.java:644)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.initialiseForDetached(StateManagerImpl.java:126)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:4932)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:126)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.ExecutionContextImpl.detachObjectCopy(ExecutionContextImpl.java:2741)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.store.fieldmanager.DetachFieldManager.processPersistableCopy(DetachFieldManager.java:76)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.store.fieldmanager.DetachFieldManager.processField(DetachFieldManager.java:154)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.store.fieldmanager.DetachFieldManager.internalFetchObjectField(DetachFieldManager.java:121)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.store.fieldmanager.AbstractFetchDepthFieldManager.fetchObjectField(AbstractFetchDepthFieldManager.java:105)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.replacingObjectField(StateManagerImpl.java:1995)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.model.Component.dnReplaceField(Component.java)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.model.Component.dnReplaceFields(Component.java)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4320)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4345)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:4941)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.state.StateManagerImpl.detachCopy(StateManagerImpl.java:126)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.ExecutionContextImpl.detachObjectCopy(ExecutionContextImpl.java:2741)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.api.jdo.JDOPersistenceManager.jdoDetachCopy(JDOPersistenceManager.java:1121)
dependency-track-dtrack-apiserver-1  |  at org.datanucleus.api.jdo.JDOPersistenceManager.detachCopy(JDOPersistenceManager.java:1150)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.persistence.ComponentQueryManager.createComponent(ComponentQueryManager.java:321)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.persistence.QueryManager.createComponent(QueryManager.java:452)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:171)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)
dependency-track-dtrack-apiserver-1  |  at org.dependencytrack.tasks.BomUploadProcessingTask.processComponent(BomUploadProcessingTask.java:178)

@JayAtFujifilm
Author

The SBOM for which this occurs contains a deeply nested component hierarchy that mimics the layout of our source code. To build the SBOM, we iterate through our source code, building an individual SBOM for each project (.Net or NodeJS), and use the Cyclone CLI tool to merge the individual SBOMs. We also use "pseudo-components" to represent source code folders that don't directly contain a project.

@stevespringett
Member

Thanks for the update. FYI, at this time, Dependency-Track will flatten the component inventory and will not preserve hierarchy. Support for parent/child relationships for both projects and components is planned.

@nscuro nscuro added defect Something isn't working and removed in triage labels Aug 24, 2022
@JayAtFujifilm
Author

Thank you, syalioune, for reproducing this so quickly!

@syalioune
Contributor

It would definitely help if you could provide your anonymized SBOM. My test SBOM is somewhat biased and extreme, as I duplicated the same component in the nested hierarchy, which probably causes an infinite recursion.
I tested again with the attached SBOM bom-2.txt (14 different nested components) and everything ran smoothly.
Nevertheless, the algorithm could be improved to take that case into account.

@JayAtFujifilm
Author

We are working on narrowing down the cause of the problem, and hope to have a smaller SBOM available for debugging soon.

@sahil3112

sahil3112 commented Mar 15, 2023

Hi @syalioune,

Could you please specify what information the anonymized SBOM needs to retain?

The same issue is still present in the latest version.

Would the structure of the SBOM help with debugging?

@syalioune
Contributor

For reproduction, the important things to keep are:

  • The same number of components
  • The same nested component hierarchy
  • Consistency: if you replace componentA with some string, it should be the same string everywhere

Given the error, the actual component names and identifiers (CPE, PURLs) are not relevant because it crashes before vulnerability analysis.
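
As a rough sketch of the consistency requirement, the snippet below hands out the same placeholder every time it sees the same original value. `ConsistentAnonymizer`, `alias`, and the `component-N` placeholder format are made up for this example; any scheme works as long as identical inputs map to identical outputs throughout the SBOM.

```java
import java.util.HashMap;
import java.util.Map;

final class ConsistentAnonymizer {

    // Remembers which placeholder was assigned to each original value.
    private final Map<String, String> aliases = new HashMap<>();

    // Returns the same placeholder every time the same original value is seen.
    String alias(String original) {
        String existing = aliases.get(original);
        if (existing != null) {
            return existing;
        }
        String created = "component-" + (aliases.size() + 1);
        aliases.put(original, created);
        return created;
    }

    public static void main(String[] args) {
        ConsistentAnonymizer anonymizer = new ConsistentAnonymizer();
        System.out.println(anonymizer.alias("componentA")); // component-1
        System.out.println(anonymizer.alias("componentB")); // component-2
        System.out.println(anonymizer.alias("componentA")); // component-1
    }
}
```

Applying the same mapping to names, purls, and bom-refs keeps the nesting and the duplicate pattern intact, which is what matters for reproducing the crash.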

@sahil3112

Here is the SBOM structure. I have verified that the issue can be reproduced with this structure-only SBOM.

sbom.txt

@dancundy

dancundy commented Mar 30, 2023

Hi all,

We are also seeing this issue in our organisation. It sounds like this only happens on the second iteration. Would a feasible workaround be to purge the database prior to each run?

What would be your recommendation?

I'm also happy to provide logs etc. if that's going to help in any way.

@dancundy

@JayAtFujifilm Did you find a workaround for this?
@stevespringett Could I politely ask if there is any update on this issue?

syalioune added a commit to syalioune/dependency-track that referenced this issue May 15, 2023
@syalioune
Contributor

syalioune commented May 15, 2023

Looking at the different logs and SBOMs provided, the common pattern that emerges is a nested duplicate component, as in the SBOM below:

{
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "metadata": {
        "timestamp": "2023-01-01T11:01:51Z",
        "tools": [
            {
                "vendor": "changeme",
                "name": "changeme",
                "version": "0.62.3"
            }
        ]
    },
    "components": [
        {
            "bom-ref": "pkg:pypi/Pillow@9.3.0",
            "type": "library",
            "name": "Pillow",
            "version": "9.3.0",
            "cpe": "cpe:2.3:a:alex_clark_\\(pil_fork_author\\):python-Pillow:9.3.0:*:*:*:*:*:*:*",
            "purl": "pkg:pypi/Pillow@9.3.0",
            "components": [
                {
                    "bom-ref": "pkg:pypi/Pillow@9.3.0?package-id=212c649613e17901",
                    "type": "library",
                    "name": "Pillow",
                    "version": "9.3.0",
                    "cpe": "cpe:2.3:a:alex_clark_\\(pil_fork_author\\):python-Pillow:9.3.0:*:*:*:*:*:*:*",
                    "purl": "pkg:pypi/Pillow@9.3.0"
                }
            ]
        }
    ]
}

The scenario is:

  • On the first BOM upload, there are no existing components for the project. DT creates two distinct components in the database (dbComponentA and dbComponentB), links them (dbComponentA is the parent of dbComponentB), and proceeds.
  • On the second BOM upload, each component from the SBOM is matched to the first database component that shares its identity (i.e. purl, cpe, group, ...), so both components are matched to, for example, dbComponentA. DT then links them: dbComponentA becomes the parent of dbComponentA, which causes the infinite recursion.

Kind of the same premises as in #2131 (comment)

The SBOM is flawed to begin with. @sahil3112 @dancundy @JayAtFujifilm, can you please confirm that your non-redacted SBOMs match the pattern of the SBOM snippet above, and tell us which tool you used to generate them?

However, DT can protect itself against this. There are two possibilities:

  1. Detect the issue on first upload and fail the BOM upload
  2. Detect the issue and skip the nested duplicate

I've submitted a draft PR implementing the second alternative.
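
To make the second alternative concrete, here is a minimal sketch of skipping a nested duplicate. It is not the code from the draft PR: `NestedDuplicateFilter` and `BomComponent` are hypothetical, and identity is reduced to name and version, whereas the real identity also involves purl, cpe, group, and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class NestedDuplicateFilter {

    // Hypothetical, simplified component: identity reduced to name and version.
    static final class BomComponent {
        final String name;
        final String version;
        final List<BomComponent> children = new ArrayList<>();

        BomComponent(String name, String version) {
            this.name = name;
            this.version = version;
        }

        boolean sameIdentityAs(BomComponent other) {
            return Objects.equals(name, other.name) && Objects.equals(version, other.version);
        }
    }

    // Removes children that share their direct parent's identity, then
    // repeats the check for the children that were kept.
    static void dropNestedDuplicates(BomComponent parent) {
        parent.children.removeIf(child -> child.sameIdentityAs(parent));
        for (BomComponent child : parent.children) {
            dropNestedDuplicates(child);
        }
    }

    public static void main(String[] args) {
        BomComponent outer = new BomComponent("Pillow", "9.3.0");
        outer.children.add(new BomComponent("Pillow", "9.3.0")); // nested duplicate
        dropNestedDuplicates(outer);
        System.out.println(outer.children.size()); // 0
    }
}
```

One consequence of this simple version is that anything nested under a removed duplicate is dropped with it; a fuller implementation might re-attach those grandchildren to the surviving parent instead.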

@ghost

ghost commented May 22, 2023

@syalioune
I'm facing the same issue, with a BOM generated by cyclonedx-gomod. I did some analysis with a local version of DT and minimised the 600KB BOM to just a few components still reproducing the issue (at the end of this post). A few things I noticed (more details below):

  • During the second import, a cyclic reference is created (child is identical to parent), and this is also persisted in the database.
  • After working around the issue in BomUploadProcessingTask, the infinite recursion will again occur in ModelConverter.flattenComponents (I did not check beyond this).
  • When retrieving the child component from the database via ComponentQueryManager.matchSingleIdentity, the parent component is returned instead, leading to the fact that the parent will refer to itself as a child.

Below are some more details:

So apparently the input BOM contains a parent-child relationship between two components that essentially refer to the same package (and which I would expect to be deduplicated on the second import of the BOM).

After the first import, this is what we see in the components overview in the DT UI:

Component | Version | Group | Package URL (PURL)
cloud.google.com/go/storage | v1.13.0 | | pkg:golang/cloud.google.com/go/storage@v1.13.0?type=module&goos=darwin&goarch=amd64
cloud.google.com/go/storage | v1.13.0 | | pkg:golang/cloud.google.com/go/storage@v1.13.0?type=package

And this is what we see in the H2 console after the first import:

SELECT ID, NAME, PARENT_COMPONENT_ID, PURL, PURLCOORDINATES FROM COMPONENT

ID | NAME | PARENT_COMPONENT_ID | PURL | PURLCOORDINATES
936 | cloud.google.com/go/storage | null | pkg:golang/cloud.google.com/go/storage@v1.13.0?type=module&goos=darwin&goarch=amd64 | pkg:golang/cloud.google.com/go/storage@v1.13.0
937 | cloud.google.com/go/storage | 936 | pkg:golang/cloud.google.com/go/storage@v1.13.0?type=package | pkg:golang/cloud.google.com/go/storage@v1.13.0

So we see that the second component refers to the first one as its parent, which is consistent with the BOM.

During the second import, in ModelConverter.convert, the parent component is retrieved using its component identity:

Field | Component Identity Value
ObjectType | "COMPONENT"
purl | "pkg:golang/cloud.google.com/go/storage@v1.13.0?type=module&goos=darwin&goarch=amd64"
purlCoordinates | "pkg:golang/cloud.google.com/go/storage@v1.13.0"
name | "cloud.google.com/go/storage"
version | "v1.13.0"

The component returned is the one with ID=936, as expected.

Next, the child is being retrieved from the database by its component identity:

Field | Component Identity Value
ObjectType | "COMPONENT"
purl | "pkg:golang/cloud.google.com/go/storage@v1.13.0?type=package"
purlCoordinates | "pkg:golang/cloud.google.com/go/storage@v1.13.0"
name | "cloud.google.com/go/storage"
version | "v1.13.0"

In this case, the component with ID=936 is also returned. Consequently, the ModelConverter assigns this component as a child of the previous one, which is the same component, hence creating the cyclic reference: component.setChildren(components);

Looking at the H2 console, we can see that the cyclic reference has been persisted after the second import. Moreover, the child component does not have a parent anymore, and the PURL property of the parent was set to the value of the child:

SELECT ID, NAME, PARENT_COMPONENT_ID, PURL, PURLCOORDINATES FROM COMPONENT

ID | NAME | PARENT_COMPONENT_ID | PURL | PURLCOORDINATES
936 | cloud.google.com/go/storage | 936 | pkg:golang/cloud.google.com/go/storage@v1.13.0?type=package | pkg:golang/cloud.google.com/go/storage@v1.13.0
937 | cloud.google.com/go/storage | null | pkg:golang/cloud.google.com/go/storage@v1.13.0?type=package | pkg:golang/cloud.google.com/go/storage@v1.13.0

I cannot judge whether this retrieval/persistence behaviour is expected; maybe it is supposed to work like that as part of the deduplication? Anyway, I hope this helps in isolating and resolving the issue.
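
As a small illustration of how such a persisted cycle could be spotted, the sketch below follows parent pointers in an in-memory copy of the ID and PARENT_COMPONENT_ID columns shown above. `ParentCycleCheck` and `idsInCycle` are hypothetical names; this is a diagnostic aid under those assumptions, not something Dependency-Track provides.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ParentCycleCheck {

    // parentById maps a component ID to its parent ID (null when it has no parent).
    // Returns every ID that lies on a parent cycle, including self-references.
    static List<Long> idsInCycle(Map<Long, Long> parentById) {
        List<Long> result = new ArrayList<>();
        for (Long start : parentById.keySet()) {
            Set<Long> visited = new HashSet<>();
            Long current = start;
            // Follow parent pointers until a root (null) or a repeated ID is reached.
            while (current != null && visited.add(current)) {
                current = parentById.get(current);
            }
            // The walk came back to where it started: start is on a cycle.
            if (current != null && current.equals(start)) {
                result.add(start);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // State after the second import, as shown in the table above.
        Map<Long, Long> afterSecondImport = new LinkedHashMap<>();
        afterSecondImport.put(936L, 936L);
        afterSecondImport.put(937L, null);
        System.out.println(idsInCycle(afterSecondImport)); // [936]
    }
}
```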

Below is the reduced BOM that has the issue:

{
    "$schema": "http://cyclonedx.org/schema/bom-1.4.schema.json",
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "metadata": {
        "timestamp": "2023-05-16T08:57:13+02:00",
        "tools": [
            {
                "vendor": "CycloneDX",
                "name": "cyclonedx-gomod",
                "version": "v1.4.0"  
            }  
        ],
        "component": {
            "bom-ref": "pkg:golang/go.foobar.com/localfull@v0.0.0-20230515095825-3c9a500d1e33?type=module",
            "type": "application",
            "name": "go.foobar.com/localfull",
            "version": "v0.0.0-20230515095825-3c9a500d1e33",
            "purl": "pkg:golang/go.foobar.com/localfull@v0.0.0-20230515095825-3c9a500d1e33?type=module\u0026goos=darwin\u0026goarch=amd64",
            "properties": [
            ],
            "components": [
            ]
          }
    },
    "components": [
          {
            "bom-ref": "pkg:golang/cloud.google.com/go/storage@v1.13.0?type=module",
            "type": "library",
            "name": "cloud.google.com/go/storage",
            "version": "v1.13.0",
            "scope": "required",
            "hashes": [
              {
                "alg": "SHA-256",
                "content": "6a63ef842388f8796da7aacfbbeeb661dc2122b8dffb7e0f29500be07c206309"
              }
            ],
            "purl": "pkg:golang/cloud.google.com/go/storage@v1.13.0?type=module\u0026goos=darwin\u0026goarch=amd64",
            "components": [
              {
                "type": "library",
                "name": "cloud.google.com/go/storage",
                "version": "v1.13.0",
                "purl": "pkg:golang/cloud.google.com/go/storage@v1.13.0?type=package"
              }
            ],
            "evidence": {
              "licenses": [
                {
                  "license": {
                    "id": "Apache-2.0"
                  }
                }
              ]
            }
          }   
    ],
    "dependencies": [
      {
        "ref": "pkg:golang/go.foobar.com/localfull@v0.0.0-20230515095825-3c9a500d1e33?type=module",
        "dependsOn": [
          "pkg:golang/cloud.google.com/go/storage@v1.13.0?type=module"
        ]
      }
    ]
}

@syalioune
Contributor

syalioune commented May 30, 2023

Hello @salfie

Thanks for your thorough investigation. It matches my observations and, based on that, I can provide the attached real-life reproducible example cyclonedx-gomod-issue-1905.zip.

Given the example application and SBOM generation with cyclonedx-gomod using

cyclonedx-gomod app -json -output acme-app.bom.json -licenses -packages .

We end up with the nested golang module/package components

{
      "bom-ref": "pkg:golang/cloud.google.com/go/storage@v1.30.1?type=module",
      "type": "library",
      "name": "cloud.google.com/go/storage",
      "version": "v1.30.1",
      "scope": "required",
      "hashes": [
        {
          "alg": "SHA-256",
          "content": "b8e74cc40b3c1c4c6a0659cbb674323f4624bdb883a5d1928462adc7a53fa0d3"
        }
      ],
      "purl": "pkg:golang/cloud.google.com/go/storage@v1.30.1?type=module\u0026goos=linux\u0026goarch=amd64",
      "components": [
        {
          "type": "library",
          "name": "cloud.google.com/go/storage",
          "version": "v1.30.1",
          "purl": "pkg:golang/cloud.google.com/go/storage@v1.30.1?type=package"
        }
      ]
}

ComponentQueryManager.matchSingleIdentity matches both the module and the package component because of the predicate (purl != null && purl == :purl) || (purlCoordinates != null && purlCoordinates == :purlCoordinates) in the generated JDOQL here (the purlCoordinates are equal, whereas the purls differ).

if both purl and purlCoordinates are not null, we should maybe just filter on purl which is more specific ?
@nscuro WDYT ?
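
The effect of that predicate can be sketched in plain Java. This is only a simplified mirror of the quoted JDOQL fragment, not the actual query code; `PurlMatchDemo`, `laxMatch`, `purlFirstMatch`, and the `example.com/lib` purls are invented for the example.

```java
final class PurlMatchDemo {

    // Simplified mirror of the quoted predicate: a match on either field is enough.
    static boolean laxMatch(String purl, String coordinates, String queryPurl, String queryCoordinates) {
        return (purl != null && purl.equals(queryPurl))
                || (coordinates != null && coordinates.equals(queryCoordinates));
    }

    // Alternative raised above: when the query carries a purl, compare on purl only.
    static boolean purlFirstMatch(String purl, String coordinates, String queryPurl, String queryCoordinates) {
        if (queryPurl != null) {
            return queryPurl.equals(purl);
        }
        return coordinates != null && coordinates.equals(queryCoordinates);
    }

    public static void main(String[] args) {
        // Stored row for the module component (made-up values).
        String storedPurl = "pkg:golang/example.com/lib@v1.0.0?type=module";
        String storedCoordinates = "pkg:golang/example.com/lib@v1.0.0";
        // Identity of the nested package component being looked up.
        String queryPurl = "pkg:golang/example.com/lib@v1.0.0?type=package";
        String queryCoordinates = "pkg:golang/example.com/lib@v1.0.0";

        System.out.println(laxMatch(storedPurl, storedCoordinates, queryPurl, queryCoordinates));       // true
        System.out.println(purlFirstMatch(storedPurl, storedCoordinates, queryPurl, queryCoordinates)); // false
    }
}
```

With the lax version, the package lookup lands on the module row because the coordinates agree; comparing on purl first keeps the two apart.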

@nscuro
Member

nscuro commented Jul 5, 2023

@syalioune Apologies for the delayed response, I only now got some time to look at BOM processing more closely.

if both purl and purlCoordinates are not null, we should maybe just filter on purl which is more specific ?

Agreed. I'd even go one step further: If purl is null, then also query for purl == null.

There are multiple issues closely related to this one. It all comes down to matchSingleIdentity being too lax, see my comment here: #2519 (comment)

For a change in Hyades, I now switched the matching logic to being "strict", and that fixes both #1905 and #2519: DependencyTrack/hyades-apiserver@6418879#diff-3a9c95d09a4a5285037a7d5ba65613e09198ce2b460279622cadd8e703677d40
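
A rough sketch of what "strict" matching means in practice: every identity field has to be equal, and a null field only matches null. This is not the Hyades implementation; `StrictIdentityDemo`, the `Identity` record, and its choice of fields are assumptions made for illustration, as are the `example.com/lib` values.

```java
import java.util.Objects;

final class StrictIdentityDemo {

    // Hypothetical identity holder; the field selection is an assumption.
    record Identity(String purl, String cpe, String group, String name, String version) {}

    // Every field must be equal, and null only equals null.
    static boolean strictMatch(Identity stored, Identity query) {
        return Objects.equals(stored.purl(), query.purl())
                && Objects.equals(stored.cpe(), query.cpe())
                && Objects.equals(stored.group(), query.group())
                && Objects.equals(stored.name(), query.name())
                && Objects.equals(stored.version(), query.version());
    }

    public static void main(String[] args) {
        Identity module = new Identity(
                "pkg:golang/example.com/lib@v1.0.0?type=module", null, null, "example.com/lib", "v1.0.0");
        Identity pkg = new Identity(
                "pkg:golang/example.com/lib@v1.0.0?type=package", null, null, "example.com/lib", "v1.0.0");
        Identity noPurl = new Identity(null, null, null, "example.com/lib", "v1.0.0");

        System.out.println(strictMatch(module, pkg));    // false: purls differ
        System.out.println(strictMatch(module, noPurl)); // false: null does not match a value
        System.out.println(strictMatch(noPurl, noPurl)); // true: null matches null
    }
}
```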

nscuro added a commit to DependencyTrack/hyades-apiserver that referenced this issue Jul 7, 2023
nscuro added a commit to DependencyTrack/hyades-apiserver that referenced this issue Jul 8, 2023
nscuro added a commit to DependencyTrack/hyades-apiserver that referenced this issue Jul 10, 2023
* Add bloated BOM for ingestion performance testing

Signed-off-by: nscuro <[email protected]>

* Prevent query compilation cache being bypassed for `matchSingleIdentity` queries

See DependencyTrack/dependency-track#2540

This also cleans the query from containing weird statements like `(cpe != null && cpe == null)` in case a component does not have a CPE.

Signed-off-by: nscuro <[email protected]>

* WIP: Improve BOM processing performance

Signed-off-by: nscuro <[email protected]>

* Handle dependency graph

Signed-off-by: nscuro <[email protected]>

* Improve dependency graph assembly

Instead of using individual bulk UPDATE queries, use setters on persistent components instead. This way we can again make use of batched flushing.

Signed-off-by: nscuro <[email protected]>

* Completely replace old processing logic

Also decompose large processing method into multiple smaller ones, and re-implement notifications.

Signed-off-by: nscuro <[email protected]>

* Fix not all BOM refs being updated with new component identities

Signed-off-by: nscuro <[email protected]>

* Be smarter about indexing component identities and BOM refs

Also add more documentation

Signed-off-by: nscuro <[email protected]>

* Reduce logging noise

Signed-off-by: nscuro <[email protected]>

* Mark new components as such

... via new transient field. Required for compatibility with #217

Signed-off-by: nscuro <[email protected]>

* Compatibility with #217

Signed-off-by: nscuro <[email protected]>

* Cleanup tests

Signed-off-by: nscuro <[email protected]>

* Reduce code duplication

Signed-off-by: nscuro <[email protected]>

* Cleanup; Process services

Signed-off-by: nscuro <[email protected]>

* Finishing touches 🪄

Signed-off-by: nscuro <[email protected]>

* Make flush threshold configurable

The optimal value could depend on how beefy the database server is, and how much memory is available to the API server.

Signed-off-by: nscuro <[email protected]>

* Clarify `warn` log when rolling back active transactions

Signed-off-by: nscuro <[email protected]>

* Log number of consumed components and services before and after de-dupe

Signed-off-by: nscuro <[email protected]>

* Extend BOM processing test with bloated BOM

Signed-off-by: nscuro <[email protected]>

* Make component identity matching strict

To address DependencyTrack/dependency-track#2519 (comment).

Also add regression test for this specific issue.

Signed-off-by: nscuro <[email protected]>

* Add regression test for DependencyTrack/dependency-track#1905

Signed-off-by: nscuro <[email protected]>

* Clarify why "reachability on commit" is disabled; Add assertion for persistent object state

Signed-off-by: nscuro <[email protected]>

* Add tests for `equals` and `hashCode` of `ComponentIdentity`

Signed-off-by: nscuro <[email protected]>

* Address review comments

Signed-off-by: nscuro <[email protected]>

---------

Signed-off-by: nscuro <[email protected]>
@syalioune
Contributor

Apologies for the delayed response, I only now got some time to look at BOM processing more closely.

No problem. Same time constraints here.

I guess the fix you performed in hyades would be merged back here sometime ?

@nscuro
Member

nscuro commented Jul 17, 2023

I guess the fix you performed in hyades would be merged back here sometime ?

Yes, we have many improvements from Hyades in the pipeline that we want to contribute back soon, and this is of course one of them. 🤘

@melba-lopez
Contributor

@nscuro Has this been addressed per #218? And would this be a potential fix for 4.10?

mehab pushed a commit to DependencyTrack/hyades-apiserver that referenced this issue Sep 12, 2023
Signed-off-by: mehab <[email protected]>
@nscuro nscuro added this to the 4.11 milestone Jan 7, 2024
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2024