BFB provisioning stuck in Pending phase without clear indication of a download failure. #29

Open
Nadav-Rub opened this issue Dec 24, 2024 · 0 comments


When deploying the BFB resource (bf-bundle), the status remains in the Pending phase across all worker nodes, with no clear indication that downloading the BFB file failed. This causes confusion and slows debugging, because no events or logs point to a root cause for the provisioning issue.

Steps to Reproduce

  1. Deploy a BFB resource with a wrong URL for the BFB file.
  2. Observe the status of the BFB resource and the worker nodes.

Expected Behavior

The provisioning process should:

Fail with a clear error message and appropriate events (e.g., "Failed to download BFB from specified URL").
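
One way a provisioning controller could surface this failure is to probe the URL and turn any error into an explicit phase and event message. The following is a minimal Python sketch of that idea, not the project's implementation: `probe_bfb_url`, the `Error` phase name, and the event wording are all hypothetical, and only the `Initializing` phase name comes from the `describe` output in this report.

```python
import urllib.error
import urllib.request


def probe_bfb_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return (phase, event_message) for a BFB download URL.

    Hypothetical helper: the "Error" phase and the message text are
    assumptions for illustration, not names taken from the project.
    """
    try:
        # HEAD avoids fetching the whole image just to check reachability.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return "Error", f"Failed to download BFB from {url}: HTTP {exc.code}"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or malformed URL.
        reason = getattr(exc, "reason", exc)
        return "Error", f"Failed to download BFB from {url}: {reason}"
    if status >= 400:
        return "Error", f"Failed to download BFB from {url}: HTTP {status}"
    return "Initializing", f"BFB URL {url} is reachable"
```

The `opener` parameter exists only so the check can be exercised without network access; a real controller would record the returned message as an event on the BFB resource instead of returning it.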

Actual Behavior

  1. The BFB resource remains in the Initializing phase, and the worker nodes remain in the Pending state indefinitely.

     depuser@dpf3-jump:~/dpf-qa-ocp-integration-setup/doca-platform-foundation/docs/guides/usecases/hbn_ovn$ k describe bfbs.provisioning.dpu.nvidia.com -A
     Name:         bf-bundle
     Namespace:    dpf-operator-system
     Labels:       <none>
     Annotations:  <none>
     API Version:  provisioning.dpu.nvidia.com/v1alpha1
     Kind:         BFB
     Metadata:
       Creation Timestamp:  2024-12-09T15:28:46Z
       Generation:          1
       Resource Version:    7477
       UID:                 50c8a306-89ca-4028-a2ae-0297da696cb0
     Spec:
       URL:  http://nbu-nfs.mellanox.com/auto/sw_mc_soc_release/doca_dpu/doca_2.9.1/20241203.1/bfbs/dk/bf-bundle-2.9.1-20_24.11_ubuntu-22.04_prod.bfb
     Status:
       Phase:  Initializing
     Events:   <none>

  2. No events are generated to indicate a download failure or any related issue.
  3. Logs lack clarity on the root cause, making troubleshooting difficult.
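
Because the resource never leaves Initializing and emits no events, the hang currently has to be detected from the outside. A minimal Python sketch of such a check follows, using the creation timestamp format and phase names shown in this report. The `is_stuck` helper, the 30-minute threshold, and the choice of which phases count as "in progress" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stuck(creation_timestamp, phase, now, threshold=timedelta(minutes=30)):
    """Report whether a resource has sat in a non-final phase too long.

    The default threshold and the set of in-progress phases are
    assumptions; tune them to the expected BFB download time.
    """
    in_progress = {"Initializing", "Pending"}
    # Timestamps in the describe output use the form 2024-12-09T15:28:46Z.
    created = datetime.strptime(creation_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    return phase in in_progress and (now - created) > threshold
```

For example, calling `is_stuck("2024-12-09T15:28:46Z", "Initializing", datetime.now(timezone.utc))` flags the resource from this report once it is older than the threshold.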

Ref: #4203694
