
DPU object is in ERROR phase when there is "pre bfb-install" failure, no auto re-provisioning is attempted on timeout #40

Open
wpeng102 opened this issue Dec 25, 2024 · 0 comments



Provisioning gets stuck in the ERROR phase when a "pre bfb-install" failure occurs, and no automatic re-provisioning is attempted on timeout. The only way to trigger re-provisioning is to delete the DPU object.

The DMS timeout only takes effect on the "active" API (bfb-install).

In our case, the failures happened on the inactive API (pre bfb-install) due to:

  • the old "dpf-operator-system-client-secret" secret was not removed when the new DPU object came up

  • NFS was not available to the DMS pod when it came up
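
To make the gap concrete, here is a minimal Python sketch of the behavior described above. It is not the operator's actual code: the `Step` enum, the `DpuState` type, the `next_phase` function, the `cover_all_steps` flag, and the phase names other than ERROR are all hypothetical, and it assumes that a timeout on the active API leads to re-provisioning.

```python
from dataclasses import dataclass
from enum import Enum


class Step(Enum):
    # Hypothetical names for the two stages mentioned in this issue.
    PRE_BFB_INSTALL = "pre bfb-install"  # "inactive" API
    BFB_INSTALL = "bfb-install"          # "active" API


@dataclass
class DpuState:
    step: Step
    started_at: float  # seconds, when the current step began
    failed: bool = False


def next_phase(state: DpuState, now: float, timeout: float,
               cover_all_steps: bool) -> str:
    """Return an illustrative phase name for the DPU object.

    cover_all_steps=False models the reported behavior: the timeout is
    only honored during bfb-install. cover_all_steps=True models one
    possible fix, where the timeout also covers pre bfb-install.
    """
    timed_out = now - state.started_at >= timeout
    timeout_applies = cover_all_steps or state.step is Step.BFB_INSTALL
    if timed_out and timeout_applies:
        return "REPROVISION"  # assumed outcome of a honored timeout
    if state.failed:
        return "ERROR"        # stays here until the object is deleted
    return "IN_PROGRESS"


# A pre bfb-install failure, checked long after the timeout has elapsed.
stuck = DpuState(Step.PRE_BFB_INSTALL, started_at=0.0, failed=True)
reported = next_phase(stuck, now=100.0, timeout=30.0, cover_all_steps=False)
proposed = next_phase(stuck, now=100.0, timeout=30.0, cover_all_steps=True)
```

With the timeout limited to the active step, `reported` stays at ERROR no matter how much time passes, which matches the stuck state seen here. Extending the timeout to the earlier step lets the same failure fall through to re-provisioning.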

Mitigating the Issue:

Deleting the DPU object will trigger re-provisioning.
