-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix VirtualMachineCRCErrors not stop firing #742
Fix VirtualMachineCRCErrors not stop firing #742
Conversation
Skipping CI for Draft Pull Request. |
/test all |
@machadovilaca: No presubmit jobs available for kubevirt/ssp-operator@main In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
0256308
to
ea42bff
Compare
/cc sradco |
ea42bff
to
58ad69e
Compare
/cc sradco |
/test e2e-functests |
@machadovilaca: No presubmit jobs available for kubevirt/ssp-operator@main In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR changes quite a lot. Can you explain a little how the fix works and add the explanation also to the PR and commit descriptions?
58ad69e
to
725d64c
Compare
/cc sradco |
725d64c
to
2a7b2f4
Compare
/cc sradco |
@0xFelix The problem was that the controller kept reporting the metric for old VMs even after they were deleted. In the controller, we can understand when a VM is deleted, but at that moment we don't have access to the details present in the old labels (pv name, pv type, etc...). This PR changes the metric so that it only uses the VM name and namespace as labels, so that we can update its value, when a VM is deleted, to a no-error value. Alert expression is updated accordingly. |
/retest |
10b0b31
to
47709c4
Compare
/cc sradco |
47709c4
to
f74b983
Compare
/cc sradco |
f74b983
to
e18adc2
Compare
/cc sradco |
@@ -58,7 +58,11 @@ func (r *VmReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Re | |||
vm := kubevirtv1.VirtualMachine{} | |||
if err := r.client.Get(ctx, req.NamespacedName, &vm); err != nil { | |||
if errors.IsNotFound(err) { | |||
// VM was deleted, so we can ignore it | |||
// VM was deleted | |||
vm.Name = req.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the VM was deleted, why do we add it to the metrics? Also it looks error-prone to call metrics.SetVmWithVolume
with nil-pointers, as IIUC this func does not check for nil pointers sufficiently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
the issue was that the controller kept exposing the metric that the VM, although already deleted, as the problematic volume configuration
this way when a VM is deleted, the controller no longer has a true value for the metric -
I think the nil check in the function covers this case completely:
if pv == nil || pvc == nil {
vmRbdVolume.WithLabelValues(vm.Name, vm.Namespace).Set(0)
return
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func SetVmWithVolume(vm *kubevirtv1.VirtualMachine, pvc *k8sv1.PersistentVolumeClaim, pv *k8sv1.PersistentVolume) { |
The function that is called looks different, than what you quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also changed that function on this PR:
https://github.com/kubevirt/ssp-operator/pull/742/files#diff-387c26f005e40fae497f3a5c01da1f877fc4eee90da11b7d8cf47512d8474b82R29-R31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my mistake.
The controller keeps reporting old metric values even after the VM is deleted. This commit updates the metric name and labels so that we can set up the metric value to 0, and no longer trigger the alert on VM deletion Signed-off-by: João Vilaça <[email protected]>
Signed-off-by: João Vilaça <[email protected]>
e18adc2
to
7af6434
Compare
/cc sradco |
Quality Gate failedFailed conditions 3.4% Duplication on New Code (required ≤ 3%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
@@ -58,7 +58,11 @@ func (r *VmReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Re | |||
vm := kubevirtv1.VirtualMachine{} | |||
if err := r.client.Get(ctx, req.NamespacedName, &vm); err != nil { | |||
if errors.IsNotFound(err) { | |||
// VM was deleted, so we can ignore it | |||
// VM was deleted | |||
vm.Name = req.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my mistake.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: 0xFelix The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/retest |
If you want this to be in 4.15 you need to cherry-pick it to release-v0.19. |
/cherry-pick release-v0.19 |
@machadovilaca: new pull request created: #825 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-v0.18 |
@machadovilaca: new pull request created: #906 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does / why we need it:
The controller keeps reporting old metric values even after the VM is deleted. This PR updates the metric name and labels so that we can set up the metric value to 0, and no longer trigger the alert on VM deletion
Reduces VirtualMachineCRCErrors operator impact to none
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Release note: