-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC: block operator upgrade when detecting outdated vms #3171
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
74747cc
to
d4ec71b
Compare
Pull Request Test Coverage Report for Build 11952461865Details
💛 - Coveralls |
d4ec71b
to
ed2b6ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a POC, but still have several comments inline.
@@ -1330,6 +1340,66 @@ func (r *ReconcileHyperConverged) deleteObj(req *common.HcoRequest, obj client.O | |||
return removed, nil | |||
} | |||
|
|||
func (r *ReconcileHyperConverged) EvaluateUpgradeEligibility(req *common.HcoRequest) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called only locally and should not be exposed.
Please rename:
func (r *ReconcileHyperConverged) EvaluateUpgradeEligibility(req *common.HcoRequest) error { | |
func (r *ReconcileHyperConverged) evaluateUpgradeEligibility(req *common.HcoRequest) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
podList := &corev1.PodList{} | ||
listOpts := []client.ListOption{ | ||
client.InNamespace(req.Namespace), | ||
client.MatchingLabels{"kubevirt.io": "virt-controller"}, | ||
} | ||
|
||
if err := r.client.List(req.Ctx, podList, listOpts...); err != nil { | ||
req.Logger.Info("Failed to list virt-controller pods", "namespace", req.Namespace, "error", err) | ||
return fmt.Errorf("failed to list virt-controller pods: %w", err) | ||
} | ||
|
||
if len(podList.Items) == 0 { | ||
req.Logger.Info("No virt-controller pods found", "namespace", req.Namespace) | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: does each pod collects only several VM info? In another words: can we query the service instead of the pod itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have a service for virt-controller as far as i know, only for virt-api
but i assume i can query just one of the pods since they all should have the same metrics data
for _, pod := range podList.Items { | ||
if pod.Status.PodIP == "" { | ||
continue | ||
} | ||
metricsURL := fmt.Sprintf("https://%s:%d/metrics", pod.Status.PodIP, 8443) | ||
|
||
resp, err := httpClient.Get(metricsURL) | ||
if err != nil { | ||
req.Logger.Info("Failed to query metrics from pod", "pod", pod.Name, "error", err) | ||
continue | ||
} | ||
|
||
if resp.StatusCode != http.StatusOK { | ||
req.Logger.Info("Metrics endpoint returned non-200 status", "pod", pod.Name, "status", resp.StatusCode) | ||
resp.Body.Close() | ||
continue | ||
} | ||
|
||
body, err := io.ReadAll(resp.Body) | ||
resp.Body.Close() | ||
if err != nil { | ||
req.Logger.Info("Failed to read metrics response from pod", "pod", pod.Name, "error", err) | ||
continue | ||
} | ||
|
||
virtMetricsData := string(body) | ||
rhel8Regex := regexp.MustCompile(`.*rhel8.*`) | ||
if rhel8Regex.MatchString(virtMetricsData) { | ||
req.Logger.Info("Detected outdated machine type in metrics", "pod", pod.Name, "matched", rhel8Regex.FindString(virtMetricsData)) | ||
req.Upgradeable = false | ||
return nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, we'll need to do it a-synchronized
} | ||
|
||
virtMetricsData := string(body) | ||
rhel8Regex := regexp.MustCompile(`.*rhel8.*`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never create regex within a loop. creating a regex variable is an expensive operation. The value of this regex will never changed during runtime. Please take it out of this method as a regular variable, that will be initialized only once.
just do:
var rhel8Regex = regexp.MustCompile(`.*rhel8.*`)
just above the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the regex will be created each reconcile loop then, that's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it will be defined in the package scope, out of the method, then no. It will only be initialized once, on boot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a POC, but still have several comments inline.
} | ||
|
||
if resp.StatusCode != http.StatusOK { | ||
req.Logger.Info("Metrics endpoint returned non-200 status", "pod", pod.Name, "status", resp.StatusCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the term "non-200" in an error message.
WDYT about "Metrics endpoint returned error" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
|
||
if resp.StatusCode != http.StatusOK { | ||
req.Logger.Info("Metrics endpoint returned non-200 status", "pod", pod.Name, "status", resp.StatusCode) | ||
resp.Body.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use defer instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did use that at first, but got a warning said that it's not safe to use defer in a loop, and it might cause a resources leek
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. so we can move the loop content into a new method. That will work.
} | ||
|
||
body, err := io.ReadAll(resp.Body) | ||
resp.Body.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use defer instead
continue | ||
} | ||
|
||
body, err := io.ReadAll(resp.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use a scanner. The body can be big.
You can do something like
scanner := bufio.NewScanner(resp.Body)
...
for scanner.Scan() {
if rhel8Regex.MatchString(scanner.Text()) {
req.Logger.Info("Detected outdated machine type in metrics", "pod", pod.Name, "matched", rhel8Regex.FindString(virtMetricsData))
req.Upgradeable = false
return nil
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ed2b6ed
to
a15db2d
Compare
Another comment I forgot to write: The current HTTP client code does not use any type of timeout. Use a context (derived from req.Context) with 3-seconds timeout. Sending HTTP request w/o timeout may be risky. |
a15db2d
to
eefac66
Compare
Signed-off-by: Daniel Sionov <[email protected]>
eefac66
to
dd7f177
Compare
Quality Gate passedIssues Measures |
This PR introduces the logic necessary for blocking hco-operator upgrade when we detect that there are outdated vms in the cluster
step 1: list all virt-controller pods
step 2: loop over the pods and perform http request to the /metrics endpoint to extract all the metrics data
step 3: use regex to make sure no one is using rhel8 machine types
step 4: set upgradeable false when we do detect rhel8 machines
the logic right now is simple and straight forward, and it can be further improved
Release note: