
Check status of all the core pods for microshift #4009

Open · wants to merge 1 commit into main

Conversation

praveenkumar (Member):
In the past we observed that having kube API access doesn't mean all the required service pods are running and the cluster is working as expected. This PR adds a list of core namespaces for the microshift preset and makes sure all the pods in those namespaces are running before telling the user that the cluster is ready to consume.

Fixes: Issue #3852

@openshift-ci openshift-ci bot requested review from cfergeau and gbraad February 1, 2024 07:58

openshift-ci bot commented Feb 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from praveenkumar. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/crc/cluster/cluster.go (outdated):
}

func podRunningForNamespace(ocConfig oc.Config, namespace string) bool {
stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "pods", "-n", namespace, "--field-selector=status.phase!=Running")
cfergeau (Contributor):

Why !=Running? I would have expected =Running?

praveenkumar (Member, Author):

@cfergeau If I use ==Running, it will show all the running pods. What we want is the pods which are not in the Running phase in that specific namespace, so we can re-check them in the retry function.

$ kubectl get pod -n kube-system --field-selector=status.phase!=Running 
NAME                                       READY   STATUS    RESTARTS   AGE
csi-snapshot-controller-85cc4fd76b-xznzw   1/1     Pending   0          45h
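The check hinges on the inverted field selector: when every pod is Running, the command returns nothing. A minimal sketch of that interpretation (the helper name and parsing are hypothetical, not the PR's actual code, which gets the output from ocConfig.WithFailFast().RunOcCommand):

```go
package main

import (
	"fmt"
	"strings"
)

// allPodsRunning interprets the stdout of
// `oc get pods -n <ns> --field-selector=status.phase!=Running`:
// an empty result (or a "No resources found" message, which some
// client versions emit) means no pod is outside the Running phase.
func allPodsRunning(stdout string) bool {
	out := strings.TrimSpace(stdout)
	return out == "" || strings.HasPrefix(out, "No resources found")
}

func main() {
	notRunning := `NAME                                       READY   STATUS    RESTARTS   AGE
csi-snapshot-controller-85cc4fd76b-xznzw   1/1     Pending   0          45h`
	fmt.Println(allPodsRunning(notRunning)) // false: a pod is still Pending
	fmt.Println(allPodsRunning(""))         // true: nothing matched the selector
}
```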

pkg/crc/cluster/cluster.go (outdated):
if !podRunningForNamespace(ocConfig, namespace) {
logging.Debugf("Pods in %s namespace are not running", namespace)
return &errors.RetriableError{Err: fmt.Errorf("pods in %s namespace are not running", namespace)}
}
cfergeau (Contributor):

Fwiw, this is a bit wasteful, as we'll retry the same namespaces again and again even if we already found running pods. Maybe this can be done with a map? Map keys are namespaces; iterate over the keys, and when there are running pods in a namespace, remove it from the map.
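The suggested map approach could look roughly like this (a sketch with hypothetical names, not code from the PR): namespaces that have already passed are deleted from the map, so later retries only re-check the stragglers.

```go
package main

import "fmt"

// waitForCorePods tracks the namespaces still to be verified in a map
// and drops each one once its pods are all Running, so subsequent
// attempts skip namespaces that already passed.
func waitForCorePods(namespaces []string, check func(ns string) bool, maxAttempts int) bool {
	pending := map[string]struct{}{}
	for _, ns := range namespaces {
		pending[ns] = struct{}{}
	}
	for attempt := 0; attempt < maxAttempts && len(pending) > 0; attempt++ {
		for ns := range pending {
			if check(ns) {
				delete(pending, ns) // deleting during range is safe in Go
			}
		}
	}
	return len(pending) == 0
}

func main() {
	// Simulate kube-system becoming ready only on the second attempt.
	calls := map[string]int{}
	check := func(ns string) bool {
		calls[ns]++
		return ns != "kube-system" || calls[ns] > 1
	}
	ok := waitForCorePods([]string{"kube-system", "openshift-dns"}, check, 5)
	fmt.Println(ok, calls["openshift-dns"]) // true 1 — openshift-dns was checked only once
}
```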

praveenkumar (Member, Author):

Yes, but the current approach has a bit of a benefit in case some pod goes into a reconciliation state (like in one iteration it is Running but in the second it is Pending). It is not a foolproof solution (in a k8s context it never will be), but it should be good enough for initial feedback on whether the core pods are running.

cfergeau (Contributor):

In the OpenShift case, once the oc get co check succeeds once, we retry 2 more times and we only decide the cluster is good when the oc get co check succeeds 3 times in a row. If you want to handle " in one iteration it is running but in second it is in pending state" it would be nice to have a consistent approach.

praveenkumar (Member, Author):

> In the OpenShift case, once the oc get co check succeeds once, we retry 2 more times and we only decide the cluster is good when the oc get co check succeeds 3 times in a row. If you want to handle " in one iteration it is running but in second it is in pending state" it would be nice to have a consistent approach.

Yes, in the OpenShift case we can iterate over all the clusteroperators at once because those are not namespace-specific resources. Here we don't have a single call that gives us the status of all the pods across the core namespaces, otherwise I would have used the same logic. So for now we iterate namespace by namespace and check the pod status.

cfergeau (Contributor):

I'm not questioning the way the iterations are done, I was reacting to

> it has bit of benefit in case some pod goes to reconciliation state (like in one iteration it is running but in second it is in pending state.)

For an OpenShift cluster, we roughly iterate over an isClusterReady() function until it returns true. Once it returns true, we still run it 2 more times in case the cluster looked ready but was in a transient/reconciliation state.
If reconciliation is something you want to handle better, I would use the same approach as for OpenShift for consistency: the cluster is not ready until isClusterReady() has succeeded 3 times in a row.
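The "3 times in a row" pattern described above can be sketched as follows (a hypothetical helper, not the actual crc code): the cluster only counts as ready after the check has succeeded a required number of consecutive times, and any failure resets the streak.

```go
package main

import "fmt"

// readyConsecutive reports success only once check() has returned true
// `needed` times in a row; a transient failure invalidates earlier
// successes, which is how a reconciling cluster gets caught.
func readyConsecutive(check func() bool, needed, maxAttempts int) bool {
	streak := 0
	for i := 0; i < maxAttempts; i++ {
		if check() {
			streak++
			if streak == needed {
				return true
			}
		} else {
			streak = 0 // reset: the cluster was not stably ready
		}
	}
	return false
}

func main() {
	// Simulated results: ready once, a transient blip, then stable.
	results := []bool{true, false, true, true, true}
	i := 0
	check := func() bool { r := results[i]; i++; return r }
	fmt.Println(readyConsecutive(check, 3, len(results))) // true: the last 3 checks succeeded in a row
}
```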

pkg/crc/cluster/cluster.go:
In past we observed having kube api access doesn't mean all the required
service pods are running and cluster is working as expected. This PR
adds a list of core namespace for microshift preset and make sure all
the pods in that namespace is running before letting user to know to
consume the cluster.
return errors.Retry(ctx, 2*time.Minute, waitForPods, 2*time.Second)
}

func podRunningForNamespace(ocConfig oc.Config, namespace string) bool {
cfergeau (Contributor):

allPodsRunning(ocConfig oc.Config, namespace string) bool or checkAllPodsRunning is more descriptive/accurate.

praveenkumar (Member, Author):

It should be in a namespace context, so checkAllPodsRunningInNamespace or allPodsRunningForNamespace?

cfergeau (Contributor):

There is a namespace argument, and we don't have a non-namespaced function this could be confused with, so I don't think it's really useful to mention Namespace in the function name. That's more something for an API doc comment, if you think it's important to inform API users that it only iterates over a single namespace.


@praveenkumar praveenkumar added the has-to-be-in-release This PR need to go in coming release. label Feb 12, 2024
@praveenkumar praveenkumar removed the has-to-be-in-release This PR need to go in coming release. label Feb 21, 2024
gbraad (Contributor) commented Apr 10, 2024:

> This PR adds a list of core namespace for microshift preset

This is not microshift-specific; it might also help solve issues with the readiness of the OCP and OKD presets.

gbraad (Contributor) commented Apr 10, 2024:

> has-to-be-in-release

What was the reasoning behind this label? The fixed issue, #3852, is merely an enhancement.

praveenkumar (Member, Author):

/hold
