Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying of OpenShift stream for install, closes #501 #508

Merged
merged 3 commits into from
Jul 12, 2024

Conversation

smalleni
Copy link
Collaborator

@smalleni smalleni commented Jul 6, 2024

This removes the need to explicitly define the OCP release image and instead requires ocp_version and ocp_build from the user. OpenShift version could be specific like 4.15.4 or more generic like latest-4.15. The build should denote whether the required OCP version is ga or under development using the keys ga or dev. This commit also removes the circular dependency on having oc (installed as oc.latest to extract the ocp client/installer binaries). Overall, this greatly simplifies the use of jetlag while at the same time making it intelligent to deploy latest ga/nightly versions across minor releases.

Co-authored-by: Alex Krzos [email protected]
Closes: #501

@smalleni smalleni force-pushed the main branch 3 times, most recently from 41f0b0f to f69724a Compare July 6, 2024 11:23
@smalleni smalleni self-assigned this Jul 6, 2024
@smalleni
Copy link
Collaborator Author

smalleni commented Jul 7, 2024

FYI, output from a successful run with this patch

TASK [bm-post-cluster-install : Install Performance-Addon-Operator - create PAO namespace, OperatorGroup and subscription] ***
skipping: [f29-h02-000-r640.rdu2.scalelab.redhat.com]                                                  

PLAY RECAP *********************************************************************************************
f29-h02-000-r640.rdu2.scalelab.redhat.com : ok=1071 changed=52   unreachable=0    failed=0    skipped=1493 rescued=132  ignored=0

(.ansible) [root@f29-h02-000-r640 ansible]# oc get nodes
NAME               STATUS   ROLES                  AGE   VERSION
f30-h03-000-r640   Ready    control-plane,master   12m   v1.29.5+58452d8
f30-h05-000-r640   Ready    control-plane,master   42m   v1.29.5+58452d8
f30-h06-000-r640   Ready    control-plane,master   44m   v1.29.5+58452d8
f30-h07-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f30-h09-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f30-h10-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f30-h11-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f30-h13-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f30-h14-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f30-h15-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f31-h01-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f31-h02-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f31-h03-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f31-h06-000-r640   Ready    worker                 21m   v1.29.5+58452d8
f31-h07-000-r640   Ready    worker                 21m   v1.29.5+58452d8

Copy link
Collaborator

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably need to look this over in more detail, but I have a bunch of comments on my first pass.

Generally, though, this looks great, and thanks for taking it on.

docs/tips-and-vars.md Outdated Show resolved Hide resolved
docs/bastion-deploy-bm-byol.md Outdated Show resolved Hide resolved
ansible/vars/sync-ocp-release.sample.yml Show resolved Hide resolved
ansible/roles/validate-vars/tasks/main.yml Outdated Show resolved Hide resolved
@smalleni
Copy link
Collaborator Author

smalleni commented Jul 9, 2024

I believe I have addressed all the comments, please let me know if something slipped. Thanks!

@smalleni smalleni requested review from akrzos and dbutenhof July 9, 2024 01:58
@smalleni smalleni force-pushed the main branch 2 times, most recently from 85c550c to 39c01b6 Compare July 9, 2024 05:54
@josecastillolema
Copy link
Collaborator

Tested in the CPT lab also:

[root@m42-h01-000-r760 ~]# oc version
Client Version: 4.15.20
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.20
Kubernetes Version: v1.28.10+a2c84a5

[root@m42-h01-000-r760 ~]# oc get node
NAME               STATUS   ROLES                  AGE   VERSION
m42-h03-000-r760   Ready    control-plane,master   8h    v1.28.10+a2c84a5
m42-h05-000-r760   Ready    control-plane,master   9h    v1.28.10+a2c84a5
m42-h07-000-r760   Ready    control-plane,master   9h    v1.28.10+a2c84a5
m42-h09-000-r760   Ready    worker                 9h    v1.28.10+a2c84a5
m42-h11-000-r760   Ready    worker                 9h    v1.28.10+a2c84a5
m42-h13-000-r760   Ready    worker                 9h    v1.28.10+a2c84a5
m42-h15-000-r760   Ready    worker                 9h    v1.28.10+a2c84a5
m42-h17-000-r760   Ready    worker                 9h    v1.28.10+a2c84a5
m42-h19-000-r760   Ready    worker                 9h    v1.28.10+a2c84a5

Copy link
Collaborator

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how common sync-ocp-release is, but I'd feel bad about bringing in a change to break it, and I think it might be easy to fix.

Aside from that and a few minor comments, this is looking really good.

Oh; and making changes during a review by amending the base commit makes it a lot harder to review ... this is GitHub, not Gerrit! I'd prefer to have a series of commits in a PR to allow incremental reviews, with a "squash-and-merge" at the end.

ansible/roles/validate-vars/tasks/main.yml Outdated Show resolved Hide resolved
ansible/roles/validate-vars/tasks/main.yml Show resolved Hide resolved
ansible/roles/validate-vars/tasks/main.yml Outdated Show resolved Hide resolved
This removes the need to explicitly define the OCP release image
and instead requires `ocp_version` and `ocp_build` from the user.
OpenShift version could be specific like 4.15.4 or more generic like
latest-4.15. The build should denote whether the required OCP version
is ga or under development using the keys `ga` or `dev`. This commit
also removes the circular dependency on having oc (installed as `oc.latest`
to extract the ocp client/installer binaries). Overall, this greatly simplifies
the use of jetlag while at the same time making it intelligent to deploy
latest ga/nightly versions across minor releases.

Closes: redhat-performance#501

Signed-off-by: Sai Sindhur Malleni <[email protected]>
@akrzos
Copy link
Member

akrzos commented Jul 9, 2024

Ironically, I think this PR is actually named the opposite of its new feature. (We currently require you to specify a direct or fixed image) I think it should be renamed from Provide ability to specify OpenShift version directly to something along the lines of Allow specifying of OpenShift stream for install. @smalleni What do you think?

@dbutenhof
Copy link
Collaborator

Ironically, I think this PR is actually named the opposite of its new feature. (We currently require you to specify a direct or fixed image) I think it should be renamed from Provide ability to specify OpenShift version directly to something along the lines of Allow specifying of OpenShift stream for install. @smalleni What do you think?

An interesting point, but really before you specified both the version and the download URL, whereas now you specify only the version (+ build stream), and it supports a more generalized form of build tag. So perhaps neither description really entirely captures the change.

Maybe "Construct OpenShift download URL based on version and build stream"???

Copy link
Member

@akrzos akrzos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add more feedback as I review this further today.

ansible/roles/bastion-install/tasks/main.yml Show resolved Hide resolved
ansible/vars/all.sample.yml Outdated Show resolved Hide resolved
@josecastillolema
Copy link
Collaborator

After the new PR updates everything looking good in the CPT lab env:

[root@m42-h01-000-r760 ~]# oc get node
NAME               STATUS   ROLES                  AGE   VERSION
m42-h03-000-r760   Ready    control-plane,master   60m   v1.28.10+a2c84a5
m42-h05-000-r760   Ready    control-plane,master   58m   v1.28.10+a2c84a5
m42-h07-000-r760   Ready    control-plane,master   33m   v1.28.10+a2c84a5
m42-h09-000-r760   Ready    worker                 36m   v1.28.10+a2c84a5
m42-h11-000-r760   Ready    worker                 36m   v1.28.10+a2c84a5
m42-h13-000-r760   Ready    worker                 36m   v1.28.10+a2c84a5
m42-h15-000-r760   Ready    worker                 36m   v1.28.10+a2c84a5
m42-h17-000-r760   Ready    worker                 36m   v1.28.10+a2c84a5
m42-h19-000-r760   Ready    worker                 37m   v1.28.10+a2c84a5
[root@m42-h01-000-r760 ~]# oc version
Client Version: 4.15.20
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.20
Kubernetes Version: v1.28.10+a2c84a5

@@ -33,7 +48,7 @@
when: (cluster_type == "bm" or cluster_type == "rwn") and
(worker_node_count is undefined or not worker_node_count | int) and
ansible_play_name != "Create inventory from a lab cloud" and
ansible_play_name != "Create inventory from ibmcloud hardware"
ansible_play_name != "Create inventory from ibmcloud hardware"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a real comment here, but I wonder what this whitespace delete then add is about, maybe a github bug?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skipped over that since it was whitespace. But I actually played with copying the indent from the raw GitHub file into vscode, and the change is that he's accidentally replaced the final space with a tab. It doesn't affect the visual indent, but it should be reverted!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue what's happening here..just used tabs all along in vs code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't use tab indentation, if only because the visual presentation depends on tab settings of the output device, which is unreliable. In vscode, if you select the whitespace you should see something like ".....->" showing the tab character ... if you delete that character and align the line with spaces (I suspect just one), this minor annoying diff should go away.

Copy link
Member

@akrzos akrzos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please seek the lengthy comment in the tasks file under validate-vars role to see feedback on how we can finalize this and merge. Thanks @smalleni

ansible/roles/validate-vars/tasks/main.yml Outdated Show resolved Hide resolved
ansible/roles/validate-vars/tasks/main.yml Outdated Show resolved Hide resolved
ansible/vars/sync-ocp-release.sample.yml Show resolved Hide resolved
ansible/roles/bastion-install/tasks/main.yml Show resolved Hide resolved
docs/bastion-deploy-bm-byol.md Outdated Show resolved Hide resolved
@akrzos akrzos changed the title Provide ability to specify OpenShift version directly Allow specifying of OpenShift stream for install, closes #501 Jul 10, 2024
@smalleni smalleni force-pushed the main branch 4 times, most recently from 3b799f2 to 5f0ea23 Compare July 12, 2024 06:47
Co-authored-by: Alex Krzos <[email protected]>

Signed-off-by: Sai Sindhur Malleni <[email protected]>
Copy link
Collaborator

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from that annoying tab character (which I don't think will break anything, but shouldn't be there), this is looking good.

@@ -33,7 +48,7 @@
when: (cluster_type == "bm" or cluster_type == "rwn") and
(worker_node_count is undefined or not worker_node_count | int) and
ansible_play_name != "Create inventory from a lab cloud" and
ansible_play_name != "Create inventory from ibmcloud hardware"
ansible_play_name != "Create inventory from ibmcloud hardware"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't use tab indentation, if only because the visual presentation depends on tab settings of the output device, which is unreliable. In vscode, if you select the whitespace you should see something like ".....->" showing the tab character ... if you delete that character and align the line with spaces (I suspect just one), this minor annoying diff should go away.

Remove extra ocp_build var
Copy link
Member

@akrzos akrzos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @smalleni

@akrzos akrzos merged commit f972783 into redhat-performance:main Jul 12, 2024
@dbutenhof
Copy link
Collaborator

@akrzos @smalleni Well, this is awkward. After rebasing my PR, which is mostly targeted at SNO creation, I tried it and got a failure here in validate-vars:

- name: Check if PAO install is enabled when du_profile is enabled and OCP version is 4.9 or 4.10
  fail:
    msg: "set install_performance_addon_operator to true"
  when: (cluster_type == "sno") and (openshift_version == "4.9" or openshift_version == "4.10") and (not install_performance_addon_operator | default(false) | bool) and (du_profile | default(false) | bool)

The openshift_version variable is no longer defined when we run validate-vars. Apparently Ansible short-circuits compound conditionals, so on BM builds the failure of cluster_type == "sno" prevents us from noticing ... but on an sno build, it fails because openshift_version isn't defined.

The problem is that fixing this under the new model is awkward as the openshift version is determined by a subsequent new role; and we can't check ocp_version instead since it may be something like latest-4.15.

We could move this error check later during the SNO post-install configuration (which would be ugly since we wouldn't get the error on startup), or ... well, I'm not coming up with any other great workaround. The simplest hack would be to just say we don't care about 4.9 and 4.10 anymore and delete the check entirely ... 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFE] Automatic ocp_release_image setup
4 participants