CentOS Stream CoreOS 417 Boot Fails #2077

Open
necouchman opened this issue Dec 26, 2024 · 7 comments

necouchman commented Dec 26, 2024

Describe the bug
I'm attempting to install OKD 4.17 using the user-provisioned infrastructure (UPI) method. I downloaded the recommended FCOS (39.20231101.3.0), booted into its ISO, and ran coreos-installer with the Ignition files generated by the installer. After I run coreos-installer with the bootstrap.ign file, the system reboots and I see a few odd things:

  • The grub loader says "CentOS Stream CoreOS 417...", which is different from the FCOS that I installed.
  • If I try to boot into the CentOS Stream CoreOS 417 option, I get an error: bad shim signature.
  • If I boot into the second option, "Fedora CoreOS 39...", the system boots, but the bootstrap services don't actually start correctly.

Dec 26 04:58:44 bootstrap.domain.local bootkube.sh[3221]: /usr/local/bin/bootkube.sh: line 85: oc: command not found
Dec 26 04:58:44 bootstrap.domain.local systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
Dec 26 04:58:44 bootstrap.domain.local systemd[1]: bootkube.service: Failed with result 'exit-code'.
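
Exit status 127 in the log above is the shell's conventional code for a command that could not be found, which matches the `oc: command not found` message. A minimal sketch of a check for missing tools on the bootstrap node follows; the function name `missing_tools` is hypothetical and is not part of OKD or `bootkube.sh`.

```shell
# Print the name of each given command that cannot be resolved on PATH.
# (Hypothetical helper for illustration only.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: list which of these clients are absent on this machine.
#   missing_tools oc podman
```

Given the error in the log, running `missing_tools oc` on the affected node would be expected to print `oc`.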

So, I'm wondering a couple of things:

  • What is this "CentOS Stream CoreOS"? I can't find any downloads for it or references to it outside of a few obscure places.
  • Is there something else I should be doing to download the correct CoreOS? I used the ISO location provided in the installer output, but something seems to be amiss with that.

Version
4.17.0-okd-scos.0

How reproducible
100% of the time.

Log bundle
My cluster isn't even far enough along to produce this information:

$ ./openshift-install gather bootstrap --dir ./okd-sandbox/ --bootstrap 10.73.7.21
INFO Pulling Cluster API artifacts                
INFO Failed to gather Cluster API manifests: either Cluster API manifests not generated or terraform provision 
INFO Skipping VM console logs gather: no platform configured in metadata 
INFO Pulling debug logs from the bootstrap machine 
INFO Failed to gather bootstrap logs: failed to create SSH client: failed to use pre-existing agent, make sure the appropriate keys exist in the agent for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 
FATAL failed to gather VM console and bootstrap logs 
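
The handshake failure above says that none of the keys offered by the SSH agent were accepted by the bootstrap machine. One way to narrow this down, sketched below, is to confirm that the public key given to the installer (the `sshKey` value in `install-config.yaml`) is actually loaded in the agent. The helper name `key_in_listing` and the key path in the usage comment are assumptions for illustration.

```shell
# Succeed (exit 0) if the key blob in public key file $1 appears in an
# `ssh-add -L`-style listing read from standard input.
key_in_listing() {
    # Field 2 of an OpenSSH public key line is the base64 key blob.
    blob=$(awk '{print $2; exit}' "$1")
    [ -n "$blob" ] || return 1
    awk -v b="$blob" '$2 == b {found = 1} END {exit !found}'
}

# Usage (hypothetical key path):
#   ssh-add -L | key_in_listing ~/.ssh/id_ed25519.pub && echo "key is loaded"
```

If the check fails, loading the matching private key with `ssh-add` before rerunning the gather command is the usual next step.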

Please let me know what information I can provide.

@JaimeMagiera
Contributor

Hi,

Here are a couple of resources to help familiarize you with the current state of affairs:

https://okd.io/blog/2024/06/01/okd-future-statement/
https://okd.io/docs/project/scos-migration-faq/

In short, there are no publicly available SCOS artifacts at this time. You'll want to use the FCOS image as noted. I believe the workaround for the issue you're seeing is to install oc on the nodes. We'll write something up in the new year. Most of us are currently on holiday.

Hope that helps.

Jaime

@necouchman
Author

Thanks @JaimeMagiera,
I've also found discussion #2015, which seems to be related; however, the workaround described in that discussion has to do with either Assisted or Agent Install modes, so it isn't a 1:1 match with a User-Provisioned Infrastructure install. I was able to locate the Live ISO for RHCOS, so I'm going to give that a try. I'll also install oc on those systems and see if that works.

@titou10titou10

titou10titou10 commented Dec 26, 2024

> has to do with either Assisted or Agent Install modes, so it isn't a 1:1 match with a User-Provisioned Infrastructure install.

There is some confusion here. The "Assisted or Agent Install modes" are themselves ways to install OKD with UPI, just as an iPXE install is (though iPXE requires an "extra" bootstrap node). That is, you set up your nodes yourself, then install OKD with ABI, AI, or iPXE.
One of the major differences from a "traditional" UPI install (i.e. PXE) is that an ABI/AI install does not require an extra single-use bootstrap node with the associated iPXE infrastructure (a web server for the .ign files, etc.), and that an ABI/AI UPI install is easier in air-gapped environments.
I'm the author of #2015, #2035, ... and I always install OKD with ABI on Proxmox.
Also check okd-project/okd-web#47.

@necouchman
Author

Thanks @titou10titou10, I appreciate the clarification on this. I'm a bit of a newbie to OKD and just getting my bearings, so I appreciate the guidance. I'm also installing OKD on Proxmox, but I'm not using Assisted or Agent-Based installs, just generating the Ignition files and launching the installer myself.

@kai-uwe-rommel

@necouchman when creating the UPI machines for the deployment, you need to disable secure boot for the machines. While FCOS can do secure boot, SCOS can't (missing shim signature).
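
Whether Secure Boot is actually active can be checked from inside a booted machine (for example, the live ISO) before running the installer. The sketch below reads the standard UEFI `SecureBoot` variable as exposed by Linux efivarfs. The function name `secure_boot_state` is hypothetical, and the check only reports the state; it does not change the VM's firmware setting.

```shell
# Report "enabled", "disabled", or "unknown" for UEFI Secure Boot.
#   $1: optional path to the SecureBoot efivarfs file (defaults to the
#       standard location under /sys/firmware/efi/efivars).
secure_boot_state() {
    var="${1:-/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c}"
    if [ ! -r "$var" ]; then
        # Not booted via UEFI, or the variable is not readable.
        echo "unknown"
        return 0
    fi
    # efivarfs prepends 4 attribute bytes; the fifth byte holds the value.
    value=$(od -An -t u1 -j 4 -N 1 "$var" | tr -d ' \n')
    if [ "$value" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

If this reports `enabled` on a machine meant to boot SCOS, the bad shim signature error described in this issue is the likely result.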

@CrAazZyMaN21

@necouchman I could only install the cluster with an older SCOS image (scos-413.9.202303020610-0-live.x86_64.iso) mentioned in okd-project/okd-scos#11. I extracted vmlinuz, initrd.img, and rootfs.img from the ISO and PXE-booted with the Ignition files from openshift-install. I used a dedicated bootstrap VM, and the RHCOS and FCOS images didn't work at all with UPI.
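
The PXE approach described above can be sketched as a small generator for an iPXE script. The kernel arguments (`coreos.live.rootfs_url`, `coreos.inst.install_dev`, `coreos.inst.ignition_url`) are the ones documented for CoreOS live PXE installs, and the three file names come from the comment above. The function name, server URLs, and target disk are assumptions, not details from this thread.

```shell
# Emit an iPXE script that boots the live kernel and installs to disk.
#   $1: base URL serving vmlinuz, initrd.img and rootfs.img
#   $2: URL of the Ignition file (for example bootstrap.ign)
#   $3: install target device (defaults to /dev/sda, an assumption)
ipxe_script() {
    base=$1
    ign=$2
    dev=${3:-/dev/sda}
    cat <<EOF
#!ipxe
kernel ${base}/vmlinuz initrd=main coreos.live.rootfs_url=${base}/rootfs.img coreos.inst.install_dev=${dev} coreos.inst.ignition_url=${ign}
initrd --name main ${base}/initrd.img
boot
EOF
}

# Usage (hypothetical web server address):
#   ipxe_script http://192.0.2.10/scos http://192.0.2.10/ign/bootstrap.ign > bootstrap.ipxe
```

Generating one script per node role (bootstrap, master, worker) keeps the only difference between them to the Ignition URL.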

@kai-uwe-rommel

@CrAazZyMaN21 Strange report. I wonder what you did that made the FCOS base fail for a UPI installation. That is the current official way of deploying OKD, even with 4.17.
