New cluster using Talos is not progressing beyond Machines in Provisioning stage. #37
Comments
So after looking around and thinking a bit, I see that our CAPHV is waiting for providerID to be set.
I see that your examples include the CPI as a DaemonSet. Doesn't that mean the providerID will never be set on Talos, since the node needs to be bootstrapped before the DaemonSet can start and the CPI can set the providerID? IMHO the controller should be able to get the HarvesterMachine into a state so that the
Hi @dhaugli,
We are using Talos bootstrap and Talos controlplane provider in this case.
I have now followed the example from the templates, but it still doesn't work, and I think I know why: the CAPH controller doesn't propagate the IP addresses of the machines into the Machine object like:
For reference, the vSphere CAPI controller does this. Without it, the Talos bootstrap controller can't see the IP and can't continue the bootstrap process. My machines do get IPs on my network, though, and the QEMU guest agent reports them through Harvester.
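For illustration, the address propagation described above would look roughly like this on a Cluster API Machine object. This is a sketch only: the `v1beta1` API version is an assumption about the installed Cluster API release, and the name and IP are placeholders, not values from this cluster.

```yaml
# Illustrative only: name and address are placeholders.
apiVersion: cluster.x-k8s.io/v1beta1   # assumed Cluster API version
kind: Machine
metadata:
  name: example-cp-0                   # hypothetical machine name
status:
  addresses:
    - type: InternalIP                 # copied up from the infrastructure machine's status.addresses
      address: 192.0.2.10              # documentation-range IP, placeholder
```

The Machine controller fills `status.addresses` from the infrastructure machine's own `status.addresses`, so the infrastructure provider has to report them there first.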
I found the issue with the CAPH controller, based on the Cluster API principles for how bootstrap should work. The CAPH controller does not mark the machine as ready in the infrastructure provider (even though it is running just fine as a VM in Harvester) because it is waiting for the provider ID. Because of this the LB is never created, and with Talos the nodes end up waiting forever in the bootstrap process and never progress. My friend Endre just made a fix in our own image. It still doesn't work, but we are working on it as well.
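As a rough sketch of what the Cluster API infrastructure-machine contract expects before a Machine can move past Provisioning: the infrastructure machine needs `spec.providerID` set and `status.ready` set to true. The `v1alpha1` API version and the provider ID value below are assumptions for illustration, not taken from this cluster or confirmed against the CAPHV source.

```yaml
# Illustrative only: field names follow the Cluster API infrastructure-machine contract.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1   # assumed API version
kind: HarvesterMachine
metadata:
  name: example-cp-0              # hypothetical machine name
spec:
  providerID: "<provider-id>"     # placeholder; the real format is provider-specific
status:
  ready: true                     # tells Cluster API the infrastructure is usable
```

With Talos, the CPI that would normally supply the provider ID only starts after bootstrap, so a controller that waits for that value before reporting `ready` can block indefinitely.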
What happened:
The cluster is not coming up: the Harvester load balancer is not created and the machines never leave the Provisioning state.
The machines are provisioned in Harvester and get IPs from my network. I can attach a console to them, though since it's Talos you don't get much in return.
Screenshot of the console of one of the Talos control plane VMs:
caph-provider logs:
capt-controller-manager logs:
cabpt-talos-bootstrap (I don't know if this is relevant):
What did you expect to happen:
I expected the CAPH provider to create the LB and proceed with creating the cluster.
How to reproduce it:
I added the providers for Talos (bootstrap and control plane) and of course the Harvester provider.
I added 4 files plus the Harvester secret with the following configuration:
cluster.yaml:
harvester-cluster.yaml:
harvester-machinetemplate.yaml:
controlplane.yaml:
Anything else you would like to add:
I have tried switching the load balancer config from dhcp to ipPoolRef with a pre-configured IP pool, but this did not work either. I think this is because the LB is never provisioned in the first place.
Environment:
/etc/os-release):