
[BUG] Downstream cluster not deploying when using subfolders in vSphere #1415

Open
Suschio opened this issue Sep 23, 2024 · 2 comments
Suschio commented Sep 23, 2024

Rancher Server Setup

  • Rancher version:
    2.9.1
  • Installation option (Docker install/Helm Chart):
    Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
      RKE2

Information about the Cluster

  • Kubernetes version:
    v1.30.4+rke2r1
  • Cluster Type (Local/Downstream):
    Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
      Vsphere

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
    • If custom, define the set of permissions:
      Admin

Provider Information

  • What is the version of the Rancher v2 Terraform Provider in use?
    5.0.0
  • What is the version of Terraform in use?
    1.9.5

Describe the bug

If we use a subfolder structure for the vSphere folder (DC/vm/kubernetes/testkubernetes), the downstream cluster gets stuck in a loop of:

"Waiting for all etcd machines to be deleted"
"Waiting for init node"

Looking at the config in the Rancher UI, we can see the error:
"pool1: The provided value for folder was not found in the list of expected values. This can happen with clusters provisioned outside of Rancher or when options for the provider have changed."
But all configuration values are correct, and the folder category has an entry with the correct path.

If we create the same cluster through the Rancher UI and choose the path there, it succeeds without problems.
If we create the same cluster with Terraform but with only one folder level (DC/vm/kubernetes), it also works.

The provisioning pod only shows this error:
error loading host test-xxx Docker machine "test-xxx" does not exist. Use "docker-machine ls" to list machines. Use "docker-machine create" to add a new one.

To Reproduce

Use the Rancher2 provider with the following resources:

rancher2_machine_config_v2 with a vsphere_config block containing:
folder = "DC/vm/folder/somesubfolder"

and the rancher2_cluster_v2 resource.
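A minimal sketch of the failing configuration (cluster names, the datacenter, and the omitted credential/sizing arguments are placeholders, not taken from the original report):

```hcl
# Sketch only: placeholder names and values; other required vsphere_config
# arguments (cloud credential, cpu_count, memory_size, etc.) are omitted.
resource "rancher2_machine_config_v2" "pool1" {
  generate_name = "test-pool1"
  vsphere_config {
    datacenter = "DC"
    # A single folder level (DC/vm/kubernetes) works;
    # a nested subfolder like this one triggers the bug:
    folder = "DC/vm/folder/somesubfolder"
  }
}

resource "rancher2_cluster_v2" "test" {
  name               = "test-cluster"
  kubernetes_version = "v1.30.4+rke2r1"
  rke_config {
    machine_pools {
      name               = "pool1"
      etcd_role          = true
      control_plane_role = true
      worker_role        = true
      quantity           = 1
      machine_config {
        kind = rancher2_machine_config_v2.pool1.kind
        name = rancher2_machine_config_v2.pool1.name
      }
    }
  }
}
```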

Actual Result

The cluster does not deploy and is stuck in a loop with:
"Waiting for all etcd machines to be deleted"
"Waiting for init node"

with config error

"pool1: The provided value for folder was not found in the list of expected values. This can happen with clusters provisioned outside of Rancher or when options for the provider have changed."

Expected Result

We expect a working downstream cluster that is deployed with Terraform and can use subfolders in vSphere.

niklas-letz commented Oct 1, 2024

Same problem! Help would be very much appreciated!
Similar issue: [BUG] hostsystem error for cluster provisioned with vsphere provider #10460

Suschio commented Oct 2, 2024

After some research, we found that if you pass /DC/vm/folder as the folder value in the rancher2_machine_config_v2 resource, the VM gets created without issues. But if you add another level, like /DC/vm/folder/folder, we get an error in
vmwarevspheremachines.rke-machine.cattle.io:

```
folder: /DC-xxxx/vm/xxxxxx/xxxxxx
  - message: |-
      failed creating server [fleet-default/xxx-test-cluster-master-xxxx-xxxx] of kind (VmwarevsphereMachine) for machine xxxx-test-cluster-master-xxxxx-xxx in infrastructure provider: CreateError: Running pre-create checks...
      (xxxx-test-cluster-master-xxxx-xxxx) Connecting to vSphere for pre-create checks...
      (xxxx-test-cluster-master-xxxx-xxxx) Using datacenter /DC-XX
      (xxxx-test-cluster-master-xxxx-xxxx) Using network /DC-XXX/network/xxxx
      (xxxx-test-cluster-master-xxxx-xxxx) Using ResourcePool /DC-XXX/host/XXXX/Resources
      Error with pre-create check: "folder '/DC-XXXX/vm/DC-XXXX/vm/xxxxx/xxxxxx' not found"
    reason: CreateError
    status: "False"
```

The folder gets passed correctly but is then somehow "doubled" (the /DC/vm/ prefix gets prepended a second time) in the pre-create check.
If we declare the folder value without the /DC/vm/ prefix, it works, but the cluster still shows an error in the UI:
"pool1: The provided value for folder was not found in the list of expected values. This can happen with clusters provisioned outside of Rancher or when options for the provider have changed."
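A sketch of the workaround described above (datacenter and folder names are placeholders): since the pre-create check apparently prepends the datacenter/vm prefix itself, passing the folder path relative to the datacenter's vm root avoids the doubling.

```hcl
# Workaround sketch with placeholder values.
resource "rancher2_machine_config_v2" "pool1" {
  generate_name = "test-pool1"
  vsphere_config {
    datacenter = "DC"
    # folder = "/DC/vm/kubernetes/testkubernetes"  # fails: prefix is doubled
    folder = "kubernetes/testkubernetes"  # VM is created, but the UI warning remains
  }
}
```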
