Do we support use storage size equal or greater than 1024Gi? #596

Open
sherryhao opened this issue Nov 13, 2023 · 6 comments
Labels: assess (Issues in the state 'assess'), bug (Something isn't working)

Comments

@sherryhao

sherryhao commented Nov 13, 2023

What happened?

Hi experts,

We are continuously seeing the following error when deploying a CassandraDatacenter with the storage size set to 1024Gi.

"admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig"

storageConfig:
  cassandraDataVolumeClaimSpec:
    reclaimPolicy: Retain
    storageClassName: managed-csi-premium
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1024Gi

However, once we reduced the size to a smaller value, say 500Gi, the services came up as expected and all required PVCs were created successfully.

Could you tell us whether there is any limitation on the storage size?

Install command: helm install cass-operator k8ssandra/cass-operator -n cass-operator --version=0.40.0

Cluster is running on AKS+managed-csi-premium

What did you expect to happen?

Support for storage sizes equal to or greater than 1024Gi.

How can we reproduce it (as minimally and precisely as possible)?

Run a cluster on AKS and deploy a CassandraDatacenter with storage size 1024Gi.

sample configuration:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: cassandra
  namespace: cass-operator
spec:
  clusterName: xxxx
  serverType: cassandra
  serverVersion: "4.0.7"
  managementApiAuth:
    insecure: {}
  size: 3
  allowMultipleNodesPerWorker: true
  resources:
    requests:
      cpu: 500m
      memory: 24Gi
    limits:
      cpu: 2
      memory: 24Gi
  storageConfig:
    cassandraDataVolumeClaimSpec:
      reclaimPolicy: Retain
      storageClassName: managed-csi-premium
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 500Gi

cass-operator version

0.40.0

Kubernetes version

v1.25

Method of installation

helm

Anything else we need to know?

No response

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: CASS-15

@sherryhao sherryhao added the bug Something isn't working label Nov 13, 2023
@burmanm
Contributor

burmanm commented Nov 13, 2023

The size itself does not seem to be the problem; the problem is that Kubernetes tries to update the spec. From what I can see in the logs, after you create the resource quantity as 1024Gi, Kubernetes issues an update that rewrites it as 1Ti. We detect that as a modification (an update to the PVC size, which StatefulSets do not allow) and reject it.

If you wish to overcome this issue, simply write the size as 1Ti and it works fine.
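
The rewrite from 1024Gi to 1Ti can be illustrated with a small model. This is a minimal sketch, assuming the quantity is re-emitted with the largest binary suffix that still gives an integer mantissa; the function names are hypothetical and this is not the apimachinery implementation.

```python
# Simplified model of binary quantity canonicalization.
# Assumption: integer mantissas and binary suffixes only.
BINARY_SUFFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]


def parse_binary_quantity(text: str) -> int:
    """Return the number of bytes described by a string such as '1024Gi'."""
    for power, suffix in enumerate(BINARY_SUFFIXES, start=1):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * 1024 ** power
    return int(text)  # plain byte count, no suffix


def canonical_binary_quantity(text: str) -> str:
    """Re-emit the quantity with the largest suffix that keeps an integer mantissa."""
    value = parse_binary_quantity(text)
    for power in range(len(BINARY_SUFFIXES), 0, -1):
        unit = 1024 ** power
        if value != 0 and value % unit == 0:
            return f"{value // unit}{BINARY_SUFFIXES[power - 1]}"
    return str(value)


for requested in ["500Gi", "1024Gi", "2048Gi"]:
    print(requested, "->", canonical_binary_quantity(requested))
# 500Gi -> 500Gi
# 1024Gi -> 1Ti
# 2048Gi -> 2Ti
```

Under this model, 500Gi keeps its spelling, which matches the observation that the 500Gi deployment succeeded, while 1024Gi comes back as 1Ti.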

@burmanm
Contributor

burmanm commented Nov 13, 2023

Here's the relevant part from my debug logs:

2023-11-13T12:41:08.259Z        INFO    api     Validating webhook called for update
2023-11-13T12:41:08.259Z        INFO    api     differences in storageConfigs: &PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1099511627776 0} {<nil>}  BinarySI},},Claims:[]ResourceClaim{},},VolumeName:,Selector:nil,StorageClassName:*standard,VolumeMode:nil,DataSource:nil,DataSourceRef:nil,} vs &PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1099511627776 0} {<nil>} 1Ti BinarySI},},Claims:[]ResourceClaim{},},VolumeName:,Selector:nil,StorageClassName:*standard,VolumeMode:nil,DataSource:nil,DataSourceRef:nil,}

The 1024Gi is originally sent to the server as 1099511627776 with format BinarySI, and then something (Kubernetes, some controller, etc.*) issues an update that replaces it with 1Ti.

This is probably something we could detect, but for now, setting it directly to Ti works as a workaround.

* The controller is technically cass-operator, of course, since it adds the finalizer with Client.Update, but the type change comes from the serialization libraries used by the client-go / controller-runtime client.
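
One way such a spelling-only change could be detected is to compare the storage requests by byte value instead of by their textual or structural form. The following is a minimal sketch under that assumption; `StorageRequest` and `storage_changed` are hypothetical names and do not reflect the actual webhook code. In Go, the analogous comparison would presumably use `Quantity.Cmp` from apimachinery.

```python
from dataclasses import dataclass

# Assumption: only these binary suffixes are needed for the illustration.
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}


@dataclass(frozen=True)
class StorageRequest:
    """Hypothetical stand-in for the storage request in a volume claim spec."""

    text: str  # the spelling as written, e.g. "1024Gi" or "1Ti"

    def to_bytes(self) -> int:
        for suffix, factor in UNITS.items():
            if self.text.endswith(suffix):
                return int(self.text[: -len(suffix)]) * factor
        return int(self.text)


def storage_changed(old: StorageRequest, new: StorageRequest) -> bool:
    """Report a change only when the byte values differ, ignoring spelling."""
    return old.to_bytes() != new.to_bytes()


created = StorageRequest("1024Gi")
updated = StorageRequest("1Ti")

print(created != updated)                 # True: a field-by-field comparison sees a difference
print(storage_changed(created, updated))  # False: the requested size is the same
```

A genuine resize, for example from 1Ti to 2Ti, would still be reported as a change by `storage_changed`.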

@sherryhao
Author

Thank you Michael, that really helps. We will find a maintenance window to do this.
Also, does that mean we cannot update the storage size after deployment?

@burmanm
Contributor

burmanm commented Nov 14, 2023

Hey, that's sadly a limitation of StatefulSets (and some StorageClasses), see #263 for more details.

@sherryhao
Author

Michael, got it. Thanks for the explanation. Just one more question: if we want to specify 1.5T, which value do you suggest, 1.5Ti or 1536Gi?

@burmanm
Contributor

burmanm commented Nov 14, 2023

Sadly, I think Gi is the correct form there:

➜  cass-operator git:(master) ✗ kubectl get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
server-data-cluster2-dc2-r1-sts-0   Bound    pvc-9d5f342a-1898-42cb-8b0f-748aa9d554a8   1536Gi     RWO            standard       6s
➜  cass-operator git:(master) ✗
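
Why 1536Gi rather than 1.5Ti can be sketched as follows, assuming the stored form never carries a fractional mantissa and uses the largest binary suffix that allows this. `stable_spelling` is a hypothetical helper for picking a spelling up front, not part of cass-operator or Kubernetes.

```python
from fractions import Fraction

BINARY_SUFFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]


def stable_spelling(requested: str) -> str:
    """Return the spelling this sketch expects to survive a round trip unchanged."""
    size_in_bytes = None
    for power, suffix in enumerate(BINARY_SUFFIXES, start=1):
        if requested.endswith(suffix):
            # Fraction accepts decimal strings such as "1.5" exactly.
            size_in_bytes = Fraction(requested[: -len(suffix)]) * 1024 ** power
            break
    if size_in_bytes is None:
        size_in_bytes = Fraction(requested)  # plain byte count
    if size_in_bytes.denominator != 1:
        raise ValueError(f"{requested} is not a whole number of bytes")
    # Pick the largest suffix whose mantissa is a non-zero integer.
    for power in range(len(BINARY_SUFFIXES), 0, -1):
        mantissa = size_in_bytes / 1024 ** power
        if mantissa.denominator == 1 and mantissa != 0:
            return f"{mantissa.numerator}{BINARY_SUFFIXES[power - 1]}"
    return str(size_in_bytes.numerator)


print(stable_spelling("1.5Ti"))   # 1536Gi
print(stable_spelling("1536Gi"))  # 1536Gi
print(stable_spelling("1024Gi"))  # 1Ti
```

In this model 1.5Ti would itself be rewritten to 1536Gi, so writing 1536Gi from the start avoids a spelling change after creation.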

@burmanm burmanm moved this to Assess/Investigate in K8ssandra Dec 19, 2023
@adejanovski adejanovski added the assess Issues in the state 'assess' label Dec 19, 2023