-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Operator upgrade should not cause rolling restart #631
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far. I tested an operator upgrade in a local cluster and everything behaves as expected.
I'll hold off approval for now since it seems things are still in the air regarding the config secret.
CHANGELOG.md
Outdated
@@ -11,6 +11,7 @@ Changelog for Cass Operator, new PRs should update the `main / unreleased` secti | |||
|
|||
## unreleased | |||
|
|||
* [CHANGE] [#566](https://github.com/k8ssandra/cass-operator/issues/566) BREAKING: StatefulSets will no longer be updated if CassandraDatacenter is not modified, unless an annotation "cassandra.datastax.com/allow-upgrade" is set to the CassandraDatacenter with value "always" or "once". This means users of config secret should set this variable to "always" to keep their existing behavior. For other users, this means that the upgrades of operator will no longer automatically apply updated settings or system-logger image. The benefit is that updating the operator no longer causes the cluster to have a rolling restart. A new condition to indicate such change could be necessary is called "RequiresUpdate" and it will be set to True until the next refresh of reconcile has happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
users of config secret should set this variable to "always" to keep their existing behavior
[thought] This feels a bit inconsistent. Whether the config is directly in the CassandraDatacenter or externalized in a secret is just a detail, overall it's the same semantics (as far as I know).
Is there any way that we could detect when a reconcile comes from a config secret change, and treat that the same as a regular CassDC modification?
EDIT -- just saw your comment in nm I misread that commentReconciliationContext.UpdateAllowed()
which I think goes in that direction so +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReconciliationContext sadly doesn't know where the update came from as we don't pass the reconcile trigger inside the reconciliation process. That information is only available in the controller's process, where it gets Watch events and transforms those to Reconcile objects.
I don't think it's even easily possible to pass that information from there. For config secret we would need check some hashes to compare if it was changed from last time or something equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving since the config secret issue can't be easily addressed.
Reminder to consider extracting constants for the annotation value, and one more comment below.
@@ -65,6 +65,9 @@ const ( | |||
// cluster has gone through scale up operation. | |||
NoAutomatedCleanupAnnotation = "cassandra.datastax.com/no-cleanup" | |||
|
|||
// UpdateAllowedAnnotation marks the Datacenter to allow upgrades to StatefulSets even if CassandraDatacenter object was not modified. Allowed values are "once" and "always" | |||
UpdateAllowedAnnotation = "cassandra.datastax.com/allow-upgrade" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[suggestion] I also think we should pick a better name for this annotation. allow-upgrade
is too generic, it doesn't capture the fact that we are only considering the case when the object was not modified.
Maybe something along the lines of allow-changes-on-operator-upgrade
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to cassandra.datastax.com/autoupdate-spec
which seemed to be close to what many Kubernetes "known annotations" seem to follow, such as rbac.authorization.kubernetes.io/autoupdate
and apf.kubernetes.io/autoupdate-spec
, since their behavior is quite close to what we want.
I don't want this to be "upgrade of operator" only annotation, but instead dedicate the same behavior to all StatefulSet updates the operator might make automatically, such as removing user-made changes to the StatefulSets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task looks good, I've verified that it sets the annotation to once
on a DC that didn't have it, and completes immediately on a DC that has always
.
…to restart the StatefulSets
…on and there is no annotation with allow-upgrade set. Add new event for reconcile, any change of annotations in the CassandraDatacenter
…or allow-upgrade and add RequiresUpdate Condition to the Datacenter.Status
…et be set to "always"
…cate any automated change of StatefulSet spec
2783323
to
322a576
Compare
What this PR does:
Missing two features: new Condition and the annotation name might require changing. And also some unit tests and not just the e2e test. The RestartTask changes should come in separate PR (and will get its own ticket).
Which issue(s) this PR fixes:
Fixes #566
Checklist