Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operator upgrade should not cause rolling restart #631

Merged
merged 6 commits into from
Apr 17, 2024

Conversation

burmanm
Copy link
Contributor

@burmanm burmanm commented Apr 8, 2024

What this PR does:
Missing two features: new Condition and the annotation name might require changing. And also some unit tests and not just the e2e test. The RestartTask changes should come in separate PR (and will get its own ticket).

  • Adds new reconcile event: annotation changes (and not just generation changes)
  • Do not upgrade StatefulSets if Generation hasn't changed
  • Modify upgrade_operator to run against 4.1 and verify in the operator upgrade test that the pods are not restarted on operator upgrade without added annotation approval. After the approval, ensure the pods were actually modified and have been restarted. Also, verify the "once" annotation was removed at the end.

Which issue(s) this PR fixes:
Fixes #566

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

@burmanm burmanm changed the title Operator upgrade does not cause rolling restart Operator upgrade should not cause rolling restart Apr 8, 2024
@burmanm burmanm marked this pull request as ready for review April 9, 2024 14:39
@burmanm burmanm requested a review from a team as a code owner April 9, 2024 14:39
Copy link
Contributor

@olim7t olim7t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far. I tested an operator upgrade in a local cluster and everything behaves as expected.
I'll hold off approval for now since it seems things are still in the air regarding the config secret.

CHANGELOG.md Outdated
@@ -11,6 +11,7 @@ Changelog for Cass Operator, new PRs should update the `main / unreleased` secti

## unreleased

* [CHANGE] [#566](https://github.com/k8ssandra/cass-operator/issues/566) BREAKING: StatefulSets will no longer be updated if CassandraDatacenter is not modified, unless an annotation "cassandra.datastax.com/allow-upgrade" is set to the CassandraDatacenter with value "always" or "once". This means users of config secret should set this variable to "always" to keep their existing behavior. For other users, this means that the upgrades of operator will no longer automatically apply updated settings or system-logger image. The benefit is that updating the operator no longer causes the cluster to have a rolling restart. A new condition to indicate such change could be necessary is called "RequiresUpdate" and it will be set to True until the next refresh of reconcile has happened.
Copy link
Contributor

@olim7t olim7t Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users of config secret should set this variable to "always" to keep their existing behavior

[thought] This feels a bit inconsistent. Whether the config is directly in the CassandraDatacenter or externalized in a secret is just a detail, overall it's the same semantics (as far as I know).

Is there any way that we could detect when a reconcile comes from a config secret change, and treat that the same as a regular CassDC modification?

EDIT -- just saw your comment in ReconciliationContext.UpdateAllowed() which I think goes in that direction so +1 nm I misread that comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReconciliationContext sadly doesn't know where the update came from as we don't pass the reconcile trigger inside the reconciliation process. That information is only available in the controller's process, where it gets Watch events and transforms those to Reconcile objects.

I don't think it's even easily possible to pass that information from there. For config secret we would need check some hashes to compare if it was changed from last time or something equivalent.

pkg/reconciliation/constructor.go Show resolved Hide resolved
Copy link
Contributor

@olim7t olim7t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since the config secret issue can't be easily addressed.

Reminder to consider extracting constants for the annotation value, and one more comment below.

@@ -65,6 +65,9 @@ const (
// cluster has gone through scale up operation.
NoAutomatedCleanupAnnotation = "cassandra.datastax.com/no-cleanup"

// UpdateAllowedAnnotation marks the Datacenter to allow upgrades to StatefulSets even if CassandraDatacenter object was not modified. Allowed values are "once" and "always"
UpdateAllowedAnnotation = "cassandra.datastax.com/allow-upgrade"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] I also think we should pick a better name for this annotation. allow-upgrade is too generic, it doesn't capture the fact that we are only considering the case when the object was not modified.

Maybe something along the lines of allow-changes-on-operator-upgrade?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to cassandra.datastax.com/autoupdate-spec which seemed to be close to what many Kubernetes "known annotations" seem to follow, such as rbac.authorization.kubernetes.io/autoupdate and apf.kubernetes.io/autoupdate-spec, since their behavior is quite close to what we want.

I don't want this to be "upgrade of operator" only annotation, but instead dedicate the same behavior to all StatefulSet updates the operator might make automatically, such as removing user-made changes to the StatefulSets.

Copy link
Contributor

@olim7t olim7t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task looks good, I've verified that it sets the annotation to once on a DC that didn't have it, and completes immediately on a DC that has always.

burmanm added 6 commits April 17, 2024 18:25
…on and there is no annotation with allow-upgrade set. Add new event for reconcile, any change of annotations in the CassandraDatacenter
…or allow-upgrade and add RequiresUpdate Condition to the Datacenter.Status
…cate any automated change of StatefulSet spec
@burmanm burmanm force-pushed the operator_upgrade_safety branch from 2783323 to 322a576 Compare April 17, 2024 15:25
@burmanm burmanm merged commit e598ce3 into k8ssandra:master Apr 17, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator upgrade should not cause a rolling restart
2 participants