feat: Support PodDisruptionBudgets #477

Merged
merged 5 commits into from
Oct 5, 2023
4 changes: 3 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -8,17 +8,19 @@ All notable changes to this project will be documented in this file.

- Default stackableVersion to operator version ([#458]).
- Configuration overrides for the JVM security properties, such as DNS caching ([#464]).
- Support PodDisruptionBudgets ([#477]).

### Changed

- `vector` `0.26.0` -> `0.31.0` ([#459]).
- `operator-rs` `0.44.0` -> `0.51.1` ([#458], [#474]).
- `operator-rs` `0.44.0` -> `0.52.1` ([#458], [#474], [#477]).
- Let secret-operator handle certificate conversion ([#474]).

[#458]: https://github.com/stackabletech/druid-operator/pull/458
[#459]: https://github.com/stackabletech/druid-operator/pull/459
[#464]: https://github.com/stackabletech/druid-operator/pull/464
[#474]: https://github.com/stackabletech/druid-operator/pull/474
[#477]: https://github.com/stackabletech/druid-operator/pull/477

## [23.7.0] - 2023-07-14

8 changes: 4 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -26,7 +26,7 @@ serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
snafu = "0.7"
stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag = "0.51.1" }
stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag = "0.52.1" }
strum = { version = "0.25", features = ["derive"] }
tokio = { version = "1.29", features = ["full"] }
tracing = "0.1"
10 changes: 10 additions & 0 deletions deploy/helm/druid-operator/crds/crds.yaml
@@ -3434,11 +3434,13 @@ spec:
podDisruptionBudget:
enabled: true
maxUnavailable: null
description: This is a product-agnostic RoleConfig, which is sufficient for most of the products.
properties:
podDisruptionBudget:
default:
enabled: true
maxUnavailable: null
description: 'This struct is used to configure: 1.) If PodDisruptionBudgets are created by the operator 2.) The allowed number of Pods to be unavailable (`maxUnavailable`)'
properties:
enabled:
default: true
@@ -11752,11 +11754,13 @@ spec:
podDisruptionBudget:
enabled: true
maxUnavailable: null
description: This is a product-agnostic RoleConfig, which is sufficient for most of the products.
properties:
podDisruptionBudget:
default:
enabled: true
maxUnavailable: null
description: 'This struct is used to configure: 1.) If PodDisruptionBudgets are created by the operator 2.) The allowed number of Pods to be unavailable (`maxUnavailable`)'
properties:
enabled:
default: true
@@ -18653,11 +18657,13 @@ spec:
podDisruptionBudget:
enabled: true
maxUnavailable: null
description: This is a product-agnostic RoleConfig, which is sufficient for most of the products.
properties:
podDisruptionBudget:
default:
enabled: true
maxUnavailable: null
description: 'This struct is used to configure: 1.) If PodDisruptionBudgets are created by the operator 2.) The allowed number of Pods to be unavailable (`maxUnavailable`)'
properties:
enabled:
default: true
@@ -25597,11 +25603,13 @@ spec:
podDisruptionBudget:
enabled: true
maxUnavailable: null
description: This is a product-agnostic RoleConfig, which is sufficient for most of the products.
properties:
podDisruptionBudget:
default:
enabled: true
maxUnavailable: null
description: 'This struct is used to configure: 1.) If PodDisruptionBudgets are created by the operator 2.) The allowed number of Pods to be unavailable (`maxUnavailable`)'
properties:
enabled:
default: true
@@ -32471,11 +32479,13 @@ spec:
podDisruptionBudget:
enabled: true
maxUnavailable: null
description: This is a product-agnostic RoleConfig, which is sufficient for most of the products.
properties:
podDisruptionBudget:
default:
enabled: true
maxUnavailable: null
description: 'This struct is used to configure: 1.) If PodDisruptionBudgets are created by the operator 2.) The allowed number of Pods to be unavailable (`maxUnavailable`)'
properties:
enabled:
default: true
13 changes: 13 additions & 0 deletions deploy/helm/druid-operator/templates/roles.yaml
@@ -58,6 +58,19 @@ rules:
- jobs
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- policy
resources:
- poddisruptionbudgets
verbs:
- create
- delete
- get
- list
- patch
@@ -1,4 +1,3 @@
= Cluster operation

= Cluster Operation

Druid installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster. See xref:concepts:cluster_operations.adoc[cluster operations] for more details.
Druid installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster. See xref:concepts:operations/cluster_operations.adoc[cluster operations] for more details.
@@ -0,0 +1,7 @@
= Graceful shutdown

Graceful shutdown of Druid nodes is either not supported by the product itself
or we have not implemented it yet.

Outstanding implementation work for the graceful shutdowns of all products where this functionality is relevant is tracked in
https://github.com/stackabletech/issues/issues/357
5 changes: 5 additions & 0 deletions docs/modules/druid/pages/usage-guide/operations/index.adoc
@@ -0,0 +1,5 @@
= Operations

This section of the documentation is intended for the operations teams that maintain a Stackable Data Platform installation.

Please read the xref:concepts:operations/index.adoc[Concepts page on Operations] that contains the necessary details to operate the platform in a production environment.
@@ -0,0 +1,20 @@
= Allowed Pod disruptions

You can configure the permitted Pod disruptions for Druid nodes as described in xref:concepts:operations/pod_disruptions.adoc[].

Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:

== Brokers
We only allow a single broker to be offline at any given time, regardless of the number of replicas or `roleGroups`.

== Coordinators
We only allow a single coordinator to be offline at any given time, regardless of the number of replicas or `roleGroups`.

== Historicals
We only allow a single historical to be offline at any given time, regardless of the number of replicas or `roleGroups`.

== MiddleManagers
We only allow a single middleManager to be offline at any given time, regardless of the number of replicas or `roleGroups`.

== Routers
We only allow a single Router to be offline at any given time, regardless of the number of replicas or `roleGroups`.
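
The defaults described above can be sketched as a small resolution step. This is only an illustration under stated assumptions: `PdbConfig`, `DEFAULT_MAX_UNAVAILABLE`, and `effective_max_unavailable` are hypothetical names, not the types shipped in `stackable-operator`. The sketch assumes that `enabled: false` suppresses the PDB entirely and that a missing `maxUnavailable` falls back to the documented default of one Pod per role.

```rust
/// Simplified stand-in for the `podDisruptionBudget` settings in the CRD
/// (hypothetical type, not the one from `stackable-operator`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PdbConfig {
    /// Whether the operator should create a PDB for the role.
    pub enabled: bool,
    /// User-supplied override; `None` means "use the operator default".
    pub max_unavailable: Option<u16>,
}

impl Default for PdbConfig {
    /// Mirrors the CRD defaults: `enabled: true`, `maxUnavailable: null`.
    fn default() -> Self {
        PdbConfig {
            enabled: true,
            max_unavailable: None,
        }
    }
}

/// Documented default: a single Pod per role may be unavailable.
pub const DEFAULT_MAX_UNAVAILABLE: u16 = 1;

/// Returns the `maxUnavailable` value to write into the PDB,
/// or `None` when no PDB should be created at all.
pub fn effective_max_unavailable(config: &PdbConfig) -> Option<u16> {
    if !config.enabled {
        return None;
    }
    Some(config.max_unavailable.unwrap_or(DEFAULT_MAX_UNAVAILABLE))
}
```

Because the value is resolved once per role rather than per role group, the same limit would apply no matter how many `roleGroups` or replicas the role has, which matches the wording of the sections above.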
@@ -1,6 +1,6 @@
= Pod placement

You can configure the Pod placement of the Druid pods as described in xref:concepts:pod_placement.adoc[].
You can configure the Pod placement of the Druid pods as described in xref:concepts:operations/pod_placement.adoc[].

The default affinities created by the operator are:

12 changes: 7 additions & 5 deletions docs/modules/druid/partials/nav.adoc
@@ -1,8 +1,9 @@
* xref:druid:getting_started/index.adoc[]
** xref:druid:getting_started/installation.adoc[]
** xref:druid:getting_started/first_steps.adoc[]
* xref:druid:configuration.adoc[]
* xref:druid:required-external-components.adoc[]
* xref:druid:usage-guide/index.adoc[]
** xref:druid:usage-guide/pod-placement.adoc[]
** xref:druid:usage-guide/listenerclass.adoc[]
** xref:druid:usage-guide/ingestion.adoc[]
** xref:druid:usage-guide/deep-storage.adoc[]
@@ -12,7 +13,8 @@
** xref:druid:usage-guide/logging.adoc[]
** xref:druid:usage-guide/monitoring.adoc[]
** xref:druid:usage-guide/configuration-and-environment-overrides.adoc[]
** xref:druid:usage-guide/cluster_operations.adoc[]
* xref:druid:required-external-components.adoc[]
* xref:druid:configuration.adoc[]

** xref:druid:usage-guide/operations/index.adoc[]
*** xref:druid:usage-guide/operations/cluster-operations.adoc[]
*** xref:druid:usage-guide/operations/pod-placement.adoc[]
*** xref:druid:usage-guide/operations/pod-disruptions.adoc[]
*** xref:druid:usage-guide/operations/graceful-shutdown.adoc[]
12 changes: 11 additions & 1 deletion rust/crd/src/lib.rs
@@ -39,7 +39,7 @@ use stackable_operator::{
product_config::types::PropertyNameKind,
product_config_utils::{ConfigError, Configuration},
product_logging::{self, spec::Logging},
role_utils::{CommonConfiguration, Role, RoleGroup},
role_utils::{CommonConfiguration, GenericRoleConfig, Role, RoleGroup},
schemars::{self, JsonSchema},
status::condition::{ClusterCondition, HasStatusCondition},
};
@@ -806,6 +806,16 @@ impl DruidCluster {
})
}

pub fn role_config(&self, role: &DruidRole) -> &GenericRoleConfig {
match role {
DruidRole::Broker => &self.spec.brokers.role_config,
DruidRole::Coordinator => &self.spec.coordinators.role_config,
DruidRole::Historical => &self.spec.historicals.role_config,
DruidRole::MiddleManager => &self.spec.middle_managers.role_config,
DruidRole::Router => &self.spec.routers.role_config,
}
}

/// Merges and validates the given role group, role, and default configurations
pub fn merged_rolegroup_config<T>(
rolegroup_config: &T::Fragment,
Expand Down
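
The `role_config` accessor added in this file can be tried out in isolation with trimmed-down types. The sketch below is not the operator's code: `GenericRoleConfig` is reduced to a single hypothetical `pdb_enabled` flag, `DruidRole::ALL`, `name`, and `roles_with_pdb` are helpers invented for illustration, and the lowercase role names are assumptions. It only shows how an exhaustive match per role lets a caller loop over all roles and pick up each role's settings.

```rust
// Hypothetical, trimmed-down stand-ins for the CRD types. The real
// `GenericRoleConfig` comes from `stackable-operator`; only the shape matters here.
#[derive(Debug, PartialEq)]
pub struct GenericRoleConfig {
    pub pdb_enabled: bool,
}

pub struct Role {
    pub role_config: GenericRoleConfig,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DruidRole {
    Broker,
    Coordinator,
    Historical,
    MiddleManager,
    Router,
}

impl DruidRole {
    /// Every role, so callers can loop over them (illustrative helper).
    pub const ALL: [DruidRole; 5] = [
        DruidRole::Broker,
        DruidRole::Coordinator,
        DruidRole::Historical,
        DruidRole::MiddleManager,
        DruidRole::Router,
    ];

    /// Illustrative lowercase name; the operator's real naming may differ.
    pub fn name(&self) -> &'static str {
        match self {
            DruidRole::Broker => "broker",
            DruidRole::Coordinator => "coordinator",
            DruidRole::Historical => "historical",
            DruidRole::MiddleManager => "middlemanager",
            DruidRole::Router => "router",
        }
    }
}

pub struct DruidClusterSpec {
    pub brokers: Role,
    pub coordinators: Role,
    pub historicals: Role,
    pub middle_managers: Role,
    pub routers: Role,
}

pub struct DruidCluster {
    pub spec: DruidClusterSpec,
}

impl DruidCluster {
    /// Same exhaustive match as in the diff: adding a variant to `DruidRole`
    /// without extending this match is a compile error.
    pub fn role_config(&self, role: &DruidRole) -> &GenericRoleConfig {
        match role {
            DruidRole::Broker => &self.spec.brokers.role_config,
            DruidRole::Coordinator => &self.spec.coordinators.role_config,
            DruidRole::Historical => &self.spec.historicals.role_config,
            DruidRole::MiddleManager => &self.spec.middle_managers.role_config,
            DruidRole::Router => &self.spec.routers.role_config,
        }
    }
}

/// Lists the roles for which a PDB would be requested (hypothetical helper).
pub fn roles_with_pdb(cluster: &DruidCluster) -> Vec<&'static str> {
    DruidRole::ALL
        .iter()
        .filter(|role| cluster.role_config(role).pdb_enabled)
        .map(|role| role.name())
        .collect()
}
```

Returning a reference keeps the accessor cheap and leaves ownership with the cluster object, which is presumably why the diff returns `&GenericRoleConfig` instead of a clone.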
4 changes: 2 additions & 2 deletions rust/operator-binary/src/discovery.rs
@@ -1,7 +1,7 @@
//! Discovery for Druid. We make Druid discoverable by putting a connection string to the router service
//! inside a config map. We only provide a connection string to the router service, since it serves as
//! a gateway to the cluster for client queries.
use crate::CONTROLLER_NAME;
use crate::DRUID_CONTROLLER_NAME;

use snafu::{OptionExt, ResultExt, Snafu};
use stackable_druid_crd::{
@@ -82,7 +82,7 @@ fn build_discovery_configmap(
})?
.with_recommended_labels(build_recommended_labels(
druid,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&resolved_product_image.app_version_label,
&DruidRole::Router.to_string(),
"discovery",
33 changes: 25 additions & 8 deletions rust/operator-binary/src/druid_controller.rs
@@ -6,6 +6,7 @@ use crate::{
internal_secret::{
build_shared_internal_secret_name, create_shared_internal_secret, env_var_from_secret,
},
operations::pdb::add_pdbs,
product_logging::{extend_role_group_config_map, resolve_vector_aggregator_address},
OPERATOR_NAME,
};
@@ -77,7 +78,7 @@ use std::{
};
use strum::{EnumDiscriminants, IntoStaticStr};

pub const CONTROLLER_NAME: &str = "druidcluster";
pub const DRUID_CONTROLLER_NAME: &str = "druidcluster";

const DRUID_UID: i64 = 1000;
const DOCKER_IMAGE_BASE_NAME: &str = "druid";
@@ -255,6 +256,10 @@ pub enum Error {
source: stackable_operator::product_config::writer::PropertiesWriterError,
rolegroup: String,
},
#[snafu(display("failed to create PodDisruptionBudget"))]
FailedToCreatePdb {
source: crate::operations::pdb::Error,
},
}

type Result<T, E = Error> = std::result::Result<T, E>;
@@ -356,7 +361,7 @@ pub async fn reconcile_druid(druid: Arc<DruidCluster>, ctx: Arc<Ctx>) -> Result<
let mut cluster_resources = ClusterResources::new(
APP_NAME,
OPERATOR_NAME,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&druid.object_ref(&()),
ClusterResourceApplyStrategy::from(&druid.spec.cluster_operation),
)
@@ -397,7 +402,7 @@ pub async fn reconcile_druid(druid: Arc<DruidCluster>, ctx: Arc<Ctx>) -> Result<
.await
.context(ApplyRoleServiceSnafu)?;

create_shared_internal_secret(&druid, client, CONTROLLER_NAME)
create_shared_internal_secret(&druid, client, DRUID_CONTROLLER_NAME)
.await
.context(FailedInternalSecretCreationSnafu)?;

@@ -464,6 +469,18 @@ pub async fn reconcile_druid(druid: Arc<DruidCluster>, ctx: Arc<Ctx>) -> Result<
})?,
);
}

let role_config = druid.role_config(&druid_role);

add_pdbs(
&role_config.pod_disruption_budget,
&druid,
&druid_role,
client,
&mut cluster_resources,
)
.await
.context(FailedToCreatePdbSnafu)?;
}

// discovery
@@ -526,7 +543,7 @@ pub fn build_role_service(
.context(ObjectMissingMetadataForOwnerRefSnafu)?
.with_recommended_labels(build_recommended_labels(
druid,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&resolved_product_image.app_version_label,
&role_name,
"global",
@@ -661,7 +678,7 @@ fn build_rolegroup_config_map(
.context(ObjectMissingMetadataForOwnerRefSnafu)?
.with_recommended_labels(build_recommended_labels(
druid,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&resolved_product_image.app_version_label,
&rolegroup.role,
&rolegroup.role_group,
@@ -727,7 +744,7 @@ fn build_rolegroup_services(
.context(ObjectMissingMetadataForOwnerRefSnafu)?
.with_recommended_labels(build_recommended_labels(
druid,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&resolved_product_image.app_version_label,
&rolegroup.role,
&rolegroup.role_group,
@@ -927,7 +944,7 @@ fn build_rolegroup_statefulset(
.metadata_builder(|m| {
m.with_recommended_labels(build_recommended_labels(
druid,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&resolved_product_image.app_version_label,
&rolegroup_ref.role,
&rolegroup_ref.role_group,
@@ -975,7 +992,7 @@ fn build_rolegroup_statefulset(
.context(ObjectMissingMetadataForOwnerRefSnafu)?
.with_recommended_labels(build_recommended_labels(
druid,
CONTROLLER_NAME,
DRUID_CONTROLLER_NAME,
&resolved_product_image.app_version_label,
&rolegroup_ref.role,
&rolegroup_ref.role_group,
5 changes: 3 additions & 2 deletions rust/operator-binary/src/main.rs
@@ -3,11 +3,12 @@ mod discovery;
mod druid_controller;
mod extensions;
mod internal_secret;
mod operations;
mod product_logging;

use std::sync::Arc;

use crate::druid_controller::CONTROLLER_NAME;
use crate::druid_controller::DRUID_CONTROLLER_NAME;
use clap::{crate_description, crate_version, Parser};
use futures::StreamExt;
use stackable_druid_crd::{DruidCluster, APP_NAME, OPERATOR_NAME};
@@ -94,7 +95,7 @@ async fn main() -> anyhow::Result<()> {
.map(|res| {
report_controller_reconciled(
&client,
&format!("{CONTROLLER_NAME}.{OPERATOR_NAME}"),
&format!("{DRUID_CONTROLLER_NAME}.{OPERATOR_NAME}"),
&res,
)
})
1 change: 1 addition & 0 deletions rust/operator-binary/src/operations/mod.rs
@@ -0,0 +1 @@
pub mod pdb;