K8SSAND-1042 ⁃ Feature request: Add option to allow all pods to start in parallel #230

rchernobelskiy · 2021-11-08T21:26:00Z

Currently, when resuming a stopped cluster, all the Cassandra pods start up sequentially because the IPs for the pods change and Cassandra can only join one node at a time.

When using static IPs, however, there is no concern about the IPs changing, so all the pods can start up in parallel.

An option to start all pods in parallel will significantly reduce the time to resume a large stopped cluster.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1042
┆priority: Medium

jimdickinson · 2021-11-08T23:34:48Z

I think we'll have to commit something here that lets us toggle on an implementation using static IPs, so that this feature can be tested?

bradfordcp · 2022-04-12T04:59:54Z

I'm curious how we could detect if the cluster is using static IPs or not. Just a boolean in the spec? I assume there is a sidecar or something that handles setting up the appropriate addresses and routing.

rchernobelskiy · 2022-04-12T14:12:14Z

> I'm curious how we could detect if the cluster is using static IPs or not. Just a boolean in the spec? I assume there is a sidecar or something that handles setting up the appropriate addresses and routing.

Yep, that's what I was thinking, something like parallelResume: true. And yeah, a sidecar is handling the IP and route configuration.

Alternatively, we could add a flag, something like useVirtualNetwork: true, which would, in addition to starting pods in parallel, add the sidecars that enable the virtual network. That kind of addition to the operator would be somewhat more involved, though.

jsanda · 2022-04-12T16:00:57Z

Let me ask the obvious: what are the risks of starting in parallel if static IPs are not used?

jsanda · 2022-04-19T22:18:23Z

Please add your planning poker estimate with ZenHub @burmanm

bradfordcp · 2022-04-19T22:18:59Z

I assume this would fall under the spec.networking key.

bradfordcp · 2022-04-20T13:38:06Z

Do we still need to start seed nodes first before parallel starting the rest of the nodes?

adejanovski · 2022-06-13T09:46:58Z

> Do we still need to start seed nodes first before parallel starting the rest of the nodes?

If we start the seed nodes first (one by one), it should allow us to start other nodes in parallel even if we're not using static IPs. These nodes will then be able to connect to the cluster through the seeds and broadcast their new IP address.
The scenario that Cassandra doesn't deal well with is concurrent range movements, which will not be the case here.
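
The ordering described in this comment can be sketched roughly as follows. This is only an illustration of the idea (seeds one by one, everything else together), not how cass-operator implements it; the `Pod` type, the `plan_start_batches` function, and the pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    # Hypothetical, simplified view of a Cassandra pod.
    name: str
    is_seed: bool


def plan_start_batches(pods):
    """Return a list of batches of pod names.

    Batches are started in order; pods inside one batch may start
    concurrently. Seeds each get their own batch (started one by one),
    and all remaining pods share a single final batch.
    """
    seeds = [p for p in pods if p.is_seed]
    others = [p for p in pods if not p.is_seed]

    batches = [[s.name] for s in seeds]  # seeds first, sequentially
    if others:
        batches.append([p.name for p in others])  # the rest in parallel
    return batches


pods = [
    Pod("dc1-r1-0", is_seed=True),
    Pod("dc1-r2-0", is_seed=True),
    Pod("dc1-r1-1", is_seed=False),
    Pod("dc1-r2-1", is_seed=False),
]
batches = plan_start_batches(pods)
```

This sketch assumes that none of the non-seed pods is bootstrapping for the first time, since concurrent range movements are the case Cassandra does not handle well.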

adejanovski · 2022-06-13T09:48:03Z

@bradfordcp, can we move the ticket to the product backlog or does it require a design session?

burmanm · 2024-03-05T08:14:30Z

@rchernobelskiy Is this still a necessary feature?

rchernobelskiy · 2024-03-05T13:19:49Z

From my personal perspective I still believe it would be a good feature to have.

adejanovski · 2024-06-25T14:39:30Z

I agree, there have been multiple incidents that were due to nodes which are already part of the ring being blocked from starting by cass-operator because another node was bootstrapping (which can take a while).

What we need to identify is whether a node has previously bootstrapped, and in that case allow it to start concurrently with other nodes, provided we have at least one available seed node.
We should detail this process a little more to list precisely the conditions that need to be met to enable this behavior.
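
As a starting point for listing those conditions, the two named in this comment can be written as a single predicate. This is a minimal sketch under stated assumptions: `NodeState`, its fields, and `can_start_concurrently` are hypothetical names, and how the operator would actually learn that a node has previously bootstrapped is not specified in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    # Hypothetical summary of what is known about one node.
    name: str
    previously_bootstrapped: bool  # node is already part of the ring
    is_seed: bool = False
    is_ready: bool = False  # node is up and reachable


def can_start_concurrently(node, cluster):
    """Return True if `node` may start alongside other nodes.

    Both conditions from the comment above must hold:
    1. the node has previously bootstrapped, and
    2. at least one seed node other than `node` is available.
    """
    seed_available = any(
        n.is_seed and n.is_ready and n.name != node.name for n in cluster
    )
    return node.previously_bootstrapped and seed_available


seed = NodeState("dc1-r1-0", previously_bootstrapped=True, is_seed=True, is_ready=True)
old_node = NodeState("dc1-r1-1", previously_bootstrapped=True)
new_node = NodeState("dc1-r1-2", previously_bootstrapped=False)
cluster = [seed, old_node, new_node]
```

With this cluster, `can_start_concurrently(old_node, cluster)` is true, while `new_node` still has to wait because it has never bootstrapped.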

burmanm · 2024-07-11T09:06:25Z

Solved in #673

rchernobelskiy · 2024-07-11T14:09:46Z

Nice. cc @berndocklin and @Liwanshi: we should look at adding this to Astra; it'll significantly reduce the time to resume a large stopped cluster.

rchernobelskiy added the enhancement New feature or request label Nov 8, 2021

sync-by-unito bot changed the title ~~Feature request: Add option to allow all pods to start in parallel~~ K8SSAND-1042 ⁃ Feature request: Add option to allow all pods to start in parallel Nov 8, 2021

adejanovski added the zh:Assess/Investigate label Apr 12, 2022

adejanovski added zh:Assess/Investigate and removed zh:Assess/Investigate labels Aug 30, 2022

adejanovski moved this to Assess/Investigate in K8ssandra Nov 8, 2022

adejanovski added this to K8ssandra Nov 8, 2022

adejanovski added the assess Issues in the state 'assess' label Mar 5, 2024

burmanm moved this from Assess/Investigate to In Progress in K8ssandra Jul 9, 2024

adejanovski added in-progress Issues in the state 'in-progress' and removed assess Issues in the state 'assess' labels Jul 9, 2024

burmanm self-assigned this Jul 9, 2024

burmanm closed this as completed Jul 11, 2024

github-project-automation bot moved this from In Progress to Done in K8ssandra Jul 11, 2024

adejanovski added done Issues in the state 'done' and removed in-progress Issues in the state 'in-progress' labels Jul 11, 2024
