This repository has been archived by the owner on Jan 24, 2024. It is now read-only.
SEP: Master cluster #72
Open
dwoz wants to merge 4 commits into saltstack:master from dwoz:master-cluster

- Feature Name: Master Cluster
- Start Date: 2023-08-09
- SEP Status: Draft
- SEP PR: (leave this empty)
- Salt Issue: (leave this empty)

# Summary
[summary]: #summary

Add the ability to create a cluster of masters that run behind a load balancer.

# Motivation
[motivation]: #motivation

The current [high availability features](https://docs.saltproject.io/en/latest/topics/highavailability/index.html) in the Salt ecosystem allow minions to have backup masters. There are two flavors of multi-master that can be configured on a minion.

Minions can connect to [multiple masters simultaneously](https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html).

<img src='/diagrams/000-multi-master.png' width='400px'>

Minions can also be configured to connect to one master at a time [using failover](https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html#multiple-masters-for-a-minion).

<img src='/diagrams/000-multi-master-failover.png' width='400px'>

In both cases, a job that targets many minions is pinned to a single master. Another drawback of the current HA implementation is that minions need to be re-configured to add or remove masters.

<img src='/diagrams/000-mm-large-job.png' width='400px'>

It would be far better if jobs could scale across multiple masters.

<img src='/diagrams/000-mc-large-job.png' width='400px'>

# Design
[design]: #detailed-design

To accomplish this, we need to change the way jobs execute. Currently, new jobs are sent directly to the publish server from the request server.

<img src='/diagrams/000-current-job-pub.png' width='400px'>

If we forward IPC events between masters, the return flow can be shared, as shown below:

<img src='/diagrams/000-cluster-job-pub.png' width='400px'>

To make job publishes work, we also need to make sure publishes travel over the IPC event bus.

<img src='/diagrams/000-cluster-fwd.png' width='400px'>

Jobs can come and go through any master in the pool. From a minion's perspective, all of the masters in the pool are identical. By putting the pool behind a load balancer, we remove the need for minions to know about multiple masters; minions do not need to be re-configured to add master resources.

<img src='/diagrams/000-cluster-arch.png' width='400px'>

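Concretely, a minion in this architecture only needs to know the load balancer's address, not the individual masters behind it. Below is a minimal sketch of such a minion configuration; `salt-cluster.example.com` is a placeholder for whatever VIP or DNS name the load balancer exposes:

```yaml
# /etc/salt/minion -- sketch only; the address is a placeholder for the
# VIP or DNS name published by the load balancer in front of the master pool.
master: salt-cluster.example.com

# No multi-master or failover settings are required; masters can be added to
# or removed from the pool behind the load balancer without touching minions.
```
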
Events from masters, including job returns, are sent to all masters in the cluster. This requires that all masters run on a stable local network.

<img src="/diagrams/000-master-event-bus.png" width="400px">

> [!IMPORTANT]
> The current work for this SEP can be found [here](https://github.com/saltstack/salt/pull/64936)

### Cluster communication

Each master in a cluster retains its own public/private keypair as well as its own AES session key, stored in memory on that master. In addition, a new `cluster_pki_dir` configuration option is added. The cluster maintains a cluster-wide public/private keypair and a cluster-wide AES session key, which are used for minion communication. Each master in the cluster publishes a copy of its public key in `<cluster_pki_dir>/peers`. Minion public keys are also stored in `cluster_pki_dir` when in cluster mode. The same code used for the standard master/minion configuration can be used to secure the master event bus.

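To make the configuration surface concrete, here is a rough sketch of what a clustered master's settings might look like. Only `cluster_pki_dir` is named by this SEP; the `cluster_id` and `cluster_peers` options below are illustrative assumptions for how a master could identify itself and the peers it forwards events to:

```yaml
# /etc/salt/master -- illustrative sketch; option names other than
# cluster_pki_dir are assumptions, not part of this SEP.
cluster_id: master1                # assumed: this master's name within the cluster
cluster_peers:                     # assumed: peer masters to forward IPC events to
  - 10.0.0.11
  - 10.0.0.12
cluster_pki_dir: /srv/cluster/pki  # cluster-wide keypair, accepted minion keys,
                                   # and each peer's public key under peers/
pki_dir: /etc/salt/pki/master      # this master's own keypair stays local
```

Because each master publishes its public key under `<cluster_pki_dir>/peers` and minion keys are accepted there, the directory presumably needs to be visible to every master in the pool, for example on shared storage.
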
Master event bus communications are secured using each master's own keypair and AES session key. A new channel, `salt.channels.server`, will handle the additional communication logic while leveraging our existing generic transports. This will also allow a cluster to take advantage of new transports (RabbitMQ) or additional functionality (client TLS certificates) as those features become available.

Communication with the minions connected to the cluster is secured with the cluster-wide keypair and AES session key.

## Alternatives
[alternatives]: #alternatives

We currently have two alternatives to achieve "high availability". This is a third, more robust approach that alleviates the issues with the current options. It is not intended to deprecate the current HA functionality.

## Unresolved questions
[unresolved]: #unresolved-questions

None as of this time.

# Drawbacks
[drawbacks]: #drawbacks

The biggest drawback is that we will need to maintain three ways of doing HA. This adds complexity; however, if successful, we can potentially deprecate some of, or all of, the existing HA functionality.

Can you go into more detail on the future of the current HA methods with this in place, as well as the future of Syndic? Also, are there any potential pitfalls to look at, such as network latency? What kind of throughput will this require? What about split-brain handling?
This work is not deprecating any of the current HA functionality, nor is it deprecating Syndic.
The network will need to be reliable, and this is called out in the docs. If there is a split-brain problem, the network is not reliable.
By that definition, no network is reliable. That's why we need HA solutions in the first place.
We at least need to know which way it's going to fail during a network partition and not do something unsafe.
As far as consistency and reliability go, there is a huge difference between local networks and WAN networks. With this design, if a master goes offline for some reason, there is no failure. Any minion connections will be routed to a different master by the load balancer. The other masters will still try to forward events, and you will see timeouts in the logs.
It isn't just about consistency and reliability. If the communication between masters can be broken without them showing as offline, it will happen. At the very least, what it looks like when that happens needs to be documented. I honestly don't think it will break much, as we don't do total bidirectional control, but it needs to be documented.
I can see this happening with the kind of engineer that loves segregating network traffic onto separate LANs: one network for minion communication, one network for storage, one network for master communication. Then all of a sudden the network admin has a spanning tree go haywire in the master communication network. Both masters will appear up to the minion, and storage still works.
Both masters would not appear up to a minion, because minions connect to the load balancer. I have not been able to break anything by taking masters offline. If you'd like to take the work for a spin and try to cause breakage, please feel free.
Both masters would appear up to the load balancer too. The only connection that is broken in this scenario is master-master.
In the scenario described here, your salt CLI would fail to receive events because they are not being forwarded from the disconnected master. There will be errors in the logs on the disconnected master saying that it's not able to forward its events to the other master. The job would still finish correctly, and the job cache would contain the correct results of the job.