Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deploy: support for read affinity options per cluster #4165

Conversation

iPraveenParihar
Copy link
Contributor

@iPraveenParihar iPraveenParihar commented Oct 6, 2023

Describe what this PR does

  • deploy: support for read affinity options per cluster
    Implemented the capability to include read affinity options
    for individual clusters within the ceph-csi-config ConfigMap.
    This allows users to configure the crush location for each
    cluster separately. The read affinity options specified in
    the ConfigMap will supersede those provided via command
    line arguments.

  • util: moved GetNodeLabels() under internal/util/k8s

  • util: added read affinity related functions and unit testcases

  • util: added RunsOnKubernetes() function

Fixes: #4161


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

log.FatalLogMsg(err.Error())
}

crushLocationMap, err := util.GetCrushLocationMap(crushLocationLabels, ns.Driver.NodeID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function call k8s api, which is an expensive operation, to get node labels.
Performance will severely be impacted if we perform such a call on each nodeStage request.

Please fetch the node labels when the driver starts and store it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fetch the node labels when the driver starts and store it.

  • Is this a problem for a running Ceph-CSI setup when nodes are added later on?
  • What happens if the Crush location map is modified?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fetch the node labels when the driver starts and store it.

  • Is this a problem for a running Ceph-CSI setup when nodes are added later on?

No, We just use this in daemonset nodeplugin pod.
So, when a new node is added, a daemonset pod starts on it and we'll pick up the labels that time.
There is a requirement currently that when node labels are updated, they'll need to restart nodepugin pods andthe mounted volumes for the changed topology to take affect.

  • What happens if the Crush location map is modified?

If you meant Crush Location map modified by means of node topology label update, it is the same as above.
If you meant changes were done to the configmap to update labels, they'll need to restart the nodeplugin pod too.

I expect these changes to be very rare and possibly untouched after initial creation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how often locations/labels are added to a crush map 🤷‍♂️ If it is clearly documented that changing labels of a node requires restarting, then thats fine.

@iPraveenParihar iPraveenParihar force-pushed the enhancement/read-affinity-options-per-cluster branch from 3cc1578 to 377ff13 Compare October 10, 2023 07:11
@iPraveenParihar
Copy link
Contributor Author

/test ci/centos/mini-e2e/k8s-1.27

@iPraveenParihar iPraveenParihar force-pushed the enhancement/read-affinity-options-per-cluster branch 3 times, most recently from aa9e7d3 to 8b601e7 Compare October 10, 2023 09:55
@iPraveenParihar iPraveenParihar marked this pull request as ready for review October 10, 2023 10:11
@iPraveenParihar
Copy link
Contributor Author

iPraveenParihar commented Oct 10, 2023

@Rakshith-R @Madhu-1
PR is ready for a review

@@ -45,8 +45,9 @@ type NodeServer struct {
// A map storing all volumes with ongoing operations so that additional operations
// for that same volume (as defined by VolumeID) return an Aborted error
VolumeLocks *util.VolumeLocks
// readAffinityMapOptions contains map options to enable read affinity.
readAffinityMapOptions string
NodeLabels map[string]string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please document this new map with a comment

@@ -280,14 +283,14 @@ func (ns *NodeServer) populateRbdVol(

// appendReadAffinityMapOptions appends readAffinityMapOptions to mapOptions
// if mounter is rbdDefaultMounter and readAffinityMapOptions is not empty.
func (ns NodeServer) appendReadAffinityMapOptions(rv *rbdVolume) {
func appendReadAffinityMapOptions(rv *rbdVolume, readAffinityMapOptions string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can now be a function of the rbdVolume struct:

func (rv *rbdVolume) appendReadAffinityMapOptions(readAffinityMapOptions string)

which would be cleaner

@@ -26,16 +26,11 @@ import (
// the crush location labels and their values from the CO system.
// Expects crushLocationLabels in arg to be in the format "[prefix/]<name>,[prefix/]<name>,...",.
// Returns map of crush location types with its array of associated values.
func GetCrushLocationMap(crushLocationLabels, nodeName string) (map[string]string, error) {
func GetCrushLocationMap(crushLocationLabels string, nodeLabels map[string]string) (map[string]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment is not correct anymore. The nodeLabels are not retrieved from the CO system within this function.

@@ -34,7 +34,7 @@ const (
labelSeparator string = ","
)

func k8sGetNodeLabels(nodeName string) (map[string]string, error) {
func K8sGetNodeLabels(nodeName string) (map[string]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can move this function to a file under internal/util/k8s/?

@@ -1396,9 +1399,46 @@ func getDeviceSize(ctx context.Context, devicePath string) (uint64, error) {
return size, nil
}

func (ns *NodeServer) SetReadAffinityMapOptions(crushLocationMap map[string]string) {
func (ns *NodeServer) SetcmdReadAffinityMapOptions(crushLocationMap map[string]string) {
ns.cmdReadAffinityMapOptions = constructReadAffinityMapOption(crushLocationMap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to keep constructReadAffinityMapOption() as a separate function? Why not make it

func (ns *NodeServer) constructReadAffinityMapOption(crushLocationMap map[string]string)

or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, we can make cmdReadAffinityMapOptions a public field. This would allow us to directly use it as follows:

ns.cmdReadAffinityMapOptions = ns.constructReadAffinityMapOption(crushLocationMap)

This change would eliminate the need for the SetcmdReadAffinityMapOptions() method.

@iPraveenParihar iPraveenParihar force-pushed the enhancement/read-affinity-options-per-cluster branch 2 times, most recently from 3e7df1b to 52b3723 Compare October 16, 2023 08:06
@iPraveenParihar
Copy link
Contributor Author

/test ci/centos/mini-e2e/k8s-1.27

Comment on lines 129 to 134
nodeLabels, err = k8s.GetNodeLabels(conf.NodeID)
if err != nil {
log.FatalLogMsg(err.Error())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will cause problems for non-kubernetes CO. do this check only if the labels are passed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move this under

if conf.EnableReadAffinity {
  ...
}

and will make --enable-ready-affinity command line flag to be mandatory for enabling read affinity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking do we really need to have this flag, if the labels are set assume that you will enable read-affinity like we do currently?

Copy link
Contributor Author

@iPraveenParihar iPraveenParihar Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will cause problems for non-kubernetes CO. do this check only if the labels are passed?

@Madhu-1 we can have a new --container-orchestrator flag (default - kubernetes). So we will be fetching node labels only when the flag is set to kubernetes?

Or this way

and will make --enable-ready-affinity command line flag to be mandatory for enabling read affinity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would vote for new --container-orchestrator which might be useful to decide when to make kube API calls. @nixpanic @Rakshith-R thought?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking do we really need to have this flag, if the labels are set assume that you will enable read-affinity like we do currently?

labels maybe set in the configmap instead.

Let's have the read-affinity flag set in cmdline as a pre-requisite.
This setting can be overridden by individual cluster id in configmap.

i would vote for new --container-orchestrator which might be useful to decide when to make kube API calls. @nixpanic @Rakshith-R thought?

I think this entails changes at a lot of places and is out of scope for this pr.

and will make --enable-ready-affinity command line flag to be mandatory for enabling read affinity.

This option seems sufficient for me.

func (ns *NodeServer) getReadAffinityMapOptionFromConfigMap(clusterID string) string {
crushLocationLabels, err := util.GetReadAffinityOptions(util.CsiConfigFile, clusterID)
if err != nil {
log.FatalLogMsg(err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should not be Fatal you need to return the error message from the function for all cases

internal/util/k8s/node.go Show resolved Hide resolved
internal/util/k8s/node.go Show resolved Hide resolved
@iPraveenParihar iPraveenParihar force-pushed the enhancement/read-affinity-options-per-cluster branch 4 times, most recently from 9c2ed9c to 416e1a0 Compare October 20, 2023 11:28
@@ -87,8 +87,7 @@ spec:
{{- if .Values.nodeplugin.profiling.enabled }}
- "--enableprofiling={{ .Values.nodeplugin.profiling.enabled }}"
{{- end }}
- "--enable-read-affinity={{ and .Values.readAffinity .Values.readAffinity.enabled }}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any specific reason to remove enable-read-affinity flag?

@@ -282,7 +287,6 @@ topology:
# readAffinity:
# Enable read affinity for RBD volumes. Recommended to
# set to true if running kernel 5.8 or newer.
# enabled: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not remove it, it can cause breaking change for existing clusters

cmd/cephcsi.go Outdated
flag.StringVar(
&conf.CrushLocationLabels,
"crush-location-labels",
"",
"list of Kubernetes node labels, that determines the"+
" CRUSH location the node belongs to, separated by ','")
flag.StringVar(&conf.ContainerOrchestrator, "container-orchestrator", "kubernetes", "container orchestrator")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my suggestion was to add this extra flag but not to remove the EnableReadAffinity existing flag :)

@@ -224,6 +223,12 @@ If enabled, this option will be added to all RBD volumes mapped by Ceph CSI.
Well known labels can be found
[here](https://kubernetes.io/docs/reference/labels-annotations-taints/).

Read affinity can be configured for individual clusters within the
`ceph-csi-config` ConfigMap. This allows configuring the crush location labels
for each cluster separately. The crush location labels specified in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for each cluster separately to for each ceph cluster separately

@@ -125,8 +126,13 @@ func (r *Driver) Run(conf *util.Config) {
})
}

if conf.EnableReadAffinity {
crushLocationMap, err = util.GetCrushLocationMap(conf.CrushLocationLabels, conf.NodeID)
nodeLabels, err = k8s.GetNodeLabels(conf.NodeID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this function only if ContainerOrchestrator is kubernetes

// ConstructReadAffinityMapOption constructs a read affinity map option based on the provided crushLocationMap.
// It appends crush location labels in the format
// "read_from_replica=localize,crush_location=label1:value1|label2:value2|...".
func ConstructReadAffinityMapOption(crushLocationMap map[string]string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function can be reused move this to util as we need this for cephfs as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I thought to move this to util during CephFS work.

// getReadAffinityMapOptions retrieves the readAffinityMapOptions from the CSI config file if it exists.
// If not, it falls back to returning the `CLIReadAffinityMapOptions` from the command line.
// If neither of these options is available, it returns an empty string.
func (ns *NodeServer) getReadAffinityMapOptions(clusterID string) (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to until as this can be reused for cephfs asl well?

@@ -0,0 +1,39 @@
/*
Copyright 2020 The CephCSI Authors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2023?

@iPraveenParihar iPraveenParihar force-pushed the enhancement/read-affinity-options-per-cluster branch 3 times, most recently from c9b845d to ad6465d Compare October 25, 2023 07:48
Copy link
Collaborator

@Madhu-1 Madhu-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible add a test case where crush rules are set in configmap as well.

cmd/cephcsi.go Outdated Show resolved Hide resolved
deploy/csi-config-map-sample.yaml Show resolved Hide resolved
docs/deploy-rbd.md Show resolved Hide resolved
@@ -215,7 +214,7 @@ for more details.

This can be enabled by adding labels to Kubernetes nodes like
`"topology.io/region=east"` and `"topology.io/zone=east-zone1"` and
passing command line arguments `"--enable-read-affinity=true"` and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to revert this back?

@@ -46,5 +52,9 @@ func NewK8sClient() (*kubernetes.Clientset, error) {
return nil, fmt.Errorf("failed to create client: %w", err)
}

if kubeclient == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not required here?

internal/util/read_affinity.go Show resolved Hide resolved

crushLocationMap, err := GetCrushLocationMap(configCrushLocationLabels, nodeLabels)
if err != nil {
log.FatalLogMsg(err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log as error and a return err is missing here?

@iPraveenParihar iPraveenParihar force-pushed the enhancement/read-affinity-options-per-cluster branch 4 times, most recently from 04ff893 to dd9d0eb Compare October 26, 2023 09:59
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.28

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.28

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Nov 8, 2023
@nixpanic
Copy link
Member

nixpanic commented Nov 8, 2023

@Mergifyio rebase

This commit adds util functions related to read affinity
and unit testcases for the same.

Signed-off-by: Praveen M <[email protected]>
Implemented the capability to include read affinity options
for individual clusters within the ceph-csi-config ConfigMap.
This allows users to configure the crush location for each
cluster separately. The read affinity options specified in
the ConfigMap will supersede those provided via command line arguments.

Signed-off-by: Praveen M <[email protected]>
Copy link
Contributor

mergify bot commented Nov 8, 2023

rebase

✅ Branch has been successfully rebased

@nixpanic nixpanic force-pushed the enhancement/read-affinity-options-per-cluster branch from 6c38ced to 32bf571 Compare November 8, 2023 19:31
@nixpanic
Copy link
Member

nixpanic commented Nov 8, 2023

@Mergifyio queue

@nixpanic nixpanic added the ci/retry/e2e Label to retry e2e retesting on approved PR's label Nov 8, 2023
Copy link
Contributor

mergify bot commented Nov 8, 2023

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at c4e373c

@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Nov 8, 2023
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.28

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.28

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.28

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Nov 8, 2023
@mergify mergify bot merged commit c4e373c into ceph:devel Nov 8, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/retry/e2e Label to retry e2e retesting on approved PR's component/rbd Issues related to RBD component/util Utility functions shared between CephFS and RBD enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

common: add support for read affinity options per cluster-id from configmap
6 participants