Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve patchComponent && Add patchCluster scripts #12207

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

todor-ivanov
Copy link
Contributor

@todor-ivanov todor-ivanov commented Dec 13, 2024

Fixes #11685

Status

ready

Description

With the current PR we improve the functionality of patchComponent.sh script adding the following functionalities:

  • The ability to find the correct WMCore codebase destination - regardless if running from pypi or from Source
  • The ability to recognize the WMCore version currently deployed at the destination
  • The ability to parse the patch file to find the full list of files prepared for altering
  • The ability to "zero" the codeBase of all the files which are about to be patched to the state as they are at the WMCore version currently deployed at the destination
  • The ability to perform a second patch attempt starting from latest version of the files from the list of files to be altered (by fetching them from upstream/master) if the first attempt to patch from the currently deployed version fails. It is a feature quite needed in the case of completely broken history and/or intermediate developments that might have happened between the moment of forking to a new branch for creating the patch and the actually applying and/or when the origin branch on top of which the patch has been created is behind the branch currently deployed at destination.
  • Automatically detecting any code conflicts (failed patch steps) and reverting the whole code to the baseline it started from if found any
  • The ability to patch from local tree (through git * commands) without going through officially creating a PR in the upstream repository
  • The ability to chose if to skip any of the above actions
  • The ability to manually revert (only zeroing) the whole code base to the version deployed at the destination
  • The ability to apply multiple patches in a row (preserving their order)

We also introduce the script patchClustetr.sh, which is capable of applying all those step on a Kuberenetes cluster. The granularity could be:

  • A whole cluster
  • A Pod
  • A Service
  • A Deployment

This way those two scripts become a well coupled pair, which boosts the development and testing/validation process tremendously!!!

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

None

Copy link
Contributor

@vkuznet vkuznet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to those I provided through the code I don't know how both scripts handle different error situations, e.g. if patch is already applied, if it is partially succeed? Can it be reversed? It would be useful if you'll clarify these questions as well.

# NOTE: We do not support patching from file and patching from command line simultaneously
# If both provided at the command line patching from command line takes precedence

podCmdActions="wget https://raw.githubusercontent.com/todor-ivanov/WMCoreAux/master/bin/patchComponent.sh -O /data/patchComponent.sh && chmod 755 /data/patchComponent.sh "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few things here:

  • you are assuming that wget is installed on a node which may not be the case. Please add appropriate test and failure for wget (and/or curl) commands
  • I think you should use back quote to get output of command to assign it to a shell variable. please check
  • I also suggest to separate wget and chmod command to properly capture wget output

Copy link
Contributor Author

@todor-ivanov todor-ivanov Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, wget or curl are requirements. I can try to double check if they are present at the container, but this would make the podCmd* commands executions more complicate.... - would require two extra steps from the caller script, so at least one more cycle there.... The idea here is that they should go in as "one-liners" because they would be pushed through the kubectl connection session as a single command with accumulated errors. So the only way to achieve that is through an && logical chain. I'll see what I can do, though, and will write back the results from my tests here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so:

I think you should use back quote to get output of command to assign it to a shell variable. please check

Actually not. Because we need to delay the execution as much as possible so it happens at the container. A back quote here would push the execution to happen at the caller's shell. So we better first construct the strings and then send it through the session to the containers.

echo "DEBUG: newPipeFlag=$newPipeFlag"
}

podCmd="$podCmdActions && sudo /data/patchComponent.sh $podCmdOpts $patchList "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same comment here, you are implying that sudo exists which may not be the case.

}

podCmd="$podCmdActions && sudo /data/patchComponent.sh $podCmdOpts $patchList "
restartCmd="/data/manage restart && sleep 1 && /data/manage status"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you assume that /data/manage exist, and it is related to our current way of installing which may change in a future. Maybe you should define manage somehow, or at least test that it exists

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is all true.... I will try to include this one in the above mentioned tests as well.

echo --------------------------------------------------------
# First try to find any pod from the service name provided and then extend the list in currPods:
if [[ -n $currService ]]; then
servicePods=`kubectl -n $nameSpace get ep $currService -o=jsonpath='{.subsets[*].addresses[*].ip}' | tr ' ' '\n' | xargs -I % kubectl -n $nameSpace get pods -o=name --field-selector=status.podIP=%`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please test that kubectl exists in a user PATH.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer... Will try it in my next commit

bin/patchComponent.sh Show resolved Hide resolved
@todor-ivanov todor-ivanov force-pushed the feature_Improve_patchScripts_fix-11685 branch 2 times, most recently from 6b22f49 to 3a97843 Compare December 20, 2024 08:57
@todor-ivanov
Copy link
Contributor Author

Hi @vkuznet I have addressed your comments. Please take another look.

@amaltaro
Copy link
Contributor

@todor-ivanov if these scripts are not yet in our documentation, then please do so and link the relevant wmcore-docs MR in the initial PR description.
In addition, it has code conflicts that need to be resolved.

Fix broken testFileList creation

Use only curl && Add extra check for tool existence at the pod.
@todor-ivanov todor-ivanov force-pushed the feature_Improve_patchScripts_fix-11685 branch from 3a97843 to ceb2fe1 Compare December 20, 2024 17:30
@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/232/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/233/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/234/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 3 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/235/artifact/artifacts/PullRequestReport.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve patchComponent script
4 participants