Bulletproof Runs #423

Closed
pgierz opened this issue Aug 5, 2021 · 14 comments

Comments

@pgierz
Member

pgierz commented Aug 5, 2021

Hello all,

@chrisdane has been helping me test the "output from echam.namelist" feature, which will (once ready) let you figure out which files to move around based on both the streams defined in your YAML and whatever the echam namelist mvstreams sets up for you (for details, see the relevant parts of Issue #384).

In the process of testing, he accidentally managed to break one of his production runs, because the virtual environment was turned off. Therefore, we came up with the following idea:

  • Virtual environments should be on by default
  • They should install whatever you currently have as default unless you directly specify something else

Turning on the virtual env by default is simple; I will send a PR for that in a moment. Less simple is figuring out all the branches you currently have. That would be a job for esm-version checker. @denizural, I'll open a separate issue in that repository to discuss more, but it would be nice to have a function that just spits back a dictionary of:

current_branches: {
    "esm_tools": "140d197"
}

Effectively, one git SHA for each project. But, as I said, I'll make a separate issue to discuss that part.
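
A minimal sketch of what such a function could look like, assuming each project is available as a local git checkout. The names `git_short_sha` and `current_branches` and the `repos` mapping are hypothetical and not part of esm-version-checker; the only real interface used is the `git rev-parse --short HEAD` command.

```python
import subprocess
from pathlib import Path
from typing import Dict, Optional


def git_short_sha(repo_path: Path) -> Optional[str]:
    """Return the abbreviated commit SHA of the checkout, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_path), "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a git checkout, path does not exist, or git is not installed.
        return None
    return result.stdout.strip()


def current_branches(repos: Dict[str, Path]) -> Dict[str, Optional[str]]:
    """Map each project name (hypothetical input) to its current git SHA."""
    return {name: git_short_sha(path) for name, path in repos.items()}
```

Projects whose SHA cannot be determined come back as `None` instead of raising, so a caller could decide whether to fall back to a default branch.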

@dbarbi
Member

dbarbi commented Aug 5, 2021

objection... we decided to not have any default for venv because either default will mess with people. i would rather not do that...

@chrisdane
Contributor

I don't understand you, Dirk. What Paul suggests was the default in the old esm tools, right? Everything was copied to the experiment directory and then used from there forever. In my view that made sense, and as a user I would like to have that in the new esm tools as well.

@pgierz
Member Author

pgierz commented Aug 5, 2021

@cdanek, to clarify for you. In principle, if you want to make sure nothing ever changes, always make a virtual environment.

What we currently have implemented is:

| Runscript Option | Command Line Option | Already using a venv at submission time? | Tool Location Used | Tool Version Used |
|---|---|---|---|---|
| unset | unset | No | ~/.local/lib/python-<version>/site-packages/esm-tools | whatever your branch is |
| unset | unset | Yes | <venv_base>/lib/python-<version>/site-packages/esm-tools | whatever your branch is |
| use_venv: True | unset | No | <experiment_path>/.venv/site-packages/esm_tools | release |
| unset | --open-run | No | ~/.local/lib/python-<version>/site-packages/esm-tools | whatever your branch is |
| unset | --open-run | Yes | <venv_base>/lib/python-<version>/site-packages/esm-tools | whatever your branch is |
| unset | --contained-run | Yes | <experiment_path>/.venv/lib/python-<version>/site-packages/esm-tools | release |
| use_venv: True and install_esm_tools_branch: develop | unset | No | <experiment_path>/.venv/site-packages/esm_tools | develop |
| use_venv: False | unset | No | <experiment_path>/.venv/site-packages/esm_tools | release |

I'm probably missing a few cases, but I think that outlines the general idea.

@dbarbi
Member

dbarbi commented Aug 5, 2021

No, in the old esm-tools we didn't have any virtual environment. i agree with you it makes a lot of sense to use venv for runs, but as we have also developers to think about, they don't want to use it. that's why we have the interactive question if you don't specify anything - to make you aware that venv exists, and to think about whether you want to use it

@pgierz
Member Author

pgierz commented Aug 5, 2021

Chris, how did you launch the run that got messed up? Did it have anything set, or did you use any command flags?

I could check if anything is set, and then be pedantic and ask the user even one more time, just to be sure. That would quickly get annoying though. Or @dbarbi we make a secret flag "developer mode" or something that skips the second question if something is already set.

@pgierz
Member Author

pgierz commented Aug 5, 2021

Something not in my table but probably should be: if both the runscript and the command line are set, the runscript wins. Maybe I should swap that around...
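
To make that precedence concrete, here is a rough sketch under stated assumptions. The function `decide_use_venv` is hypothetical and not the actual esm_runscripts code; it only assumes, following the table above, that `--contained-run` means "use a venv" and `--open-run` means "do not".

```python
from typing import Optional

OPEN_RUN = "--open-run"
CONTAINED_RUN = "--contained-run"


def decide_use_venv(
    runscript_use_venv: Optional[bool], cli_flag: Optional[str]
) -> Optional[bool]:
    """Return True/False once decided, or None if the user still has to be asked."""
    # Current behaviour as described: an explicit runscript setting wins.
    if runscript_use_venv is not None:
        return runscript_use_venv
    # Otherwise fall back to the command line flag, if one was given.
    if cli_flag == CONTAINED_RUN:
        return True
    if cli_flag == OPEN_RUN:
        return False
    # Nothing set anywhere: this is where the interactive question comes in.
    return None
```

Swapping the precedence would only mean checking `cli_flag` before `runscript_use_venv`.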

@chrisdane
Contributor

chrisdane commented Aug 5, 2021

> i agree with you it makes a lot of sense to use venv for runs, but as we have also developers to think about, they don't want to use it.

What is the reason for the developers not to do so? Does it take too long to copy everything? The cp process could run in the background. And if the model needs to be resubmitted before the venv copy is ready (the only problematic case?), the esm_tools could wait until the copy process is finished (btw it takes ~6 minutes on mistral, not ~3).

> Chris, how did you launch the run that got messed up? Did it have anything set, or did you use any command flags?

The problem with this interactive part is that if I submit a runscript that does not explicitly set use_venv, the call

esm_runscripts runscript.yaml -e test > test.log 2>&1 &

does not work, since redirecting the output does not allow the interactive part (or I was too stupid). As a consequence, I set venv=false for testing and then forgot to set it back to true for production. So, my failure of course. Is there a way to help me not to make such errors? :D
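
One possible guard against this, sketched under assumptions: if nothing is set and no interactive terminal is available, stop with a clear message instead of prompting. The names `can_prompt` and `resolve_use_venv` are hypothetical and not part of esm_runscripts. With `> test.log 2>&1 &`, stdout is no longer a terminal, so the check below would fail fast.

```python
import sys
from typing import Optional, TextIO


def can_prompt(stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None) -> bool:
    """True only if both input and output are attached to a terminal."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    return stdin.isatty() and stdout.isatty()


def resolve_use_venv(
    configured: Optional[bool],
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> bool:
    """Use the configured value, ask interactively, or fail loudly."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if configured is not None:
        return configured
    if not can_prompt(stdin, stdout):
        raise RuntimeError(
            "use_venv is not set and no interactive terminal is available; "
            "set use_venv explicitly in the runscript."
        )
    stdout.write("Use a virtual environment for this run? [y/n] ")
    stdout.flush()
    return stdin.readline().strip().lower().startswith("y")
```

This does not prevent forgetting to flip an explicit `use_venv: False` back, but it would remove the need to set it just to get a redirected test run going.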

Paul's table looks rather complicated to me. Why do you need both venv and open-run? I think I could understand it better if there were only one of these.

One more thing that I don't understand: venv uses the online repo. That means that I need to push the current version of e.g. the esm_tools I want to use for a production run?

@pgierz
Member Author

pgierz commented Aug 5, 2021

> What is the reason for the developers not to do so?

This is a good question. Actually, in the Python world, you should be basically always using a separate environment for each project to avoid version conflicts.

> Is there a way to help me not to make such errors? :D

We could require two separate people to double check each runscript used for production. Maybe even with a GPG signature. But this is way waaaaay overkill. I'm sure there is even a Python library to check against GPG, but that's one of those things I'd have fun hacking in that no one would ever use.

> One more thing that I don't understand: venv uses the online repo. That means that I need to push the current version of e.g. the esm_tools I want to use for a production run?

Yes, you do. It does a new clone and installs via pip. But the idea of the venv thing is that people should be using the last stable version for their production simulations, which for us is always whatever is in the release branch. There are ways to use special branches in the virtualenv, see my table above, but by default you get release. If you need to use a separate special branch for your simulation, what you are doing is in that case not officially supported (at least not by the esm-team and my own little volunteer time).
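
Roughly, the steps amount to something like the following sketch. It is an illustration only, not the actual esm_runscripts implementation: `build_venv_commands` is a hypothetical helper, the repository URL is a parameter you would have to supply, and a POSIX venv layout (`bin/pip`) is assumed. The `git+<url>@<branch>` form is pip's standard syntax for installing from a git branch.

```python
from pathlib import PurePosixPath
from typing import List


def build_venv_commands(
    experiment_path: str, repo_url: str, branch: str = "release"
) -> List[List[str]]:
    """Return the commands that would create the venv and install one branch."""
    venv_dir = PurePosixPath(experiment_path) / ".venv"
    pip = venv_dir / "bin" / "pip"  # assumption: POSIX venv layout
    return [
        # 1. Create a fresh virtual environment inside the experiment folder.
        ["python3", "-m", "venv", str(venv_dir)],
        # 2. Install the requested branch straight from the remote repository,
        #    which is why local, unpushed changes are not picked up.
        [str(pip), "install", f"git+{repo_url}@{branch}"],
    ]
```

Each returned command could be passed to `subprocess.run(cmd, check=True)`; passing `branch="develop"` would correspond to the `install_esm_tools_branch: develop` row of the table.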

@chrisdane
Contributor

> If you need to use a separate special branch for your simulation, what you are doing is in that case not officially supported

In a perfect world I would use the last stable version of the esm_tools. But it never reached a stable state for my simulations: I never made a single run with release (please make a survey and realize that I am not the only one).

So the current venv implementation forces me to push a new version/branch of the esm tools even with only one character changed compared to release. This strategy spams the whole repo, in my view.

@pgierz
Member Author

pgierz commented Aug 5, 2021

> This strategy spams the whole repo, in my view.

Isn't any typo you fix good for the overall project?

@dbarbi
Member

dbarbi commented Aug 5, 2021

I can't give that any priority.

@pgierz
Member Author

pgierz commented Aug 24, 2021

So, what's the status here? Do we need to include a section in the handbook to clarify things?

@pgierz
Member Author

pgierz commented Aug 24, 2021

Re-open if needed, please.

@pgierz pgierz closed this as completed Aug 24, 2021