I'd like to put the following up for discussion (it has bugged me since I got to know the tools):
File and directory management should be simplified. Tons of files are copied back and forth, which makes it very tricky to track down errors for those who didn't code the core parts (`compute.py`, `jobclass.py`, ...). In addition, copying large (restart, forcing) files several times may significantly slow down job throughput. Having worked with the MPI-ESM runtime manager `mkexp` (Python with Jinja2-style `.config` files), I find its file and directory management simpler and more efficient (while other things are horrible in `mkexp`), so here comes my suggestion:
1. Upon start, `esm_runscripts` creates the directory structure `expid/restart/`, `expid/outdata/`, `expid/forcing` ... as it does at the moment.
2. Copy/link the forcing files required for the current run into `expid/forcing`. On cold start, optionally create a copy of `esm_tools` there as well.
3. Create a work folder `expid/work/run_XXXX-YYYY/`.
4. Copy/link all files (forcing, restart, namelists) required for the current run into `expid/work/run_XXXX-YYYY/`.
5. `cd expid/work/run_XXXX-YYYY/`, `sbatch ...`
6. Once done, copy only the restart files into `expid/restart/`.
7. Trigger a subjob (like the post jobs at the moment) that does the cleanup of `expid/work/run_XXXX-YYYY/` (i.e. copying outdata, logs etc. into place), following the bullet-proof method used in `mkexp` (details later).
8. Increment the date, go to step 2, and continue until the run is done.
And last: Have all logs (model logs, esm_runscript logs, filelist, *finished.yaml, ...) in one place.
I know this is against the current philosophy that everything related to the current run shall be in `expid/run_XXXX-YYYY/`, but it would certainly simplify the complete `config` dict and hence make error tracking easier.
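To make the proposed layout concrete, here is a minimal Python sketch of steps 1, 3, 4 and 6. It is not taken from `esm_runscripts` or `mkexp`: the function names `prepare_run` and `save_restarts`, the `log` subfolder, and the `restart_*` file pattern are all assumptions chosen for illustration, and the job submission and cleanup subjob are left out.

```python
import shutil
from pathlib import Path


def prepare_run(expid_dir, run_label, inputs):
    """Create the experiment layout and stage inputs into a per-run work folder.

    `inputs` maps a file name inside the work folder to its source path.
    All names here are hypothetical, not the esm_runscripts API.
    """
    expid = Path(expid_dir)
    # Step 1: top-level folders; "log" is an assumed single place for all logs.
    for sub in ("restart", "outdata", "forcing", "log"):
        (expid / sub).mkdir(parents=True, exist_ok=True)

    # Step 3: one work folder per run, e.g. expid/work/run_XXXX-YYYY/.
    work = expid / "work" / f"run_{run_label}"
    work.mkdir(parents=True, exist_ok=True)

    # Step 4: link instead of copy, so large files are not duplicated.
    for name, source in inputs.items():
        target = work / name
        if target.is_symlink() or target.exists():
            target.unlink()
        target.symlink_to(Path(source).resolve())
    return work


def save_restarts(expid_dir, work, pattern="restart_*"):
    """Step 6: copy only restart files written by the run into expid/restart/."""
    restart_dir = Path(expid_dir) / "restart"
    saved = []
    for path in sorted(Path(work).glob(pattern)):
        if path.is_symlink():
            continue  # staged input restarts are links; skip them
        saved.append(Path(shutil.copy2(path, restart_dir)))
    return saved
```

With this layout, each large input is touched once (as a link) and only the new restart files are copied after the run, which is the throughput gain the proposal is after.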
Going through some old issues to start cleaning up before the next release: is this still relevant? If not, @seb-wahl, please close or alternatively please respecify the problem that is happening so we can make a plan.