Add fetch job and update stage_ic to work with fetched ICs #3141
base: develop
Conversation
I am in the process of testing.
To test my code, I ran create_experiment with the short yaml C48_ATM.yaml (which created /scratch1/NCEPDEV/global/David.Grumm/G_WF_2988/testroot_1/EXPDIR and COMROOT) via:

HPC_ACCOUNT="fv3-cpu" MY_TESTROOT="/scratch1/NCEPDEV/global/David.Grumm/G_WF_2988/testroot_1" RUNTESTS=${MY_TESTROOT} pslot="1306a_2988" ./create_experiment.py --yaml ../ci/cases/pr/C48_ATM.yaml

… which completed without error or warning messages. From within that EXPDIR I ran rocotorun: … which also completed without error or warning messages. There was also no output to stdout, which I did not expect, as I had placed a few diagnostic prints in my code. I verified that I am on my current branch. Running rocotostat gives me:

CYCLE    TASK    JOBID    STATE    EXIT STATUS    TRIES    DURATION
========================================================================

I have 2 questions:
Rocoto is not a fully automated system. For each invocation of rocotorun, Rocoto checks task dependencies and submits whatever tasks are ready, so it has to be invoked repeatedly (typically from a crontab) to advance the workflow.
Just some whitespace cleanup (most of it mine).
I updated the crontab.
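For anyone following along, a minimal sketch of the kind of crontab entry that drives Rocoto; the interval, install path, and file names below are illustrative placeholders, not the entry generated for this experiment:

*/5 * * * * /path/to/rocoto/bin/rocotorun -w /path/to/EXPDIR/1306a_2988.xml -d /path/to/EXPDIR/1306a_2988.db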
Removed extraneous whitespace from fetch.py and recommitted; still testing.
…com:DavidGrumm-NOAA/Global_Workflow_2988 into stage_ic_2988
I moved the fetch options to be in the run_options dict.
ci/cases/pr/ATM_cold.yaml
Outdated
I think this should be in parm/fetch/ATM_cold.yaml.j2, as this is Jinja-templated. (EDIT: added .j2)
Per the comment below, removing parm/fetch/ATM_cold.yaml.j2
Sorry for the confusion, @DavidGrumm-NOAA. I was suggesting that this file should be moved to parm/fetch/ATM_cold.yaml.j2. This is the template for the fetch job, not for a CI test.
Done
Thanks for making a copy to parm/fetch/ATM_cold.yaml.j2. This file can be deleted now with git rm ci/cases/pr/ATM_cold.yaml && git commit && git push.
…stage_ic and yaml code
…, replaced gefs/config.fetch with a link to gfs/config.fetch
scripts/exglobal_fetch.py
Outdated
fetch = Fetch(config)

# Pull out all the configuration keys needed to run the fetch step
keys = ['current_cycle', 'RUN', 'PDY', 'PARMgfs', 'PSLOT', 'ROTDIR', 'fetch_yaml', 'FETCHDIR', 'ntiles', 'DATAROOT', 'cycle_YMDH']
My bad, cycle_YMDH needs to be set in the jinja2 file.
Suggested change:
keys = ['current_cycle', 'RUN', 'PDY', 'PARMgfs', 'PSLOT', 'ROTDIR', 'fetch_yaml', 'FETCHDIR', 'ntiles', 'DATAROOT', 'cycle_YMDH']
keys = ['current_cycle', 'RUN', 'PDY', 'PARMgfs', 'PSLOT', 'ROTDIR', 'fetch_yaml', 'FETCHDIR', 'ntiles', 'DATAROOT']
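If cycle_YMDH were instead derived inside the template, a minimal sketch of what that could look like (assuming a to_YMDH Jinja filter is available to the template; the filter name is my assumption, not something confirmed in this PR):

{% set cycle_YMDH = current_cycle | to_YMDH %}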
Those changes fixed that ‘None’ error in the path, so it is now /NCEPDEV/emc-global/1year/David.Grumm/test_data/2021032312/atm_cold.tar … but that directory (/NCEPDEV/emc-global/1year/David.Grumm/test_data/2021032312/) does not seem to exist, so it still fails. From the log:
< snip >
Start Epilog on node hfe01 for job 4228191 :: Mon Dec 23 20:30:23 UTC 2024
End Epilogue Mon Dec 23 20:30:23 UTC 2024
I will look into the intended creation of this directory.
The current file creation is correct - the error (“no such HPSS archive file”) is resolved by locating the tarball in the correct directory. There is currently a warning from the chdir call for htar/tar, which I’m investigating.
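For context on that warning, a conceptual sketch of the pattern involved, not the actual code in fetch.py: htar extracts into the current working directory, so the task has to change into the destination before extracting. The target_dir value is a placeholder; the tarball path is the one from the log above.

import os
import subprocess

# Conceptual sketch only: change into the destination, then extract the HPSS
# tarball there, since htar extracts relative to the current working directory.
target_dir = "/path/to/destination"  # placeholder
tarball = "/NCEPDEV/emc-global/1year/David.Grumm/test_data/2021032312/atm_cold.tar"

os.chdir(target_dir)
subprocess.run(["htar", "-xvf", tarball], check=True)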
scripts/exglobal_fetch.py
Outdated
fetch = Fetch(config)

# Pull out all the configuration keys needed to run the fetch step
keys = ['current_cycle', 'RUN', 'PDY', 'PARMgfs', 'PSLOT', 'ROTDIR', 'fetch_yaml', 'FETCHDIR', 'ntiles', 'DATAROOT']
config.fetch sets the name of the fetch job Jinja-YAML to FETCH_YAML_TMPL. This is the environment variable that exglobal_fetch.py should be looking for.
Suggested change:
keys = ['current_cycle', 'RUN', 'PDY', 'PARMgfs', 'PSLOT', 'ROTDIR', 'fetch_yaml', 'FETCHDIR', 'ntiles', 'DATAROOT']
keys = ['current_cycle', 'RUN', 'PDY', 'PARMgfs', 'PSLOT', 'ROTDIR', 'FETCH_YAML_TMPL', 'FETCHDIR', 'ntiles', 'DATAROOT']
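For reference, a minimal sketch of how those keys are typically gathered into the dictionary handed to the task, following the same pattern as the COMOUT_ loop later in this script (the AttrDict import is my assumption; keys and fetch are as in the snippet above):

from wxflow import AttrDict

fetch_dict = AttrDict()
for key in keys:
    fetch_dict[key] = fetch.task_config.get(key)
    if fetch_dict[key] is None:
        print(f"Warning: key ({key}) not found in task_config!")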
Done
parm/config/gfs/config.fetch
Outdated
echo "BEGIN: config.fetch"

export FETCH_YAML_TMPL="${PARMgfs}/fetch/C48_cold.yaml.j2"
The name of the Jinja-YAML is ATM_cold.yaml.j2.
Suggested change:
export FETCH_YAML_TMPL="${PARMgfs}/fetch/C48_cold.yaml.j2"
export FETCH_YAML_TMPL="${PARMgfs}/fetch/ATM_cold.yaml.j2"
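For completeness, a sketch of what the whole config.fetch could look like, assuming it follows the BEGIN/END echo pattern of the other config.* files (the banner comment and the END echo are my assumptions based on that pattern):

#! /usr/bin/env bash

########## config.fetch ##########

echo "BEGIN: config.fetch"

export FETCH_YAML_TMPL="${PARMgfs}/fetch/ATM_cold.yaml.j2"

echo "END: config.fetch"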
Done
scripts/exglobal_fetch.py
Outdated
# Also import all COMOUT* directory and template variables
for key in fetch.task_config.keys():
    if key.startswith("COMOUT_"):
        fetch_dict[key] = fetch.task_config.get(key)
        if fetch_dict[key] is None:
            print(f"Warning: key ({key}) not found in task_config!")
Thinking about it, the fetch task doesn't interact with the COM directories, so I think this can be deleted.
Suggested change (delete these lines):
# Also import all COMOUT* directory and template variables
for key in fetch.task_config.keys():
    if key.startswith("COMOUT_"):
        fetch_dict[key] = fetch.task_config.get(key)
        if fetch_dict[key] is None:
            print(f"Warning: key ({key}) not found in task_config!")
Done
workflow/rocoto/gfs_tasks.py
Outdated
if self.options['do_fetch_hpss'] or self.options['do_fetch_local']:
    deps = []
    dep_dict = {
        'type': 'task', 'name': f'fetch',
The task name should also have the RUN in it:
Suggested change:
'type': 'task', 'name': f'fetch',
'type': 'task', 'name': f'{self.run}_fetch',
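For reference, a sketch of how that dependency dict is usually consumed in gfs_tasks.py; the rocoto.add_dependency and rocoto.create_dependency helpers are my recollection of the existing module, not new API from this PR:

dep_dict = {'type': 'task', 'name': f'{self.run}_fetch'}
deps.append(rocoto.add_dependency(dep_dict))
dependencies = rocoto.create_dependency(dep=deps)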
Done
workflow/rocoto/tasks.py
Outdated
@@ -10,7 +10,7 @@

 class Tasks:
-    SERVICE_TASKS = ['arch', 'earc', 'stage_ic', 'cleanup']
+    SERVICE_TASKS = ['arch', 'earc', 'stage_ic', 'fetch', 'cleanup']
     VALID_TASKS = ['aerosol_init', 'stage_ic',
fetch should also be added to the VALID_TASKS list.
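In other words, both class attributes in workflow/rocoto/tasks.py would carry the new task; a sketch with the remaining entries elided:

class Tasks:
    SERVICE_TASKS = ['arch', 'earc', 'stage_ic', 'fetch', 'cleanup']
    VALID_TASKS = ['aerosol_init', 'stage_ic', 'fetch',
                   # ... remaining task names unchanged ...
                   ]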
Done
ush/python/pygfs/task/fetch.py
Outdated
""" | ||
self.hsi = Hsi() | ||
|
||
fetch_yaml = fetch_dict.fetch_yaml |
This should be FETCH_YAML_TMPL:
Suggested change:
fetch_yaml = fetch_dict.fetch_yaml
fetch_yaml = fetch_dict.FETCH_YAML_TMPL
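Once the task has the template path, a minimal sketch of how it might render the Jinja-YAML; parse_j2yaml is the wxflow helper I would expect here, but whether fetch.py actually uses it is an assumption:

from wxflow import parse_j2yaml

fetch_yaml = fetch_dict.FETCH_YAML_TMPL
fetch_set = parse_j2yaml(fetch_yaml, fetch_dict)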
Done
Description
Most jobs require the initial conditions to be available on local disk. The existing “stage_ic” task copies/stages these initial conditions into the experiment's COM directory. This PR adds the “fetch” task, which extends that functionality to copy initial conditions from HPSS (on HPSS-accessible machines) into COM.
This PR resolves issue “Stage initial conditions stored on HPSS (#2988)”.
This is currently a DRAFT PR.
Type of change
Change characteristics
How has this been tested?
Checklist