Archive the experiment directory along with git status/diff output #3105

Merged
Changes from 19 commits
24 commits
e7e43d2
First crack at archiving the expdir and git status
DavidHuber-NOAA Nov 15, 2024
9f1a6fc
Merge remote-tracking branch 'origin/develop' into feature/archive_ex…
DavidHuber-NOAA Nov 15, 2024
2746e46
Add flake8 rules for the global workflow
DavidHuber-NOAA Nov 18, 2024
1579844
Copy expdir to ROTDIR before tar'ing
DavidHuber-NOAA Nov 18, 2024
95f04d4
Merge branch 'develop' into feature/archive_expdir
DavidHuber-NOAA Nov 18, 2024
23b0454
Allow users to specify their HPC account
DavidHuber-NOAA Nov 19, 2024
f88c077
Start archiving EXPDIR on the first full cycle and name expdir direct…
DavidHuber-NOAA Nov 19, 2024
6924428
Merge branch 'feature/archive_expdir' of github.com:davidhuber-noaa/g…
DavidHuber-NOAA Nov 19, 2024
d2db267
Merge remote-tracking branch 'origin/develop' into feature/archive_ex…
DavidHuber-NOAA Nov 19, 2024
7d0b70a
Trimmed .flake8 exclusions
DavidHuber-NOAA Nov 19, 2024
5f67730
Merge remote-tracking branch 'origin/develop' into feature/archive_ex…
DavidHuber-NOAA Nov 20, 2024
a61055e
Merge branch 'feature/archive_expdir' of github.com:davidhuber-noaa/g…
DavidHuber-NOAA Nov 20, 2024
f385f0c
Merge branch 'develop' into feature/archive_expdir
DavidHuber-NOAA Nov 25, 2024
1b4340e
Merge remote-tracking branch 'emc/develop' into feature/archive_expdir
DavidHuber-NOAA Nov 26, 2024
7ca219c
Merge branch 'NOAA-EMC:develop' into feature/archive_expdir
DavidHuber-NOAA Dec 2, 2024
12b6f95
Remove extra A option
DavidHuber-NOAA Dec 2, 2024
b8cc2ef
Add documentation on EXPDIR archiving
DavidHuber-NOAA Dec 2, 2024
5014922
Merge remote-tracking branch 'origin/develop' into feature/archive_ex…
DavidHuber-NOAA Dec 3, 2024
7ab8dc1
Apply suggestions from code review
DavidHuber-NOAA Dec 3, 2024
0dd3eb8
Apply suggestions from code review
DavidHuber-NOAA Dec 3, 2024
6d89b5c
Merge branch 'develop' into feature/archive_expdir
DavidHuber-NOAA Dec 4, 2024
28bb727
Add self to instance methods
DavidHuber-NOAA Dec 4, 2024
729be32
Merge branch 'feature/archive_expdir' of github.com:davidhuber-noaa/g…
DavidHuber-NOAA Dec 4, 2024
979f2cb
Merge branch 'develop' into feature/archive_expdir
DavidHuber-NOAA Dec 9, 2024
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
+[flake8]
+exclude = .git,.github,venv,__pycache__,old,build,dist
+max-line-length = 160
13 changes: 8 additions & 5 deletions docs/source/configure.rst
@@ -48,12 +48,15 @@ The global-workflow configs contain switches that change how the system runs. Ma
 |                  | (.true.) or cold (.false)?       |               |             | be set when running ``setup_expt.py`` script with |
 |                  |                                  |               |             | the ``--start`` flag (e.g. ``--start warm``)      |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
-| HPSSARCH         | Archive to HPPS                  | NO            | Possibly    | Whether to save output to tarballs on HPPS        |
+| HPSSARCH         | Archive to HPPS                  | NO            | NO          | Whether to save output to tarballs on HPPS.       |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
-| LOCALARCH        | Archive to a local directory     | NO            | Possibly    | Instead of archiving data to HPSS, archive to a   |
-|                  |                                  |               |             | local directory, specified by ATARDIR. If         |
-|                  |                                  |               |             | LOCALARCH=YES, then HPSSARCH must =NO. Changing   |
-|                  |                                  |               |             | HPSSARCH from YES to NO will adjust the XML.      |
+| LOCALARCH        | Archive to a local directory     | NO            | NO          | Whether to save output to tarballs locally. For   |
+|                  |                                  |               |             | HPSSARCH and LOCALARCH, ARCDIR specifies the      |
+|                  |                                  |               |             | directory. These options are mutually exclusive.  |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
+| ARCH_EXPDIR      | Archive the EXPDIR               | NO            | NO          | Whether to create a tarball of the EXPDIR.        |
+|                  |                                  |               |             | ARCH_HASHES and ARCH_DIFFS generate text files    |
+|                  |                                  |               |             | of git output that are archived with the EXPDIR.  |
++------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
 | QUILTING         | Use I/O quilting                 | .true.        | NO          | If .true. choose OUTPUT_GRID as cubed_sphere_grid |
 |                  |                                  |               |             | in netcdf or gaussian_grid                        |
24 changes: 24 additions & 0 deletions parm/archive/expdir.yaml.j2
@@ -0,0 +1,24 @@
+{% set cycle_YMDH = current_cycle | to_YMDH %}
+
+expdir:
+    name: "EXPDIR"
+    # Copy the experiment files from the EXPDIR into the ROTDIR for archiving
+    {% set copy_expdir = "expdir." ~ cycle_YMDH %}
+    FileHandler:
+        mkdir:
+            - "{{ ROTDIR }}/{{ copy_expdir }}"
+        copy:
+            {% for config in glob(EXPDIR ~ "/config.*") %}
+            - [ "{{ config }}", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
+            {% endfor %}
+            - [ "{{ EXPDIR }}/{{ PSLOT }}.xml", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
+            {% if ARCH_HASHES or ARCH_DIFFS %}
+            - [ "{{ EXPDIR }}/git_info.log", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
+            {% endif %}
+    target: "{{ ATARDIR }}/{{ cycle_YMDH }}/expdir.tar"
+    required:
+        - "{{ copy_expdir }}/config.*"
+        - "{{ copy_expdir }}/{{ PSLOT }}.xml"
+        {% if ARCH_HASHES or ARCH_DIFFS %}
+        - "{{ copy_expdir }}/git_info.log"
+        {% endif %}
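
For readers less familiar with the FileHandler/tarball templates, here is a minimal Python sketch of what the expdir dataset appears to amount to: stage config.*, the PSLOT XML, and (optionally) git_info.log under ROTDIR/expdir.<cycle>, then tar that directory into ATARDIR/<cycle>/expdir.tar. The function name and its arguments are hypothetical and are not part of archive.py; the real task works from the rendered YAML rather than a helper like this.

```python
import glob
import os
import shutil
import tarfile


def stage_and_archive_expdir(expdir, rotdir, atardir, pslot, cycle_ymdh,
                             with_git_info=False):
    """Hypothetical helper: copy the EXPDIR files into ROTDIR, then tar them."""
    copy_name = f"expdir.{cycle_ymdh}"
    staging = os.path.join(rotdir, copy_name)
    os.makedirs(staging, exist_ok=True)

    # Same file selection as the template: config.*, <PSLOT>.xml, git_info.log
    sources = sorted(glob.glob(os.path.join(expdir, "config.*")))
    sources.append(os.path.join(expdir, f"{pslot}.xml"))
    if with_git_info:
        sources.append(os.path.join(expdir, "git_info.log"))

    # shutil.copy raises FileNotFoundError for a missing file, which loosely
    # mirrors the 'required' list in the template
    for src in sources:
        shutil.copy(src, staging)

    target_dir = os.path.join(atardir, cycle_ymdh)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "expdir.tar")
    with tarfile.open(target, "w") as tar:
        # Member names are relative to ROTDIR, e.g. expdir.<cycle>/config.base
        tar.add(staging, arcname=copy_name)
    return target
```

Staging a copy under ROTDIR first keeps the tarball member names relative to ROTDIR, which is consistent with the archive job running from that directory.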
9 changes: 8 additions & 1 deletion parm/archive/master_gdas.yaml.j2
@@ -40,7 +40,7 @@ datasets:
 # Determine if we will save restart ICs or not (only valid for cycled)
 {% set save_warm_start_forecast, save_warm_start_cycled = ( False, False ) %}

-{% if ARCH_CYC == cycle_HH | int%}
+{% if ARCH_CYC == cycle_HH | int %}
 # Save the forecast-only cycle ICs every ARCH_WARMICFREQ or ARCH_FCSTICFREQ days
 {% if (current_cycle - SDATE).days % ARCH_WARMICFREQ == 0 %}
 {% set save_warm_start_forecast = True %}
@@ -97,3 +97,10 @@ datasets:

 # End of restart checking
 {% endif %}
+
+# Archive the EXPDIR if requested
+{% if archive_expdir %}
+{% filter indent(width=4) %}
+{% include "expdir.yaml.j2" %}
+{% endfilter %}
+{% endif %}
Comment on lines +101 to +106
Contributor

Should only need this in one of gdas or gfs, but there is a bit of a coordination problem since either can be run without the other.

Contributor Author

Agreed. I added a method to archive.py to determine which RUN to archive this in and set the archive_expdir boolean accordingly.
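
One way to handle the coordination problem discussed above is to pick a single RUN deterministically from whichever RUNs the experiment actually contains. The sketch below is an illustration only: the function name, the `runs_in_experiment` argument, and the preference order are assumptions and may not match the method that was added to archive.py.

```python
def should_archive_expdir(run, runs_in_experiment,
                          preferred=("gfs", "gdas", "gefs")):
    """Return True if `run` is the one RUN chosen to archive the EXPDIR.

    Hypothetical rule: the first entry of `preferred` that is present in the
    experiment is responsible, so exactly one RUN archives the EXPDIR even
    when gdas and gfs are both configured.
    """
    for candidate in preferred:
        if candidate in runs_in_experiment:
            return run == candidate
    # None of the known RUNs is present, so nothing archives the EXPDIR
    return False


# gdas archives when it runs alone; gfs takes over when both are present
print(should_archive_expdir("gdas", ["gdas"]))
print(should_archive_expdir("gdas", ["gdas", "gfs"]))
print(should_archive_expdir("gfs", ["gdas", "gfs"]))
```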

7 changes: 7 additions & 0 deletions parm/archive/master_gefs.yaml.j2
@@ -10,3 +10,10 @@ datasets:
 {% include "gefs_extracted_ice.yaml.j2" %}
 {% include "gefs_extracted_wave.yaml.j2" %}
 {% endfilter %}
+
+# Archive the EXPDIR if requested
+{% if archive_expdir %}
+{% filter indent(width=4) %}
+{% include "expdir.yaml.j2" %}
+{% endfilter %}
+{% endif %}
7 changes: 7 additions & 0 deletions parm/archive/master_gfs.yaml.j2
@@ -98,3 +98,10 @@ datasets:
 {% endfilter %}
 {% endif %}
 {% endif %}
+
+# Archive the EXPDIR if requested
+{% if archive_expdir %}
+{% filter indent(width=4) %}
+{% include "expdir.yaml.j2" %}
+{% endfilter %}
+{% endif %}
8 changes: 6 additions & 2 deletions parm/config/gefs/config.base
@@ -326,9 +326,13 @@ if [[ ${HPSSARCH} = "YES" ]] && [[ ${LOCALARCH} = "YES" ]]; then
     echo "Both HPSS and local archiving selected. Please choose one or the other."
     exit 3
 fi
-export ARCH_CYC=00 # Archive data at this cycle for warm_start capability
-export ARCH_WARMICFREQ=4 # Archive frequency in days for warm_start capability
+export ARCH_CYC=00 # Archive data at this cycle for warm start and/or forecast-only capabilities
+export ARCH_WARMICFREQ=4 # Archive frequency in days for warm start capability
 export ARCH_FCSTICFREQ=1 # Archive frequency in days for gdas and gfs forecast-only capability
+export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
+export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
+export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
+export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR

 export DELETE_COM_IN_ARCHIVE_JOB="YES" # NO=retain ROTDIR. YES default in arch.sh and earc.sh.
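
The ARCH_EXPDIR_FREQ comment above can be read as the following rule, sketched here in Python. This is an interpretation, not the code in archive.py: the function name is hypothetical, the interval is assumed to be counted from SDATE, and the "first full cycle" nuance mentioned in the commit log is ignored.

```python
from datetime import datetime, timedelta


def is_expdir_archive_cycle(current, sdate, edate, freq_hours):
    """Hypothetical check: should the EXPDIR be archived on this cycle?"""
    if freq_hours == 0:
        # 0 means "first and last cycle only"
        return current in (sdate, edate)
    # Otherwise archive every freq_hours hours, counted from SDATE (assumed)
    elapsed = current - sdate
    return elapsed % timedelta(hours=freq_hours) == timedelta(0)


sdate = datetime(2024, 11, 15, 0)
edate = datetime(2024, 11, 17, 0)
for cycle in (sdate, datetime(2024, 11, 15, 6), datetime(2024, 11, 16, 0), edate):
    print(cycle, is_expdir_archive_cycle(cycle, sdate, edate, 24))
```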

8 changes: 6 additions & 2 deletions parm/config/gfs/config.base
@@ -476,9 +476,13 @@ if [[ ${HPSSARCH} = "YES" ]] && [[ ${LOCALARCH} = "YES" ]]; then
     echo "FATAL ERROR: Both HPSS and local archiving selected. Please choose one or the other."
     exit 4
 fi
-export ARCH_CYC=00 # Archive data at this cycle for warm_start capability
-export ARCH_WARMICFREQ=4 # Archive frequency in days for warm_start capability
+export ARCH_CYC=00 # Archive data at this cycle for warm start and/or forecast-only capabilities
+export ARCH_WARMICFREQ=4 # Archive frequency in days for warm start capability
 export ARCH_FCSTICFREQ=1 # Archive frequency in days for gdas and gfs forecast-only capability
+export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
+export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
+export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
+export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR

 # The monitor jobs are not yet supported for JEDIATMVAR.
 if [[ ${DO_JEDIATMVAR} = "YES" ]]; then
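
The ARCH_HASHES and ARCH_DIFFS comments above describe git output that ends up in git_info.log. The sketch below shows one plausible way to assemble such a file with standard git commands. The function names, the section layout, and the exact set of commands are assumptions; the PR's own implementation is not shown in this diff and may differ.

```python
import subprocess


def _git(repo, *args):
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(["git", "-C", repo, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def build_git_info(repo, arch_hashes=True, arch_diffs=False, git=_git):
    """Hypothetical helper: collect the text that would go into git_info.log."""
    sections = []
    if arch_hashes:
        # Hash of the top-level repository, hashes of its submodules, and status
        sections.append(("git rev-parse HEAD", git(repo, "rev-parse", "HEAD")))
        sections.append(("git submodule status --recursive",
                         git(repo, "submodule", "status", "--recursive")))
        sections.append(("git status", git(repo, "status")))
    if arch_diffs:
        sections.append(("git diff", git(repo, "diff")))
    return "\n".join(f"== {title} ==\n{body}" for title, body in sections)
```

Passing the runner in as the `git` argument keeps the helper easy to exercise without a real repository, for example `build_git_info("/path/to/HOMEgfs", git=my_fake_runner)`, where the path and the fake runner are placeholders.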
26 changes: 13 additions & 13 deletions scripts/exglobal_archive.py
@@ -3,7 +3,7 @@
 import os

 from pygfs.task.archive import Archive
-from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, logit
+from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, logit, chdir

 # initialize root logger
 logger = Logger(level=os.environ.get("LOGGING_LEVEL", "DEBUG"), colored_log=True)
@@ -32,7 +32,8 @@ def main():
             'DO_AERO_ANL', 'DO_AERO_FCST', 'DOIBP_WAV', 'DO_JEDIOCNVAR',
             'NMEM_ENS', 'DO_JEDIATMVAR', 'DO_VRFY_OCEANDA', 'FHMAX_FITS', 'waveGRD',
             'IAUFHRS', 'DO_FIT2OBS', 'NET', 'FHOUT_HF_GFS', 'FHMAX_HF_GFS', 'REPLAY_ICS',
-            'OFFSET_START_HOUR']
+            'OFFSET_START_HOUR', 'ARCH_EXPDIR', 'EXPDIR', 'ARCH_EXPDIR_FREQ', 'ARCH_HASHES',
+            'ARCH_DIFFS', 'SDATE', 'EDATE', 'HOMEgfs']

     archive_dict = AttrDict()
     for key in keys:
@@ -47,21 +48,20 @@ def main():
         if archive_dict[key] is None:
             print(f"Warning: key ({key}) not found in task_config!")

-    cwd = os.getcwd()
+    with chdir(config.ROTDIR):

-    os.chdir(config.ROTDIR)
+        # Determine which archives to create
+        arcdir_set, atardir_sets = archive.configure(archive_dict)

-    # Determine which archives to create
-    arcdir_set, atardir_sets = archive.configure(archive_dict)
+        # Populate the product archive (ARCDIR)
+        archive.execute_store_products(arcdir_set)

-    # Populate the product archive (ARCDIR)
-    archive.execute_store_products(arcdir_set)
+        # Create the backup tarballs and store in ATARDIR
+        for atardir_set in atardir_sets:
+            archive.execute_backup_dataset(atardir_set)

-    # Create the backup tarballs and store in ATARDIR
-    for atardir_set in atardir_sets:
-        archive.execute_backup_dataset(atardir_set)
-
-    os.chdir(cwd)
+        # Clean up any temporary files
+        archive.clean()


 if __name__ == '__main__':
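
The switch from a manual os.getcwd()/os.chdir() pair to `with chdir(...)` means the original working directory is restored even if one of the archive steps raises. The sketch below is a minimal stand-in that shows the pattern; it is named chdir_sketch to make clear that it is not wxflow's implementation, which may differ in detail.

```python
import os
from contextlib import contextmanager


@contextmanager
def chdir_sketch(path):
    """Temporarily change the working directory (illustrative stand-in)."""
    cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to the starting directory, even after an exception
        os.chdir(cwd)
```

With the older code, an exception between the two os.chdir() calls would leave the process in ROTDIR; the try/finally inside the context manager closes that gap.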