Archive the experiment directory along with git status/diff output (#3105)

# Description
This adds the capability to archive the experiment directory.
Additionally, this adds options to run `git status` and `git diff` on
the `HOMEgfs` global workflow (but not the submodules) and store that
information within the experiment directory's archive. These options are
specified in `config.base` with the following defaults:

```bash
export ARCH_EXPDIR='YES'     # Archive the EXPDIR configs, XML, and database
export ARCH_EXPDIR_FREQ=0    # How often to archive the EXPDIR in hours or 0 for first and last cycle only
export ARCH_HASHES='YES'     # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
export ARCH_DIFFS='NO'       # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR
```
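
The `ARCH_EXPDIR_FREQ` semantics described above could be read roughly as in the following Python sketch. The function name `should_archive_expdir` and its arguments are hypothetical, and counting a nonzero frequency in hours from the first cycle is an assumption; the actual decision logic lives in the archive task and is not shown in this description.

```python
from datetime import datetime


def should_archive_expdir(current_cycle: datetime, sdate: datetime, edate: datetime,
                          arch_expdir: bool, freq_hours: int) -> bool:
    """Hypothetical helper: decide whether the EXPDIR is archived this cycle."""
    if not arch_expdir:
        return False
    if freq_hours == 0:
        # 0 means "first and last cycle only"
        return current_cycle in (sdate, edate)
    # Assumption: nonzero frequencies are counted in hours from the first cycle
    elapsed_hours = (current_cycle - sdate).total_seconds() / 3600
    return elapsed_hours % freq_hours == 0


# Illustrative dates only
sdate = datetime(2024, 12, 1, 0)
edate = datetime(2024, 12, 3, 0)
mid_run = datetime(2024, 12, 1, 12)
print(should_archive_expdir(mid_run, sdate, edate, True, 0))  # False: neither first nor last cycle
print(should_archive_expdir(mid_run, sdate, edate, True, 6))  # True: 12 h is a multiple of 6 h
```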

Resolves #2994
# Type of change
- [x] New feature (adds functionality)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? YES
- Does this change require an update to any of the following submodules?
YES (If YES, please add a link to any PRs that are pending.)
  - [x] wxflow NOAA-EMC/wxflow#45

# How has this been tested?
- [x] Local archiving on Hercules for a C48_ATM case
- [x] Cycled testing on Hercules with `ARCH_DIFFS=YES` and
`ARCH_EXPDIR_FREQ=6,12`
- [x] Testing with `ARCH_EXPDIR=NO` or `ARCH_HASHES=NO`

# Checklist
- [x] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have documented my code, including function, input, and output
descriptions
- [x] My changes generate no new warnings
- [x] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has
been added
- [x] Any new scripts have been added to the .github/CODEOWNERS file
with owners
- [x] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: Walter Kolczynski - NOAA <[email protected]>
DavidHuber-NOAA and WalterKolczynski-NOAA authored Dec 9, 2024
1 parent bc61862 commit 3a8697d
Showing 11 changed files with 253 additions and 29 deletions.
3 changes: 3 additions & 0 deletions .flake8
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[flake8]
exclude = .git,.github,venv,__pycache__,old,build,dist
max-line-length = 160
13 changes: 8 additions & 5 deletions docs/source/configure.rst
@@ -48,12 +48,15 @@ The global-workflow configs contain switches that change how the system runs. Ma
 |                  | (.true.) or cold (.false)?       |               |             | be set when running ``setup_expt.py`` script with |
 |                  |                                  |               |             | the ``--start`` flag (e.g. ``--start warm``)      |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
-| HPSSARCH         | Archive to HPPS                  | NO            | Possibly    | Whether to save output to tarballs on HPPS        |
+| HPSSARCH         | Archive to HPPS                  | NO            | NO          | Whether to save output to tarballs on HPPS.       |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
-| LOCALARCH        | Archive to a local directory     | NO            | Possibly    | Instead of archiving data to HPSS, archive to a   |
-|                  |                                  |               |             | local directory, specified by ATARDIR. If         |
-|                  |                                  |               |             | LOCALARCH=YES, then HPSSARCH must =NO. Changing   |
-|                  |                                  |               |             | HPSSARCH from YES to NO will adjust the XML.      |
+| LOCALARCH        | Archive to a local directory     | NO            | NO          | Whether to save output to tarballs locally. For   |
+|                  |                                  |               |             | HPSSARCH and LOCALARCH, ARCDIR specifies the      |
+|                  |                                  |               |             | directory. These options are mutually exclusive.  |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
+| ARCH_EXPDIR      | Archive the EXPDIR               | NO            | NO          | Whether to create a tarball of the EXPDIR.        |
+|                  |                                  |               |             | ARCH_HASHES and ARCH_DIFFS generate text files    |
+|                  |                                  |               |             | of git output that are archived with the EXPDIR.  |
++------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
 | QUILTING         | Use I/O quilting                 | .true.        | NO          | If .true. choose OUTPUT_GRID as cubed_sphere_grid |
 |                  |                                  |               |             | in netcdf or gaussian_grid                        |
24 changes: 24 additions & 0 deletions parm/archive/expdir.yaml.j2
@@ -0,0 +1,24 @@
{% set cycle_YMDH = current_cycle | to_YMDH %}

expdir:
name: "EXPDIR"
# Copy the experiment files from the EXPDIR into the ROTDIR for archiving
{% set copy_expdir = "expdir." ~ cycle_YMDH %}
FileHandler:
mkdir:
- "{{ ROTDIR }}/{{ copy_expdir }}"
copy:
{% for config in glob(EXPDIR ~ "/config.*") %}
- [ "{{ config }}", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
{% endfor %}
- [ "{{ EXPDIR }}/{{ PSLOT }}.xml", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
{% if ARCH_HASHES or ARCH_DIFFS %}
- [ "{{ EXPDIR }}/git_info.log", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
{% endif %}
target: "{{ ATARDIR }}/{{ cycle_YMDH }}/expdir.tar"
required:
- "{{ copy_expdir }}/config.*"
- "{{ copy_expdir }}/{{ PSLOT }}.xml"
{% if ARCH_HASHES or ARCH_DIFFS %}
- "{{ copy_expdir }}/git_info.log"
{% endif %}
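
The template above expects a `git_info.log` in the EXPDIR whenever `ARCH_HASHES` or `ARCH_DIFFS` is set. As a rough illustration of how such a log could be produced, the Python sketch below collects the relevant git commands and writes their output to a file. The helper names `git_info_commands` and `write_git_info` are hypothetical, and the exact set of commands is an assumption based on the option descriptions, not the code added by this commit.

```python
import subprocess
from pathlib import Path


def git_info_commands(arch_hashes: bool, arch_diffs: bool) -> list:
    """Hypothetical helper: list the git commands whose output would be logged."""
    commands = []
    if arch_hashes:
        commands.append(["git", "rev-parse", "HEAD"])  # hash of the global workflow
        commands.append(["git", "status"])  # working-tree state
        commands.append(["git", "submodule", "status", "--recursive"])  # submodule hashes
    if arch_diffs:
        commands.append(["git", "diff"])  # uncommitted changes in the global workflow
    return commands


def write_git_info(homegfs: str, log_path: str, arch_hashes: bool, arch_diffs: bool) -> None:
    """Run each command inside homegfs and write its output to log_path."""
    with Path(log_path).open("w") as log:
        for cmd in git_info_commands(arch_hashes, arch_diffs):
            # check=False: a failing git command is logged rather than raised
            result = subprocess.run(cmd, cwd=homegfs, capture_output=True, text=True, check=False)
            log.write(f"$ {' '.join(cmd)}\n{result.stdout}\n")


# With the defaults from config.base (hashes on, diffs off), no 'git diff' is run
for cmd in git_info_commands(arch_hashes=True, arch_diffs=False):
    print(" ".join(cmd))
```

Separating the command list from the runner keeps the selection logic testable without a git checkout.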
9 changes: 8 additions & 1 deletion parm/archive/master_gdas.yaml.j2
@@ -40,7 +40,7 @@ datasets:
# Determine if we will save restart ICs or not (only valid for cycled)
{% set save_warm_start_forecast, save_warm_start_cycled = ( False, False ) %}

-{% if ARCH_CYC == cycle_HH | int%}
+{% if ARCH_CYC == cycle_HH | int %}
# Save the forecast-only cycle ICs every ARCH_WARMICFREQ or ARCH_FCSTICFREQ days
{% if (current_cycle - SDATE).days % ARCH_WARMICFREQ == 0 %}
{% set save_warm_start_forecast = True %}
@@ -97,3 +97,10 @@ datasets:

# End of restart checking
{% endif %}

# Archive the EXPDIR if requested
{% if archive_expdir %}
{% filter indent(width=4) %}
{% include "expdir.yaml.j2" %}
{% endfilter %}
{% endif %}
7 changes: 7 additions & 0 deletions parm/archive/master_gefs.yaml.j2
@@ -10,3 +10,10 @@ datasets:
{% include "gefs_extracted_ice.yaml.j2" %}
{% include "gefs_extracted_wave.yaml.j2" %}
{% endfilter %}

# Archive the EXPDIR if requested
{% if archive_expdir %}
{% filter indent(width=4) %}
{% include "expdir.yaml.j2" %}
{% endfilter %}
{% endif %}
7 changes: 7 additions & 0 deletions parm/archive/master_gfs.yaml.j2
@@ -98,3 +98,10 @@ datasets:
{% endfilter %}
{% endif %}
{% endif %}

# Archive the EXPDIR if requested
{% if archive_expdir %}
{% filter indent(width=4) %}
{% include "expdir.yaml.j2" %}
{% endfilter %}
{% endif %}
8 changes: 6 additions & 2 deletions parm/config/gefs/config.base
@@ -333,9 +333,13 @@ if [[ ${HPSSARCH} = "YES" ]] && [[ ${LOCALARCH} = "YES" ]]; then
echo "Both HPSS and local archiving selected. Please choose one or the other."
exit 3
fi
-export ARCH_CYC=00 # Archive data at this cycle for warm_start capability
-export ARCH_WARMICFREQ=4 # Archive frequency in days for warm_start capability
+export ARCH_CYC=00 # Archive data at this cycle for warm start and/or forecast-only capabilities
+export ARCH_WARMICFREQ=4 # Archive frequency in days for warm start capability
 export ARCH_FCSTICFREQ=1 # Archive frequency in days for gdas and gfs forecast-only capability
+export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
+export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
+export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
+export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR

export DELETE_COM_IN_ARCHIVE_JOB="YES" # NO=retain ROTDIR. YES default in arch.sh and earc.sh.

8 changes: 6 additions & 2 deletions parm/config/gfs/config.base
@@ -479,9 +479,13 @@ if [[ ${HPSSARCH} = "YES" ]] && [[ ${LOCALARCH} = "YES" ]]; then
echo "FATAL ERROR: Both HPSS and local archiving selected. Please choose one or the other."
exit 4
fi
-export ARCH_CYC=00 # Archive data at this cycle for warm_start capability
-export ARCH_WARMICFREQ=4 # Archive frequency in days for warm_start capability
+export ARCH_CYC=00 # Archive data at this cycle for warm start and/or forecast-only capabilities
+export ARCH_WARMICFREQ=4 # Archive frequency in days for warm start capability
 export ARCH_FCSTICFREQ=1 # Archive frequency in days for gdas and gfs forecast-only capability
+export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
+export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
+export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
+export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR

# The monitor jobs are not yet supported for JEDIATMVAR.
if [[ ${DO_JEDIATMVAR} = "YES" ]]; then
26 changes: 13 additions & 13 deletions scripts/exglobal_archive.py
@@ -3,7 +3,7 @@
import os

from pygfs.task.archive import Archive
-from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, logit
+from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, logit, chdir

# initialize root logger
logger = Logger(level=os.environ.get("LOGGING_LEVEL", "DEBUG"), colored_log=True)
@@ -32,7 +32,8 @@ def main():
'DO_AERO_ANL', 'DO_AERO_FCST', 'DO_CA', 'DOIBP_WAV', 'DO_JEDIOCNVAR',
'NMEM_ENS', 'DO_JEDIATMVAR', 'DO_VRFY_OCEANDA', 'FHMAX_FITS', 'waveGRD',
'IAUFHRS', 'DO_FIT2OBS', 'NET', 'FHOUT_HF_GFS', 'FHMAX_HF_GFS', 'REPLAY_ICS',
-'OFFSET_START_HOUR']
+'OFFSET_START_HOUR', 'ARCH_EXPDIR', 'EXPDIR', 'ARCH_EXPDIR_FREQ', 'ARCH_HASHES',
+'ARCH_DIFFS', 'SDATE', 'EDATE', 'HOMEgfs']

archive_dict = AttrDict()
for key in keys:
@@ -47,21 +48,20 @@ def main():
if archive_dict[key] is None:
print(f"Warning: key ({key}) not found in task_config!")

-    cwd = os.getcwd()
+    with chdir(config.ROTDIR):

-    os.chdir(config.ROTDIR)
+        # Determine which archives to create
+        arcdir_set, atardir_sets = archive.configure(archive_dict)

-    # Determine which archives to create
-    arcdir_set, atardir_sets = archive.configure(archive_dict)
+        # Populate the product archive (ARCDIR)
+        archive.execute_store_products(arcdir_set)

-    # Populate the product archive (ARCDIR)
-    archive.execute_store_products(arcdir_set)
+        # Create the backup tarballs and store in ATARDIR
+        for atardir_set in atardir_sets:
+            archive.execute_backup_dataset(atardir_set)

-    # Create the backup tarballs and store in ATARDIR
-    for atardir_set in atardir_sets:
-        archive.execute_backup_dataset(atardir_set)
-
-    os.chdir(cwd)
+        # Clean up any temporary files
+        archive.clean()


if __name__ == '__main__':
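
The hunk above replaces a manual `os.getcwd()` / `os.chdir(cwd)` pair with a `chdir` context manager. With the manual pair, the trailing `os.chdir(cwd)` would be skipped if an archiving step raised. The sketch below shows the general pattern using only the standard library; `chdir_sketch` is a hypothetical stand-in and an assumption about how such a helper can be written, not the wxflow implementation.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def chdir_sketch(path):
    """Hypothetical stand-in for a chdir context manager (not wxflow's code)."""
    cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike
        os.chdir(cwd)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with chdir_sketch(tmp):
            raise RuntimeError("simulated archive failure")
    except RuntimeError:
        pass
    restored = os.getcwd() == start
print(restored)  # True: the original directory is restored despite the exception
```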