Enable CI testing for C48-S2SWA-gefs on AWS and other CSPs #3102

Closed
6 changes: 3 additions & 3 deletions .github/workflows/pw_aws_ci.yaml
@@ -114,7 +114,7 @@ jobs:
- name: Build components
run: |
cd ${{ env.TEST_DIR }}/HOMEgfs/sorc
./build_all.sh -j 8
./build_all.sh -w -j 16
Contributor:

This will build the GEFS variant of the model (as is required for the GEFS test). Doing so means the GFS tests will no longer be exercised.
How do we plan to test GFS and GEFS simultaneously?


- name: Link artifacts
run: |
@@ -130,7 +130,7 @@ jobs:
- ${{ github.event.inputs.os }}
strategy:
matrix:
case: ["C48_ATM"]
case: ["C48_S2SWA_gefs"]
steps:
- name: Create Experiments ${{ matrix.case }}
env:
@@ -152,7 +152,7 @@ jobs:
- ${{ github.event.inputs.os }}
strategy:
matrix:
case: ["C48_ATM"]
case: ["C48_S2SWA_gefs"]
steps:
- name: Run Experiment ${{ matrix.case }}
run: |
3 changes: 3 additions & 0 deletions ci/platforms/config.noaacloud
@@ -0,0 +1,3 @@
#!/usr/bin/bash

export HPC_ACCOUNT=${USER}
10 changes: 9 additions & 1 deletion env/AWSPW.env
@@ -13,9 +13,13 @@ export launcher="srun -l --export=ALL"
export mpmd_opt="--multi-prog --output=mpmd.%j.%t.out"

# Configure MPI environment
export OMP_STACKSIZE=2048000
#export OMP_STACKSIZE=2048000
export NTHSTACK=1024000000

export OMP_STACKSIZE=512M
export KMP_AFFINITY=scatter
export OMP_NUM_THREADS=1

ulimit -s unlimited
ulimit -a

@@ -59,6 +63,10 @@ elif [[ "${step}" = "post" ]]; then
[[ ${NTHREADS_DWN} -gt ${max_threads_per_task} ]] && export NTHREADS_DWN=${max_threads_per_task}
export APRUN_DWN="${launcher} -n ${ntasks_dwn}"

elif [[ "${step}" = "prep_emissions" ]]; then

export APRUN="${APRUN_default}"

elif [[ "${step}" = "atmos_products" ]]; then

export USE_CFP="YES" # Use MPMD for downstream product generation on Hera
3 changes: 3 additions & 0 deletions sorc/build_all.sh
@@ -112,6 +112,9 @@ fi
#------------------------------------
# TODO: Commented out until components aligned for build
#source ../versions/build.ver
if [[ "${MACHINE_ID}" == "noaacloud" ]] ; then
source "../versions/build.${MACHINE_ID}.ver"
fi
Comment on lines 113 to +117 (Member):

At the moment we are allowing all components to build with their own build.ver files. Once all components are aligned to build with the same library versions, we will uncomment line 114 and return to sourcing the global-workflow build.ver at build time, which overrides the build versions of all the components. I'm guessing these lines were added to force a similar set of library versions on all of the components that are building on the cloud? Please confirm, thanks!


#------------------------------------
# Exception Handling Init
8 changes: 8 additions & 0 deletions sorc/build_ufs.sh
@@ -44,6 +44,14 @@ COMPILE_NR=0
CLEAN_BEFORE=YES
CLEAN_AFTER=NO

#TODO temp patch for build update for noaacloud in advance of updating ufs_model.fd repo for global-workflow building
if [[ "${MACHINE_ID}" == "noaacloud" ]] ; then
patched=$(grep upp-addon-env modulefiles/ufs_noaacloud.intel.lua; echo $?)
if [[ ${patched} == "1" ]] ; then
patch -R modulefiles/ufs_noaacloud.intel.lua ../ufs_noaacloud.intel.diff
fi
fi

Comment on lines +47 to +54 (Member):

Same question as the one I left on the newly added diff file: can the need for this patch be resolved before this PR is merged?

Contributor:

Work is needed in the ufs-weather-model to eliminate the need to apply this diff. This is a workaround.
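Until that upstream change lands, the existence check in the workaround could be written with `grep -q`, which only sets an exit status; the current `$(grep ...; echo $?)` form captures any matching line as well as the status, which works but is harder to read. The following is a minimal sketch of the same reverse-patch step; the function name `restore_upp_addon_path` and its arguments are hypothetical, and it is not meant to replace the upstream fix.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: re-apply the noaacloud modulefile workaround only when
# it is actually needed, so repeated builds do not try to patch twice.
set -euo pipefail

restore_upp_addon_path() {
  local modfile="$1" diff_file="$2"
  # grep -q prints nothing and only sets the exit status, so there is no
  # need to capture "$(grep ...; echo $?)" and compare the result to "1".
  if ! grep -q "upp-addon-env" "${modfile}"; then
    # Reverse-apply the diff, which re-inserts the upp-addon-env MODULEPATH line.
    patch -R "${modfile}" "${diff_file}"
  fi
}
```

In build_ufs.sh this would be called as `restore_upp_addon_path modulefiles/ufs_noaacloud.intel.lua ../ufs_noaacloud.intel.diff` inside the existing `noaacloud` branch.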

BUILD_JOBS=${BUILD_JOBS:-8} ./tests/compile.sh "${MACHINE_ID}" "${MAKE_OPT}" "${COMPILE_NR}" "intel" "${CLEAN_BEFORE}" "${CLEAN_AFTER}"
mv "./tests/fv3_${COMPILE_NR}.exe" ./tests/ufs_model.x
mv "./tests/modules.fv3_${COMPILE_NR}.lua" ./tests/modules.ufs_model.lua
10 changes: 10 additions & 0 deletions sorc/ufs_noaacloud.intel.diff
@@ -0,0 +1,10 @@
--- ufs_noaacloud.intel.lua 2024-10-03 15:54:33.334583588 +0000
Member:

Will the need for this be resolved before this gets merged?

Contributor:

This diff should not be committed. The ufs-weather-model needs to build on AWS without requiring a diff in the global-workflow.

+++ ufs_model.fd/modulefiles/ufs_noaacloud.intel.lua 2024-10-03 16:11:28.534275972 +0000
@@ -3,7 +3,6 @@
]])

prepend_path("MODULEPATH", "/contrib/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
-prepend_path("MODULEPATH", "/contrib/spack-stack/spack-stack-1.6.0/envs/upp-addon-env/install/modulefiles/Core")

stack_intel_ver=os.getenv("stack_intel_ver") or "2021.3.0"
load(pathJoin("stack-intel", stack_intel_ver))
24 changes: 20 additions & 4 deletions ush/load_fv3gfs_modules.sh
@@ -13,16 +13,32 @@ ulimit_s=$( ulimit -S -s )
source "${HOMEgfs}/ush/detect_machine.sh"
source "${HOMEgfs}/ush/module-setup.sh"

# Source versions file for runtime
source "${HOMEgfs}/versions/run.ver"

Comment on lines -16 to -18 (Member):

These lines should not be removed. Please reinstate the lines that source run.ver; I will explain more in another comment on this file.

# Load our modules:
module use "${HOMEgfs}/modulefiles"

case "${MACHINE_ID}" in
"wcoss2" | "hera" | "orion" | "hercules" | "gaea" | "jet" | "s4" | "noaacloud")
"noaacloud")
#TODO this is a total kludge to get the EPIC mount point on the compute nodes
# to be the same as on the login node. This should work from the
# ALLNODES section of the User Bootstrap in Parallel Works, but it doesn't
# on the Rocky clusters (it works fine on the CentOS 7 cluster).
if [[ ! -d /contrib-epic/EPIC ]]; then
if [[ -d /contrib/Terry.McGuinness/SETUP ]]; then
/contrib/Terry.McGuinness/SETUP/mount-epic-contrib.sh
sudo systemctl daemon-reload
fi
fi
# Check if the OS is Rocky or CentOS
OS_NAME=$(grep -E '^ID=' /etc/os-release | sed -E 's/ID="?([^"]*)"?/\1/') || true
# Source versions file for runtime
source "${HOMEgfs}/versions/run.${MACHINE_ID}.${OS_NAME}.ver"
module load "module_base.${MACHINE_ID}"
;;
"wcoss2" | "hera" | "orion" | "hercules" | "gaea" | "jet" | "s4")
# Source versions file for runtime
source "${HOMEgfs}/versions/run.${MACHINE_ID}.ver"
module load "module_base.${MACHINE_ID}"
;;
Comment on lines +20 to +41 (Member):

A couple of comments on these lines:

  1. You do not need the lines that source run.${MACHINE_ID}.ver. The lines above that source run.ver should come back. When the link script is run, the appropriate run.${MACHINE_ID}.ver is copied or linked to become run.ver.
  2. The logic that decides which run.${MACHINE_ID}.${OS_NAME}.ver to use should be moved to sorc/link_workflow.sh. This way, the correct run version file becomes run.ver, as discussed in comment 1.
  3. I don't think any changes are needed to ush/load_fv3gfs_modules.sh. The changes described in comments 1 and 2 move all of this logic to the appropriate place.

Let me know if you have any questions about my comments. :)
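Points 1 and 2 of the review could look roughly like the following in sorc/link_workflow.sh. This is a hypothetical sketch: the function name `link_run_ver`, its arguments, and the fallback to run.${MACHINE_ID}.ver when no OS-specific file exists are assumptions, not the script's actual contents. It reads `ID` from os-release in the same way as the PR and links the chosen file to run.ver, so ush/load_fv3gfs_modules.sh can keep sourcing run.ver unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose the run version file at link time so that
# versions/run.ver is correct for the machine (and, if needed, its OS).
set -euo pipefail

link_run_ver() {
  local versions_dir="$1" machine_id="$2" os_release="${3:-/etc/os-release}"
  local os_name="" src

  # Read the ID field (e.g. ID="rocky" or ID=centos); ID_LIKE and VERSION_ID
  # are not matched because the pattern requires "ID=" at line start.
  if [[ -r "${os_release}" ]]; then
    os_name=$(sed -nE 's/^ID="?([^"]*)"?$/\1/p' "${os_release}")
  fi

  # Prefer an OS-specific file when one exists, else the per-machine file.
  src="run.${machine_id}.ver"
  if [[ -n "${os_name}" && -f "${versions_dir}/run.${machine_id}.${os_name}.ver" ]]; then
    src="run.${machine_id}.${os_name}.ver"
  fi

  if [[ ! -f "${versions_dir}/${src}" ]]; then
    echo "FATAL: ${versions_dir}/${src} not found" >&2
    return 1
  fi
  # Relative symlink; -n replaces an existing run.ver link instead of
  # following it.
  ln -sfn "${src}" "${versions_dir}/run.ver"
}
```

A call such as `link_run_ver "${HOMEgfs}/versions" "${MACHINE_ID}"` would link run.noaacloud.rocky.ver on a Rocky cluster and leave other platforms on their existing run.${MACHINE_ID}.ver.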

*)
echo "WARNING: UNKNOWN PLATFORM"
;;
12 changes: 12 additions & 0 deletions versions/run.noaacloud.centos.ver
@@ -0,0 +1,12 @@
export stack_intel_ver=2021.3.0
Contributor:

If we are moving from CentOS to Rocky, why is this file required?

export stack_impi_ver=2021.3.0
export spack_env=gsi-addon-env

source "${HOMEgfs:-}/versions/spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"

export g2tmpl_ver=1.10.2
export jasper_ver=2.0.32
export wgrib2_ver=2.0.8
export cdo_ver=1.9.5
export nco_ver=4.9.3
12 changes: 12 additions & 0 deletions versions/run.noaacloud.rocky.ver
@@ -0,0 +1,12 @@
export stack_intel_ver=2021.3.0
export stack_impi_ver=2021.3.0
export spack_env=gsi-addon-env

source "${HOMEgfs:-}/versions/spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"

export g2tmpl_ver=1.10.2
export wgrib2_ver=3.1.2_wmo
export cdo_ver=2.3.0
export jasper_ver=4.2.0
export nco_ver=5.1.6
6 changes: 5 additions & 1 deletion versions/run.noaacloud.ver
@@ -5,4 +5,8 @@ export spack_env=gsi-addon-env
source "${HOMEgfs:-}/versions/spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"
Member:

Do you still need this run version file (run.noaacloud.ver) if you have the other two now? If so, when will this one be used?


export cdo_ver=2.2.0
export g2tmpl_ver=1.10.2
export wgrib2_ver=3.1.2_wmo
export cdo_ver=2.3.0
export jasper_ver=4.2.0
export nco_ver=5.1.6
4 changes: 2 additions & 2 deletions versions/spack.ver
@@ -6,7 +6,7 @@ export jasper_ver=2.0.32
export libpng_ver=1.6.37
export zlib_ver=1.2.13
export esmf_ver=8.5.0
export fms_ver=2023.02.01
export fms_ver=2023.04
Member:

Changing fms_ver in spack.ver changes it for all platforms. Is that the intent? If not, please add export fms_ver=2023.04 to the noaacloud run version files instead.
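A minimal sketch of the suggested alternative, using temporary stand-in files rather than the real versions directory: because a noaacloud run version file sources spack.ver first, an `export` placed after that `source` line overrides the shared value only where that file is used. run.noaacloud.rocky.ver in this PR already does this for wgrib2_ver, cdo_ver, jasper_ver and nco_ver.

```bash
#!/usr/bin/env bash
# Sketch: a platform-specific override placed after "source spack.ver" wins
# for that platform only; spack.ver itself stays unchanged.
set -euo pipefail

tmp_versions="$(mktemp -d)"

# Stand-in for versions/spack.ver (shared by all platforms).
cat > "${tmp_versions}/spack.ver" <<'EOF'
export fms_ver=2023.02.01
export g2tmpl_ver=1.10.2
EOF

# Stand-in for a run.noaacloud.*.ver file with a cloud-only override.
cat > "${tmp_versions}/run.noaacloud.rocky.ver" <<EOF
source "${tmp_versions}/spack.ver"
export fms_ver=2023.04   # later export wins for noaacloud only
EOF

# Source each file in its own subshell so the two results do not mix.
shared_fms=$(source "${tmp_versions}/spack.ver"; echo "${fms_ver}")
cloud_fms=$(source "${tmp_versions}/run.noaacloud.rocky.ver"; echo "${fms_ver}")
echo "other platforms: ${shared_fms}, noaacloud: ${cloud_fms}"
# prints "other platforms: 2023.02.01, noaacloud: 2023.04"

rm -rf "${tmp_versions}"
```

The same pattern applies to the g2tmpl_ver change further down in spack.ver.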

export cdo_ver=2.2.0
export nco_ver=5.0.6

@@ -23,7 +23,7 @@ export g2_ver=3.4.5
export sp_ver=2.5.0
export ip_ver=4.3.0
export gsi_ncdiag_ver=1.1.2
export g2tmpl_ver=1.10.2
export g2tmpl_ver=1.13.0
Member:

Same comment as with fms_ver.

export crtm_ver=2.4.0.1
export wgrib2_ver=2.0.8
export grib_util_ver=1.3.0