-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable CI testing for C48-S2SWA-gefs on AWS and other CSPs #3102
Conversation
Please do review this draft PR. |
# Source versions file for runtime | ||
source "${HOMEgfs}/versions/run.ver" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be getting removed here. Please reinstate these lines that source run.ver
. Will explain more in another comment in this file.
"noaacloud") | ||
#TODO this is a total kludge to get epic mount point for compute nodes | ||
# to be the same as the login node. This should be workng from in the | ||
# ALLNODES section of the User Bootstrap of Parllel Works but it doen't | ||
# on the Rokcky Clusters (works fine in the Centos 7 cluster) | ||
if [[ ! -d /contrib-epic/EPIC ]]; then | ||
if [[ -d /contrib/Terry.McGuinness/SETUP ]]; then | ||
/contrib/Terry.McGuinness/SETUP/mount-epic-contrib.sh | ||
sudo systemctl daemon-reload | ||
fi | ||
fi | ||
# Check if the OS is Rocky or CentOS | ||
OS_NAME=$(grep -E '^ID=' /etc/os-release | sed -E 's/ID="?([^"]*)"?/\1/') || true | ||
# Source versions file for runtime | ||
source "${HOMEgfs}/versions/run.${MACHINE_ID}.${OS_NAME}.ver" | ||
module load "module_base.${MACHINE_ID}" | ||
;; | ||
"wcoss2" | "hera" | "orion" | "hercules" | "gaea" | "jet" | "s4") | ||
# Source versions file for runtime | ||
source "${HOMEgfs}/versions/run.${MACHINE_ID}.ver" | ||
module load "module_base.${MACHINE_ID}" module load "module_base.${MACHINE_ID}" | ||
;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple comments for these lines:
- You do not need the lines to source
run.${MACHINE_ID}.ver
. The lines above to sourcerun.ver
should come back. When the link script is run the appropriaterun.${MACHINE}.ver
is copied or linked to becomerun.ver
. - The logic to decide which
run.${MACHINE_ID}.${OS_NAME}.ver
should be moved tosorc/link_workflow.sh
. This way, the correct run version file becomesrun.ver
. As discussed in comment 1. - I don't think any changes are needed to
ush/load_fv3gfs_modules.sh
. The changes I discussed in comments 1 and 2 should move all of these changes to the appropriate place.
Let me know if you have any questions about my comments. :)
@@ -5,4 +5,8 @@ export spack_env=gsi-addon-env | |||
source "${HOMEgfs:-}/versions/spack.ver" | |||
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need this run version file (run.noaacloud.ver
) if you have the other two now? If so, when will this one be used?
@@ -6,7 +6,7 @@ export jasper_ver=2.0.32 | |||
export libpng_ver=1.6.37 | |||
export zlib_ver=1.2.13 | |||
export esmf_ver=8.5.0 | |||
export fms_ver=2023.02.01 | |||
export fms_ver=2023.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will change the fms_ver
for all platforms if changed in spack.ver
. Is that the intent? If not, please add export fms_ver=2023.04
to the noaacloud run version files.
@@ -23,7 +23,7 @@ export g2_ver=3.4.5 | |||
export sp_ver=2.5.0 | |||
export ip_ver=4.3.0 | |||
export gsi_ncdiag_ver=1.1.2 | |||
export g2tmpl_ver=1.10.2 | |||
export g2tmpl_ver=1.13.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as with fms_ver
.
@@ -0,0 +1,10 @@ | |||
--- ufs_noaacloud.intel.lua 2024-10-03 15:54:33.334583588 +0000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the need for this be resolved before this gets merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This diff should not be committed. The ufs-weather-model needs to be able to be built on AWS without the need for a diff in the global-workflow
#TODO temp patch for build update for noaacload in advance of updating ufs_module.fd repo for global-workflow building | ||
if [[ "${MACHINE_ID}" == "noaacloud" ]] ; then | ||
patched=$(grep upp-addon-env modulefiles/ufs_noaacloud.intel.lua; echo $?) | ||
if [[ ${patched} == "1" ]] ; then | ||
patch -R modulefiles/ufs_noaacloud.intel.lua ../ufs_noaacloud.intel.diff | ||
fi | ||
fi | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as one I also left in the newly added diff file. Can the need for this patch be resolved before merging this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Work is needed in the ufs-weather-model to eliminate the application of the diff. This is is a work around.
# TODO: Commented out until components aligned for build | ||
#source ../versions/build.ver | ||
if [[ "${MACHINE_ID}" == "noaacloud" ]] ; then | ||
source "../versions/build.${MACHINE_ID}.ver" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment we are allowing all components to build with their own build.ver
files. Once all components are aligned to build with the same library versions we will uncomment line 114 and return to sourcing the global-workflow build.ver
at build time, which overrides all build versions that the components. I'm guessing these lines were added to do a similar library version force on all of the components that are building on the cloud? Please confirm, thanks!
@@ -114,7 +114,7 @@ jobs: | |||
- name: Build components | |||
run: | | |||
cd ${{ env.TEST_DIR }}/HOMEgfs/sorc | |||
./build_all.sh -j 8 | |||
./build_all.sh -w -j 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will build the GEFS variant of the model (as is required for the GEFS test). Doing so, will no longer test GFS tests.
What/How do we plan to test GFS and GEFS simulateously?
@@ -0,0 +1,12 @@ | |||
export stack_intel_ver=2021.3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are moving from CentOS to Rocky, why is this file required?
This is a test trying to duplicate Terry's CI testing code, but trying to add c48gefs. |
Description
Enabling CI testing on AWS for C48-S2SWA-gefs case.
Resolves #3101
Type of change
Change characteristics
How has this been tested?
Checklist