Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable CI testing for C48-S2SWA-gefs on AWS and other CSPs #3102

Closed

Conversation

weihuang-jedi
Copy link
Contributor

@weihuang-jedi weihuang-jedi commented Nov 14, 2024

Description

Enabling CI testing on AWS for C48-S2SWA-gefs case.

Resolves #3101

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO (but may need to add with others at later time)
  • Does this change require an update to any of the following submodules? NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

  • Clone and build on AWS
  • C48-S2SWA-gefs run on AWS

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@weihuang-jedi
Copy link
Contributor Author

Please do review this draft PR.
This is a prototype test for CI testing on AWs for C48-S2SWA-gefs case.

@weihuang-jedi weihuang-jedi self-assigned this Nov 14, 2024
@weihuang-jedi weihuang-jedi added the CI/CD Issue related to CI/CD label Nov 15, 2024
Comment on lines -16 to -18
# Source versions file for runtime
source "${HOMEgfs}/versions/run.ver"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be getting removed here. Please reinstate these lines that source run.ver. Will explain more in another comment in this file.

Comment on lines +20 to +41
"noaacloud")
#TODO this is a total kludge to get epic mount point for compute nodes
# to be the same as the login node. This should be workng from in the
# ALLNODES section of the User Bootstrap of Parllel Works but it doen't
# on the Rokcky Clusters (works fine in the Centos 7 cluster)
if [[ ! -d /contrib-epic/EPIC ]]; then
if [[ -d /contrib/Terry.McGuinness/SETUP ]]; then
/contrib/Terry.McGuinness/SETUP/mount-epic-contrib.sh
sudo systemctl daemon-reload
fi
fi
# Check if the OS is Rocky or CentOS
OS_NAME=$(grep -E '^ID=' /etc/os-release | sed -E 's/ID="?([^"]*)"?/\1/') || true
# Source versions file for runtime
source "${HOMEgfs}/versions/run.${MACHINE_ID}.${OS_NAME}.ver"
module load "module_base.${MACHINE_ID}"
;;
"wcoss2" | "hera" | "orion" | "hercules" | "gaea" | "jet" | "s4")
# Source versions file for runtime
source "${HOMEgfs}/versions/run.${MACHINE_ID}.ver"
module load "module_base.${MACHINE_ID}" module load "module_base.${MACHINE_ID}"
;;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments for these lines:

  1. You do not need the lines to source run.${MACHINE_ID}.ver. The lines above to source run.ver should come back. When the link script is run the appropriate run.${MACHINE}.ver is copied or linked to become run.ver.
  2. The logic to decide which run.${MACHINE_ID}.${OS_NAME}.ver should be moved to sorc/link_workflow.sh. This way, the correct run version file becomes run.ver. As discussed in comment 1.
  3. I don't think any changes are needed to ush/load_fv3gfs_modules.sh. The changes I discussed in comments 1 and 2 should move all of these changes to the appropriate place.

Let me know if you have any questions about my comments. :)

@@ -5,4 +5,8 @@ export spack_env=gsi-addon-env
source "${HOMEgfs:-}/versions/spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need this run version file (run.noaacloud.ver) if you have the other two now? If so, when will this one be used?

@@ -6,7 +6,7 @@ export jasper_ver=2.0.32
export libpng_ver=1.6.37
export zlib_ver=1.2.13
export esmf_ver=8.5.0
export fms_ver=2023.02.01
export fms_ver=2023.04
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change the fms_ver for all platforms if changed in spack.ver. Is that the intent? If not, please add export fms_ver=2023.04 to the noaacloud run version files.

@@ -23,7 +23,7 @@ export g2_ver=3.4.5
export sp_ver=2.5.0
export ip_ver=4.3.0
export gsi_ncdiag_ver=1.1.2
export g2tmpl_ver=1.10.2
export g2tmpl_ver=1.13.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as with fms_ver.

@@ -0,0 +1,10 @@
--- ufs_noaacloud.intel.lua 2024-10-03 15:54:33.334583588 +0000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the need for this be resolved before this gets merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff should not be committed. The ufs-weather-model needs to be able to be built on AWS without the need for a diff in the global-workflow

Comment on lines +47 to +54
#TODO temp patch for build update for noaacload in advance of updating ufs_module.fd repo for global-workflow building
if [[ "${MACHINE_ID}" == "noaacloud" ]] ; then
patched=$(grep upp-addon-env modulefiles/ufs_noaacloud.intel.lua; echo $?)
if [[ ${patched} == "1" ]] ; then
patch -R modulefiles/ufs_noaacloud.intel.lua ../ufs_noaacloud.intel.diff
fi
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as one I also left in the newly added diff file. Can the need for this patch be resolved before merging this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work is needed in the ufs-weather-model to eliminate the application of the diff. This is is a work around.

Comment on lines 113 to +117
# TODO: Commented out until components aligned for build
#source ../versions/build.ver
if [[ "${MACHINE_ID}" == "noaacloud" ]] ; then
source "../versions/build.${MACHINE_ID}.ver"
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we are allowing all components to build with their own build.ver files. Once all components are aligned to build with the same library versions we will uncomment line 114 and return to sourcing the global-workflow build.ver at build time, which overrides all build versions that the components. I'm guessing these lines were added to do a similar library version force on all of the components that are building on the cloud? Please confirm, thanks!

@@ -114,7 +114,7 @@ jobs:
- name: Build components
run: |
cd ${{ env.TEST_DIR }}/HOMEgfs/sorc
./build_all.sh -j 8
./build_all.sh -w -j 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will build the GEFS variant of the model (as is required for the GEFS test). Doing so, will no longer test GFS tests.
What/How do we plan to test GFS and GEFS simulateously?

@@ -0,0 +1,12 @@
export stack_intel_ver=2021.3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are moving from CentOS to Rocky, why is this file required?

@weihuang-jedi
Copy link
Contributor Author

This is a test trying to duplicate Terry's CI testing code, but trying to add c48gefs.
which did not work for now.
We need to switch to EPIC's new plan for CI testing, so close this one for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD Issue related to CI/CD
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable CI testing for C48-S2SWA-gefs on AWS and other CSPs
3 participants