Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rrfs.v1.0.0 release branch update with GDIT's performance optimization #811

Open
wants to merge 3 commits into
base: release/rrfs.v1.0.0
Choose a base branch
from

Conversation

ShunLiu-NOAA
Copy link
Contributor

Description

  • add GDIT's performance optimization for RRFS to release branch
  • initialize variables in observer_fv3reg.f90 to avoid the warning message when compiling

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

@ShunLiu-NOAA ShunLiu-NOAA changed the title Rrfs.v1.0.0 rrfs.v1.0.0 release branch update with GDIT's performance optimization Dec 2, 2024
Comment on lines +253 to +262
xcent=quarter*(xc(1,1)+xc(1,ny)+xc(nx,1)+xc(nx,ny))
ycent=quarter*(yc(1,1)+yc(1,ny)+yc(nx,1)+yc(nx,ny))
zcent=quarter*(zc(1,1)+zc(1,ny)+zc(nx,1)+zc(nx,ny))

rnorm=one/sqrt(xcent**2+ycent**2+zcent**2)
xcent=rnorm*xcent
ycent=rnorm*ycent
zcent=rnorm*zcent
centlat=asin(zcent)*rad2deg
centlon=atan2(ycent,xcent)*rad2deg
rnorm=one/sqrt(xcent**2+ycent**2+zcent**2)
xcent=rnorm*xcent
ycent=rnorm*ycent
zcent=rnorm*zcent
centlat=asin(zcent)*rad2deg
centlon=atan2(ycent,xcent)*rad2deg

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some additional indents were added in mod_fv3_lola.f90. Not sure if that is intended or not.

local netcdf_ver=os.getenv("netcdf_ver") or "4.7.4"
local zlib_ver=os.getenv("zlib_ver") or "1.2.11"
local hdf5_ver=os.getenv("hdf5_ver") or "1.14.0"
local netcdf_ver=os.getenv("netcdf_ver") or "4.9.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need those setup changes for this PR? if those changes are needed for this PR , does that mean similar changes are needed for other machines?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we only run this on WCOSS2 for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! . I suspect the different modules used in this PR compared with the "control" branch (release/rrfs.v1.0.0) caused the failures of my ctests reported previously. I am working on this.

@@ -930,6 +948,7 @@ subroutine general_read_fv3_regional(this,fv3_filenameginput,g_ps,g_u,g_v,g_tv,g
couplerres=fv3_filenameginput%couplerres


!write(6,'("general_read_fv3_regional: fv3sar_ensemble_opt= " I4)') fv3sar_ensemble_opt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those commented out lines could be deleted or just uncommented if needed.

@@ -0,0 +1,23 @@
module crc32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could "crc32" be replaced with a more informative name?

implicit none
character(len=*), intent(in) :: m
!m='nid001019'
digest=abs(digest_c(trim(m)//c_null_char))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand digest_c will return a unique "checksum" to be used as an unique "id" for the later use.
But, to do this, GSI need to directly use zlib and add the crc32_c.c , a c code in gsi, which has been a pure fortran package. Can we use simpler while still working way to specify an unique string here?
For example, can we add a fortran hash function to do this ?
I am interested in hearing other GSI developers' opinions on this.

@@ -0,0 +1,7 @@
#include <stdio.h>
Copy link
Contributor

@TingLei-NOAA TingLei-NOAA Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the above comment.
I hesitate to justify adding such a c code to the currently pure fortran GSI only for this function.
A correction. I saw there is already a c code in gsi : blockIO.c.
Then, the question is for the use of the additionally required zlib.

@@ -128,8 +128,12 @@ subroutine generate_anl_grid(nx,ny,grid_lon,grid_lont,grid_lat,grid_latt)
use gridmod, only:grid_ratio_fv3_regional, region_lat,region_lon,nlat,nlon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general comment. I hope in this step, a verification step is added. Namely, GSI need to know if the existing "parameters/coefficents' files exactly match the current setup (the grid setup, the mpi setup (mpi rank numbers and its layout ). This verification is important to avoid the errors of mismatching between the to read-in coefficients and the right ones, which is hard to notice when GSI could successfully finish while giving normal results.

@@ -164,431 +168,641 @@ subroutine generate_anl_grid(nx,ny,grid_lon,grid_lont,grid_lat,grid_latt)
real(r_kind) d(4),ds
integer(i_kind) kk,k

integer(i_kind) name_len,nodeID,nodeComm,nodeRank,RanksPerNode,ierr,npes,inunit
character(len=72) :: input
character(len=5) :: np,nlatc,nlonc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could more informative names be used for those character variables? like str_np, str_nlat ..?

@TingLei-NOAA
Copy link
Contributor

TingLei-NOAA commented Dec 4, 2024

@ShunLiu-NOAA As we discussed off-line, the PR looks good to me when it is supposed to merge to the release branch, except for a minor change to add an explicit parameter to help users know the fv3reg conversion fix files could be used.
But, all the regression tests for gsi failed using origin/release/rrfs.v1.0.0 as the control as shown below:

The following tests FAILED:
          1 - global_4denvar (Failed)
          3 - rrfs_3denvar_glbens (Failed)
          4 - netcdf_fv3_regional (Failed)
          5 - hafs_4denvar_glbens (Failed)
          6 - hafs_3denvar_hybens (Failed)

Among them , global_4denvar, netcdf_fv3_regional,hafs_3denvar/4denvar failed for "penalty nonreproducible".
The update run for 3denvar_bloens_loproc_update just failed for the rundir not being set up correctly (coupler.res is missing).
Did I miss anything?

@ShunLiu-NOAA
Copy link
Contributor Author

@TingLei-NOAA Thank you. Let me make another test with my test case again.

@hu5970
Copy link
Collaborator

hu5970 commented Dec 4, 2024

@ShunLiu-NOAA As we discussed off-line, the PR looks good to me when it is supposed to merge to the release branch, except for a minor change to add an explicit parameter to help users know the fv3reg conversion fix files could be used. But, all the regression tests for gsi failed using origin/release/rrfs.v1.0.0 as the control as shown below:

The following tests FAILED:
          1 - global_4denvar (Failed)
          3 - rrfs_3denvar_glbens (Failed)
          4 - netcdf_fv3_regional (Failed)
          5 - hafs_4denvar_glbens (Failed)
          6 - hafs_3denvar_hybens (Failed)

Among them , global_4denvar, netcdf_fv3_regional,hafs_3denvar/4denvar failed for "penalty nonreproducible". The update run for 3denvar_bloens_loproc_update just failed for the rundir not being set up correctly (coupler.res is missing). Did I miss anything?

Ting, Could you tell me how did you run regression tests. This PR is based on branch RRFS.v1. Did you run regression test based on the branch hash that this PR based on?

@TingLei-NOAA
Copy link
Contributor

@hu5970 the control is (git log)

commit 529b6ea6b47624731d72610c388171b778dd3786 (HEAD, origin/release/rrfs.v1.0.0)
Author: Ming Hu <[email protected]>
Date:   Tue May 7 13:25:38 2024 -0400

    Bring read_radar bug fix (PR 738) to RRFS release branch. (#744)

I am doing a clean ctest (deleted previous tmpdir ) and will see if the different hdf/netcdf modules caused the differences if the re-run ctests still failed.

@TingLei-NOAA
Copy link
Contributor

An update: using the PR's modulefiles and add the same zlib related lines in cmakelist.txt in src/gsi for the control GSI , the global 4denvar test still failed. Confusing. I am switching to debug mode GSI to investigate this issue.

@TingLei-NOAA
Copy link
Contributor

Update on Dec. 17,2024:
It is found when the release branch/rrfs1.0.0 has the same pcgsoi.f90 as well as the same gsi_wcoss2.intel.lua , on wcoss2, this PR 's global_4denvar test would pass.
Though the difference between pcgsoi.f90 and its original one is as :

+  use mpi, only : MPI_Wtime, MPI_Comm_World, MPI_REAL8, MPI_MAX, MPI_SUCCESS
   
 
   implicit none
@@ -191,6 +192,8 @@ subroutine pcgsoi()
   integer(i_kind) :: iortho
   logical :: print_verbose,ortho,diag_print
   logical :: lanlerr,read_success
+  real(kind=8) :: time_beg,time_end,walltime
+  integer(i_kind) :: ierr
 
 ! Step size diagnostic strings
   data step /'good', 'SMALL'/
@@ -638,7 +641,14 @@ subroutine pcgsoi()
 ! Write output analysis files
   if(.not.l4dvar) call prt_guess('analysis')
   call prt_state_norms(sval(1),'increment')
-  if (twodvar_regional .or. jiter == miter) call write_all(-1)
+  if (twodvar_regional .or. jiter == miter) then
+    time_beg=MPI_Wtime()
+    call write_all(-1)
+    time_end=MPI_Wtime()
+    call MPI_Reduce(time_end-time_beg, walltime, 1, MPI_REAL8, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
+    if(ierr /= MPI_SUCCESS) print*,'MPI_Reduce ',ierr
+    if(mype==0) write(6,'("Maximum Walltime for write_all" f15.4)') walltime
+  endif

which could not explain this behavior.
I would double check on those results to make sure I didn't miss anything.

@ShunLiu-NOAA
Copy link
Contributor Author

Thank you @TingLei-NOAA

@TingLei-NOAA
Copy link
Contributor

An update. seems the success of ctests reported earlier was because of all sat obs were excluded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants