A spin-up acceleration tool for the ORCHIDEE family of land surface models (LSMs).
Concept: The proposed machine-learning (ML)-enabled spin-up acceleration procedure (MLA) predicts the steady state of any land pixel of the full model domain after training on a representative subset of pixels. Because the computational cost of the current generation of LSMs scales linearly with the number of pixels and years simulated, MLA reduces the computation time quasi-linearly with the number of pixels predicted by ML.
The aims, concepts and workflows are documented in Sun et al. (2023) [open access]: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.16623
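To make the concept concrete, the sketch below shows the general MLA idea in a few lines of Python: a regressor is trained on pixels that were spun up with the full model and then predicts the steady state of every pixel in the domain. This is an illustration only, not the SPINacc implementation; the random-forest regressor, array shapes and variable names are assumptions.

```python
# Illustrative sketch of the MLA concept (not the SPINacc code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_pixels, n_features = 10000, 12
features = rng.normal(size=(n_pixels, n_features))         # placeholder per-pixel predictors
train_idx = rng.choice(n_pixels, size=200, replace=False)  # representative subset of pixels
steady_state = rng.normal(size=train_idx.size)             # their equilibrated state (from a full spin-up)

# Train on the subset that was spun up with the conventional procedure ...
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(features[train_idx], steady_state)

# ... and predict the steady state of all pixels, so only the subset
# needs the expensive conventional spin-up.
predicted_state = model.predict(features)
```

In SPINacc, the predicted state variables are then written into global restart files for ORCHIDEE (see Task 4 below).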
The SPINacc package includes:
- job - the job file for a bash environment
- job_tcsh - the job file for a tcsh environment
- main.py - the main python module
- Tools/* - folder with the other python modules
- DEF_*/ - folders containing the configuration files for each of the supported ORCHIDEE versions
- AuxilaryTools/SteadyState_checker.py - tool to assess the state of equilibration in ORCHIDEE simulations (a generic sketch of such a check is given after this list)
- tests/ - the reproducibility code in Python
- requirements.txt - listing necessary dependencies to use SPINacc
- ORCHIDEE_cecill.txt - the same license used by ORCHIDEE
- docs/ - more detailed documentation about ORCHIDEE simulations
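For orientation, the snippet below sketches one common way to assess equilibration, as referenced in the AuxilaryTools/SteadyState_checker.py entry above: the drift of a state variable between two consecutive periods is compared against a tolerance. The criterion, window length and tolerance here are assumptions for illustration and not necessarily what the tool implements.

```python
# Sketch of an equilibration check (one common criterion; illustrative only).
import numpy as np

def is_equilibrated(pool_timeseries, window=10, rel_tol=1e-3):
    """pool_timeseries: annual values of a state variable (e.g. a slow soil C pool)."""
    recent = np.mean(pool_timeseries[-window:])
    previous = np.mean(pool_timeseries[-2 * window:-window])
    # Equilibrated if the change between the two periods is small relative to the pool size.
    return abs(recent - previous) <= rel_tol * abs(previous)

# Example: a pool that is still drifting vs. one that has converged
drifting = np.linspace(100.0, 120.0, 60)
converged = 100.0 + 0.01 * np.exp(-np.arange(60) / 5.0)
print(is_equilibrated(drifting), is_equilibrated(converged))  # False True
```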
Here are the steps to launch the different tasks of this repository (and the associated reproducibility tests):
- Download the code:
git clone [email protected]:CALIPSO-project/SPINacc.git
- The associated ZENODO repository (containing the ORCHIDEE forcing data needed for the reproducibility tests) is available here: [https://doi.org/10.5281/zenodo.10514124]
- From ZENODO: download ORCHIDEE_forcing_data.zip, unzip it and store it in a directory '/your/path/to/SPINacc_ref/'
- From ZENODO: download Reproducibility_tests_reference.zip, unzip it and store it in a directory '/your/path/to/reference/'
- On your local machine: cd SPINacc
- To stay on the main code, skip this step; otherwise run: git checkout your_branch
- Create an execution directory: mkdir EXE_DIR
- In the DEF_Trunk/varlist.json file, replace all occurrences of '/home/surface5/vbastri/' with '/your/path/to/SPINacc_ref/vlad_files/vlad_files/' (a helper sketch for this substitution is given after this list)
- Choose the task you want to launch. In DEF_Trunk/MLacc.def: set config[3] to 1 (for Task 1), set config[5] to the path of your EXE_DIR, and set config[7] to 0 for Task 1 at least (the following tasks can reuse previous results).
- In job, set: setenv dirpython '/your/path/to/SPINacc/' and setenv dirdef 'DEF_Trunk/'
- In tests/config.py, set test_path='/your/path/to/SPINacc/EXE_DIR/' and change reference_path from '/home/surface10/mrasolon/files_for_zenodo/reference/EXE_DIR/' to reference_path='/your/path/to/reference/' (see the sketch after this list)
- Then launch your first job (Task 1) with: qsub -q short job
- For the following tasks (2, 3, 4 and 5), you only need to modify config[3] and config[7] in DEF_Trunk/MLacc.def
- For Tasks 3 and 4, it is better to use: qsub -q medium job
- Launching tasks in a chain (e.g. "1, 2" or "3, 4, 5") will be supported soon
- The results of the tasks are located in your EXE_DIR
- The results of reproducibility tests are stored in EXE_DIR/tests_results.txt
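The two path substitutions referenced above (in DEF_Trunk/varlist.json and tests/config.py) can be scripted if preferred. The snippet below is a minimal sketch using the placeholder paths from this README and is not part of SPINacc itself; adapt the paths to your setup.

```python
# Sketch: apply the path substitution described in the steps above.
# Old and new paths are the placeholders used in this README.
from pathlib import Path

varlist = Path("DEF_Trunk/varlist.json")
varlist.write_text(
    varlist.read_text().replace(
        "/home/surface5/vbastri/",
        "/your/path/to/SPINacc_ref/vlad_files/vlad_files/",
    )
)

# tests/config.py should end up containing your own paths, e.g.:
# test_path = '/your/path/to/SPINacc/EXE_DIR/'
# reference_path = '/your/path/to/reference/'
```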
(Details of each task are provided in docs/documentation.txt)
The different tasks are (task numbers do not yet correspond to the execution sequence):
- Task 1 [optional]: Evaluates the impact of varying the number of K-means clusters on model performance, setting a default of 4 clusters and producing a 'dist_all.png' graph.
- Task 2: Performs the clustering using a K-means algorithm and saves the locations of the selected pixels (files starting with 'ID'). The locations of the selected pixels (red) for a given PFT and all pixels with a cover fraction exceeding 'cluster_thres' [defined in varlist.json] (grey) are plotted in the figures 'ClustRes_PFT**.png' (a minimal clustering sketch is given after this task list). An example for PFT2 is shown here:
- Task 3: Creates compressed forcing files for ORCHIDEE, containing data for the selected pixels only, aligned on a global pseudo-grid for efficient pixel-level simulations, with file specifications listed in varlist.json.
- Task 4: Performs the ML training on results from the ORCHIDEE simulation using the compressed forcing (production mode: resp-format=compressed) or the global forcing (debug mode: resp-format=global), extrapolates the predictions to a global grid and writes the state variables into global restart files for ORCHIDEE. In debug mode, Task 4 also evaluates the ML training outputs against the real model outputs.
- Task 5 [optional]: Visualizes the ML performance from Task 4, offering two evaluation modes: global pixel evaluation and leave-one-out cross-validation (LOOCV) for the training sites, generating plots for various state variables at the PFT level, including comparisons of the ML predictions with conventional spin-up data (a generic LOOCV sketch is given at the end of this README).
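The clustering sketch referenced in Task 2 is given below: a generic K-means example with scikit-learn that clusters pixels and selects one representative per cluster. The feature matrix, the 4-cluster default and the nearest-to-centre selection rule are illustrative assumptions, not the exact SPINacc procedure (which is configured via varlist.json and MLacc.def).

```python
# Sketch of the Task 2 idea: cluster pixels and pick one representative per cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixel_features = rng.normal(size=(5000, 8))   # placeholder per-pixel predictors

# 4 clusters mirrors the default mentioned for Task 1 (illustrative choice).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixel_features)

# Select, for each cluster, the pixel closest to the cluster centre as a
# training site for the ML step (one possible selection rule).
selected = []
for k, centre in enumerate(kmeans.cluster_centers_):
    members = np.flatnonzero(kmeans.labels_ == k)
    closest = members[np.argmin(np.linalg.norm(pixel_features[members] - centre, axis=1))]
    selected.append(closest)
print("selected pixel indices:", selected)
```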
The configuration file has been updated to include new parameters that control the execution of reproducibility tests for each task. These parameters are:
- config[17]: Controls the reproducibility test for Task 1.
- config[19]: Controls the reproducibility test for Task 2.
- config[21]: Controls the reproducibility test for Task 3.
- config[23]: Controls the reproducibility test for Task 4.
For each parameter, setting the value to 1 enables the reproducibility test for the corresponding task, while setting it to 0 disables it.
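Finally, the LOOCV evaluation mode mentioned in Task 5 follows the standard leave-one-out pattern sketched below; this is a generic scikit-learn example with placeholder data, not the SPINacc implementation.

```python
# Generic LOOCV sketch: each training site is left out in turn, the model is
# refit on the remaining sites and evaluated on the held-out one.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))   # predictors at the training sites (placeholder)
y = rng.normal(size=40)        # target state variable (placeholder)

predictions = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    predictions[test_idx] = model.predict(X[test_idx])

print("LOOCV R2:", r2_score(y, predictions))
```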