CESM2 on HEXAGON
1 Prerequisites
If you have not obtained access to HEXAGON, apply at https://skjemaker.app.uib.no/view.php?id=2901837
You need to be a member of the unix group hpcnoresm to access the code and input data in /shared/projects/noresm. Write to support@hpc.uib.no (with cc to ingo.bethke@norceresearch.no) to become a member of the group.
2 Installing the model
Install the model in your home directory with
cd
tar xvf /shared/projects/noresm/models/CESM2.0_latest.tgz
3 Aqua-planet with slab-ocean and thermodynamic sea ice
3.1 Load python module
CESM2's script environment is Python-based. The model requires Python version 2.7 or newer, which has to be loaded on HEXAGON with
module load python/2.7.2-dso
For more information on loading modules on HEXAGON, visit https://docs.hpc.uib.no/wiki/Application_development_(Hexagon)
3.2 Create a new case
Change to CESM2.0's scripts directory with
cd $HOME/CESM2.0/cime/scripts
Create an aqua-planet case with
./create_newcase --case $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 --compset QSIC5 --res f09_f09_mg17 --machine hexagon --pecount M --run-unsupported
This will create a case directory in $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 (feel free to choose a different name and location).
The rest of this section explains the above choice of options. Type "./create_newcase --help" for detailed information on all possible options.
3.2.1 Choosing component set with --compset option
The case uses the predefined aqua-planet (Q) component set QSIC5, which couples a slab ocean (S) and thermodynamic sea ice (I) to a 30-layer configuration of CAM5 (C5).
3.2.2 Choosing resolution with --res option
In combination with the component set QSIC5, the horizontal resolution configuration f09_f09_mg17 specifies NCAR's 0.9x1.25 finite-volume lonlat grid for all active components.
The grid converges towards the poles and is therefore not suitable for use with dynamic sea ice. Sea ice dynamics are hence deactivated in QSIC5 by setting kdyn=0 in the sea ice component's namelist.
3.2.3 Choosing a predefined cpu-configuration with --pecount option
For QSIC5 on 0.9x1.25, possible arguments of --pecount are S, M (default if --pecount is omitted), L, X1 and X2.
The corresponding core counts are 192 (S), 320 (M), 384 (L), 640 (X1) and 960 (X2).
The user maximum is 1024 cores on HEXAGON. This allows concurrent integration of e.g. 1 X2, or 1 L + 1 X1, or 2 M + 1 L, or 2 M + 2 S, or 5 S.
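The combinations listed above can be verified with a few lines of shell arithmetic. The following is a minimal sketch and not part of CESM2: the function names cores_for and total_cores are hypothetical helpers, while the core counts and the 1024-core limit are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CESM2): check whether a set of
# concurrent jobs fits within the 1024-core user limit on HEXAGON.

# Map a --pecount argument to its core count (values from the list above).
cores_for() {
    case "$1" in
        S)  echo 192 ;;
        M)  echo 320 ;;
        L)  echo 384 ;;
        X1) echo 640 ;;
        X2) echo 960 ;;
        *)  echo "unknown pecount: $1" >&2; return 1 ;;
    esac
}

# Sum the core counts of all pecount arguments given.
total_cores() {
    total=0
    for pe in "$@"; do
        n=$(cores_for "$pe") || return 1
        total=$((total + n))
    done
    echo "$total"
}

# Example: two M jobs plus one L job.
total=$(total_cores M M L)
if [ "$total" -le 1024 ]; then
    echo "M M L: $total cores, within the limit"
else
    echo "M M L: $total cores, exceeds the limit"
fi
```

Any other combination can be checked by changing the arguments of total_cores, e.g. total_cores S S S S S for five S jobs.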
The M default produces a throughput of 3.1 sim-years per integration day.
3.3 Set up case
Change to case directory with
cd $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01
Execute case-setup script with
./case.setup
This will create build and job scripts under the case directory and also prepare the run-directory in /work/users/$USER/noresm/QSIC5_f09_f09_test_01/run.
3.4 Customize component namelist files
The case directory contains the four user namelist files user_nl_cam, user_nl_cice, user_nl_cpl and user_nl_docn, which can be customized e.g. to specify additional diagnostic output.
After changing the user namelist files, you can optionally execute
./preview_namelists
This will update the _in-namelist files in the run-directory so one can review the updates. However, use of preview_namelists is not necessary as the namelists are updated on job submission.
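As an illustration of the kind of customization meant here, the sketch below appends a request for a second CAM history file to user_nl_cam. It assumes that daily means of surface temperature (TS) and total precipitation (PRECT) are wanted; the choice of fields and output frequencies is an assumption made for this example, not a recommendation from the model setup. nhtfrq, mfilt and fincl2 are CAM history namelist variables.

```shell
# Run from the case directory. Appends example settings to user_nl_cam.
# The choice of fields and frequencies is an assumption for illustration.
cat >> user_nl_cam <<'EOF'
! Second history file (h1): daily means of surface temperature and precipitation
nhtfrq = 0, -24
mfilt  = 1, 30
fincl2 = 'TS', 'PRECT'
EOF
```

Here nhtfrq = 0 keeps monthly output for the first history file and -24 requests output every 24 hours for the second, while mfilt sets the number of time samples written per file.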
3.5 Build model
In the case directory, execute
./case.build
This will build the model and perform other tasks in the run directory.
3.6 Run time options (run length, resubmission etc)
The length of the integration and similar run-time settings can be customized in the file env_run.xml in the case directory.
One can either edit env_run.xml using an editor (e.g. vi) or from the command line using the xmlchange script available in the case directory (type "./xmlchange --help" for usage).
3.6.1 Initial versus continuation
At the beginning of the simulation the value of CONTINUE_RUN must be set to FALSE. If it is set to TRUE, the model attempts to restart from restart files produced by the same simulation, which must be present in the run directory.
If you want to continue a simulation then make sure that CONTINUE_RUN is set to TRUE.
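Before setting CONTINUE_RUN to TRUE by hand, it can be useful to check that restart information is actually present in the run directory. The sketch below assumes that the model writes restart pointer files named rpointer.* into the run directory; the helper has_restart_pointers is hypothetical and not part of CESM2.

```shell
# Hypothetical helper (not part of CESM2): succeed if the given run
# directory contains at least one restart pointer file (rpointer.*).
has_restart_pointers() {
    for f in "$1"/rpointer.*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Run directory of the example case (fall back to id if USER is unset).
user=${USER:-$(id -un)}
rundir=/work/users/$user/noresm/QSIC5_f09_f09_test_01/run

if has_restart_pointers "$rundir"; then
    echo "Restart pointers found; CONTINUE_RUN=TRUE is possible"
else
    echo "No restart pointers in $rundir; keep CONTINUE_RUN=FALSE"
fi
```

The check only confirms that pointer files exist, not that the restart files they refer to are complete.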
3.6.2 Run length
The run length is specified by STOP_OPTION and STOP_N. The default is 5 days. Set STOP_OPTION to nyears and STOP_N to 1 to specify a run length of 1 year.
3.6.3 Automatic resubmission and short term archiving
If RESUBMIT is set to n>0 then the job is automatically resubmitted n times. By default, CONTINUE_RUN will automatically be set to TRUE during resubmission.
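Since the job runs once and is then resubmitted n times, the total simulation length is (n+1) times the run length of a single job. The sketch below works out RESUBMIT for a target length and passes the settings to xmlchange as a comma-separated list. The helper resubmit_count is hypothetical and not part of CESM2, and the 10-year target is an assumption chosen for the example.

```shell
# Hypothetical helper (not part of CESM2): number of resubmissions needed
# to reach total_years when each job segment covers years_per_job.
# The first submission is not a resubmission, hence the "- 1".
resubmit_count() {
    total_years=$1
    years_per_job=$2
    segments=$(( (total_years + years_per_job - 1) / years_per_job ))
    echo $(( segments - 1 ))
}

# Example: 10 simulation years in segments of 1 year.
n=$(resubmit_count 10 1)
settings="STOP_OPTION=nyears,STOP_N=1,RESUBMIT=$n"

if [ -x ./xmlchange ]; then
    ./xmlchange "$settings"      # apply in the case directory
else
    echo "Run from the case directory: ./xmlchange $settings"
fi
```

If the target length is not a multiple of the segment length, the segment count is rounded up, so the simulation runs somewhat longer than requested.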
After each integration, diagnostic output and restart information are moved from the run-directory to the short-term archiving location /work/users/$USER/archive/QSIC5_f09_f09_test_01.
Note that the /work/users disk area is subject to automatic deletion (see https://docs.hpc.uib.no/wiki/Data_Handling_and_Storage_Policy#Scratch_area:_.2Fwork ). Once the simulation is completed it is therefore recommended to move the output to a different location.
3.7 Integration time and job submission
3.7.1 Setting integration time
The value of JOB_WALLCLOCK_TIME in env_batch.xml specifies the maximum integration time.
The default is set to 4 days on HEXAGON. If the machine load is high, specifying a shorter wall-clock time will result in shorter queuing time.
The specified time corresponds to the limit for a single resubmission. For example, if the model resubmits itself after each simulation year then choose a wall-clock time sufficient to run one simulation year.
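A rough estimate of the required wall-clock time follows from the throughput quoted in section 3.2.3 (3.1 sim-years per integration day for the M configuration). The sketch below does this arithmetic; the helper wallclock_hours is hypothetical and not part of CESM2, and the actual throughput may differ with machine load and output volume.

```shell
# Hypothetical helper (not part of CESM2): estimated wall-clock hours for
# a job segment, given its length in simulation years and the throughput
# in simulation years per day.
wallclock_hours() {
    awk -v years="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", years / rate * 24 }'
}

# One simulation year at 3.1 sim-years per day (the M configuration).
hours=$(wallclock_hours 1 3.1)
echo "Estimated run time: $hours hours"
```

For one simulation year this gives about 7.7 hours, so a wall-clock limit somewhat above that estimate leaves a safety margin while staying well below the 4-day default.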
3.7.2 Job submission
To submit the job execute
./case.submit
Once the job starts to run, the model will write log files and output in the run directory
/work/users/$USER/noresm/QSIC5_f09_f09_test_01/run
To check the queuing status type
squeue -u $USER
To cancel a job use
scancel <job id>
where <job id> is obtained from squeue.
More information on the queuing system is found at https://docs.hpc.uib.no/wiki/Job_execution_(Hexagon)