CESM2 on HEXAGON

From Norcpm
Latest revision as of 11:33, 25 October 2018

1 Prerequisites

If you have not obtained access to HEXAGON, apply at https://skjemaker.app.uib.no/view.php?id=2901837

You need to be a member of the Unix group hpcnoresm to access the code and input data in /shared/projects/noresm. Write to support@hpc.uib.no (with cc to ingo.bethke@norceresearch.no) to become a member of the group.


2 Installing the model

Install the model in your home directory with

 cd  
 tar xvf /shared/projects/noresm/models/CESM2.0_latest.tgz


3 Aqua-planet with slab-ocean and thermodynamic sea ice

3.1 Load python module

CESM2's script environment is Python-based. The model requires Python version 2.7 or newer, which has to be loaded on HEXAGON with

 module load python/2.7.2-dso 

For more information on loading modules on HEXAGON, visit https://docs.hpc.uib.no/wiki/Application_development_(Hexagon)

3.2 Create a new case

Change to CESM2.0's scripts directory with

 cd $HOME/CESM2.0/cime/scripts 

Create an aqua-planet case with

 ./create_newcase --case $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 --compset QSIC5 --res f09_f09_mg17 --machine hexagon --pecount M --run-unsupported

This will create a case directory in $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 (feel free to choose a different name and location).

The rest of this section further explains the above choice of options. Type "./create_newcase --help" for detailed information of all possible options.

3.2.1 Choosing component set with --compset option

The case uses the predefined aqua-planet (Q) component set QSIC5, which couples a slab ocean (S) and thermodynamic sea ice (I) to a 30-layer configuration of CAM5 (C5).

3.2.2 Choosing resolution with --res option

In combination with the component set QSIC5, the horizontal resolution configuration f09_f09_mg17 specifies NCAR's 0.9x1.25 finite-volume lonlat grid for all active components.

The grid converges towards the poles and is therefore not suitable for use with dynamic sea ice. Sea ice dynamics are hence deactivated in QSIC5 by setting kdyn=0 in the sea ice component's namelist.

3.2.3 Choosing a predefined cpu-configuration with --pecount option

For QSIC5 on 0.9x1.25, possible arguments of --pecount are S, M (default if --pecount is omitted), L, X1 and X2.

The corresponding core counts are 192 (S), 320 (M), 384 (L), 640 (X1) and 960 (X2).

The user maximum is 1024 cores on HEXAGON. This allows concurrent integration of e.g. 1 X2, or 1 L + 1 X1, or 2 M + 1 L, or 2 M + 2 S, or 5 S.

The M default produces a throughput of 3.1 sim-years per integration day.
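The core budget described above can be checked with a small shell sketch before submitting several cases at once. The function names cores_for, total_cores and fits are hypothetical helpers introduced here for illustration; the core counts and the 1024-core limit are taken from the text above.

```shell
# Per-user core limit on HEXAGON (from the text above)
MAX_CORES=1024

# Map a --pecount argument to its core count for QSIC5 on 0.9x1.25
cores_for() {
  case "$1" in
    S)  echo 192 ;;
    M)  echo 320 ;;
    L)  echo 384 ;;
    X1) echo 640 ;;
    X2) echo 960 ;;
    *)  echo "unknown pecount: $1" >&2; return 1 ;;
  esac
}

# Sum the cores of all configurations given as arguments
total_cores() {
  total=0
  for pe in "$@"; do
    n=$(cores_for "$pe") || return 1
    total=$(( total + n ))
  done
  echo "$total"
}

# Succeed if the combination stays within the user maximum
fits() {
  [ "$(total_cores "$@")" -le "$MAX_CORES" ]
}

# Example: two M cases plus one L case
if fits M M L; then
  echo "2 M + 1 L uses $(total_cores M M L) cores: within the limit"
fi
```

For example, `fits X2 S` fails because 960 + 192 cores exceed the limit.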

3.3 Set up case

Change to the case directory with

 cd $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 

Execute the case-setup script with

 ./case.setup 

This will create build and job scripts under the case directory and also prepare the run-directory in /work/users/$USER/noresm/QSIC5_f09_f09_test_01/run.

3.4 Customize component namelist files

The case directory contains the four user namelist files user_nl_cam, user_nl_cice, user_nl_cpl and user_nl_docn, which can be customized e.g. to specify additional diagnostic output.

After changing the user namelist files, you can optionally execute

 ./preview_namelists  

This will update the _in-namelist files in the run-directory so one can review the updates. However, use of preview_namelists is not necessary as the namelists are updated on job submission.
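As an illustration of additional diagnostic output, the following sketch appends a second CAM history stream with daily means to user_nl_cam. The chosen fields (PRECT, TS, PSL) and the output frequency are assumptions made for this example, not settings prescribed by this guide; check that the fields exist in your configuration before use.

```shell
# Run from the case directory: append example settings to user_nl_cam.
# Field names and frequencies below are illustrative assumptions.
cat >> user_nl_cam <<'EOF'
! Second history stream (h1) with daily means of a few fields
fincl2 = 'PRECT', 'TS', 'PSL'
! Output frequency per stream: 0 = monthly, -24 = every 24 hours
nhtfrq = 0, -24
! Time samples per file for each stream
mfilt  = 1, 30
EOF
```

Running ./preview_namelists afterwards is one way to confirm that the settings appear in the atmosphere namelist in the run directory.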

3.5 Build model

In the case directory, execute

 ./case.build 

This will build the model and perform other tasks in the run directory.

3.6 Run time options (run length, resubmission etc)

The length of the integration and similar run-time settings can be customized in the file env_run.xml in the case directory.

One can either edit env_run.xml using an editor (e.g. vi) or from the command line using the xmlchange script available in the case directory (type "./xmlchange --help" for usage).

3.6.1 Initial versus continuation

At the beginning of the simulation, the value of CONTINUE_RUN must be set to FALSE. If it is set to TRUE, the model attempts to restart from restart files produced by the SAME simulation, which must be present in the run directory.

If you want to continue a simulation then make sure that CONTINUE_RUN is set to TRUE.

3.6.2 Run length

The run length is specified by STOP_OPTION and STOP_N. The default is 5 days. Set STOP_OPTION to nyears and STOP_N to 1 to specify a run length of 1 year.


3.6.3 Automatic resubmission and short term archiving

If RESUBMIT is set to n>0 then the job is automatically resubmitted n times. By default, CONTINUE_RUN will automatically be set to TRUE during resubmission.
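The run-length and resubmission settings can be combined from the command line, as in the following sketch. The helper case_tool is a hypothetical wrapper introduced here so that the snippet prints the commands instead of failing when run outside a case directory; the value of 4 resubmissions is an illustrative assumption.

```shell
# Run a case script if it exists here; otherwise just print the command (dry run)
case_tool() {
  tool=$1; shift
  if [ -x "./$tool" ]; then
    "./$tool" "$@"
  else
    echo "(dry run) ./$tool $*"
  fi
}

YEARS_PER_SEGMENT=1   # length of each job in simulation years
RESUBMISSIONS=4       # illustrative: 4 resubmissions after the first job

# Each job integrates one year; the job is resubmitted 4 times
case_tool xmlchange STOP_OPTION=nyears,STOP_N=$YEARS_PER_SEGMENT
case_tool xmlchange RESUBMIT=$RESUBMISSIONS

# Review the resulting settings
case_tool xmlquery STOP_OPTION,STOP_N,RESUBMIT,CONTINUE_RUN

# Total length: the first job plus all resubmissions
TOTAL_YEARS=$(( YEARS_PER_SEGMENT * (RESUBMISSIONS + 1) ))
echo "Total simulated years: $TOTAL_YEARS"
```

With these values the simulation covers 5 years in total, split into five one-year jobs.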

After each integration, diagnostic output and restart information are moved from the run-directory to the short-term archiving location /work/users/$USER/archive/QSIC5_f09_f09_test_01.

Note that the /work/users disk area is subject to automatic deletion (see https://docs.hpc.uib.no/wiki/Data_Handling_and_Storage_Policy#Scratch_area:_.2Fwork ). Once the simulation is completed it is therefore recommended to move the output to a different location.

3.7 Integration time and job submission

3.7.1 Setting integration time

The value of JOB_WALLCLOCK_TIME in env_batch.xml specifies the maximum integration time.

The default is set to 4 days on HEXAGON. If the machine load is high, specifying a shorter wall-clock time will result in a shorter queuing time.

The specified time corresponds to the limit for a single resubmission. For example, if the model resubmits itself after each simulation year then choose a wall-clock time sufficient to run one simulation year.
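A rough wall-clock estimate for one job can be derived from the throughput quoted in section 3.2.3. The sketch below assumes the M configuration (3.1 simulation years per day), a job length of one simulation year, and a 20% safety margin; the margin and the HH:MM:SS format are assumptions made for this example.

```shell
THROUGHPUT=3.1   # simulation years per wall-clock day (M configuration)
YEARS=1          # simulation years per job (STOP_N with STOP_OPTION=nyears)
MARGIN=1.2       # assumed safety margin of 20%

# Hours needed, rounded up to the next full hour
HOURS=$(awk -v y="$YEARS" -v t="$THROUGHPUT" -v m="$MARGIN" \
  'BEGIN { h = y / t * 24 * m; printf "%d", (h == int(h)) ? h : int(h) + 1 }')

WALLCLOCK=$(printf '%02d:00:00' "$HOURS")
echo "Suggested JOB_WALLCLOCK_TIME: $WALLCLOCK"

# To apply it in the case directory (depending on the CIME version,
# an additional --subgroup case.run argument may be needed):
#   ./xmlchange JOB_WALLCLOCK_TIME=$WALLCLOCK
```

Actual throughput depends on machine load and output volume, so it is worth comparing the estimate against the timing of a first short run.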

3.7.2 Job submission

To submit the job, execute

 ./case.submit 

Once the job starts to run, the model will write log files and output in the run directory

 /work/users/$USER/noresm/QSIC5_f09_f09_test_01/run 

To check the queuing status type

 squeue -u $USER 

To cancel a job use

 scancel <job id> 

where <job id> is obtained from squeue.

More information on the queuing system is found at https://docs.hpc.uib.no/wiki/Job_execution_(Hexagon)