CESM2 on FRAM

From Norcpm
Revision as of 13:37, 15 January 2019 by Ibe062

1 Prerequisites

If you have not obtained access to FRAM, apply at https://www.metacenter.no/user/application/form/notur (project nn9625k).

2 Installing the model

Install the model in your home directory with

 cd  
 tar xvf /cluster/projects/nn9625k/cesm/cesm2.1.0_latest.tgz

3 Aqua-planet with slab-ocean and thermodynamic sea ice

3.1 Create a new case

Change to CESM2.0's scripts directory with

 cd $HOME/CESM2.0/cime/scripts 

Create an aqua-planet case with

 ./create_newcase --case $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 --compset QSIC5 --res f09_f09_mg17 --machine hexagon --pecount M --run-unsupported

This will create a case directory in $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 (feel free to choose a different name and location).

The rest of this section explains the options chosen above. Type "./create_newcase --help" for detailed information on all available options.

3.1.1 Choosing component set with --compset option

The case uses the predefined aqua-planet (Q) component set QSIC5, which couples a slab ocean (S) and thermodynamic sea ice (I) to a 30-layer configuration of CAM5 (C5).

3.1.2 Choosing resolution with --res option

In combination with the component set QSIC5, the horizontal resolution configuration f09_f09_mg17 specifies NCAR's 0.9x1.25 degree finite-volume latitude-longitude grid for all active components.

The meridians of this grid converge towards the poles, which makes it unsuitable for use with dynamic sea ice. Sea ice dynamics are therefore deactivated in QSIC5 by setting kdyn=0 in the sea ice component's namelist.

3.1.3 Choosing a predefined cpu-configuration with --pecount option

For QSIC5 on 0.9x1.25, possible arguments of --pecount are S, M (default if --pecount is omitted), L, X1 and X2.

The corresponding core counts are 192 (S), 320 (M), 384 (L), 640 (X1) and 960 (X2).

The user maximum is 1024 cores on HEXAGON. This allows concurrent integration of e.g. 1 X2, or 1 L + 1 X1, or 2 M + 1 L, or 2 M + 2 S, or 5 S.
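The combinations listed above can be checked with a little shell arithmetic. The sketch below is illustrative only: the helper names cores and total_cores are made up here, while the core counts and the 1024-core limit are taken from the text.

```shell
# Core count for each predefined --pecount layout (values from the text).
cores() {
    case "$1" in
        S)  echo 192 ;;
        M)  echo 320 ;;
        L)  echo 384 ;;
        X1) echo 640 ;;
        X2) echo 960 ;;
        *)  echo "unknown layout: $1" >&2; return 1 ;;
    esac
}

# Sum the core counts of all layouts given as arguments.
total_cores() {
    total=0
    for layout in "$@"; do
        n=$(cores "$layout") || return 1
        total=$((total + n))
    done
    echo "$total"
}

limit=1024   # user maximum on HEXAGON (from the text)
for combo in "X2" "L X1" "M M L" "M M S S" "S S S S S"; do
    # $combo is left unquoted on purpose so that it splits into layouts
    sum=$(total_cores $combo)
    if [ "$sum" -le "$limit" ]; then
        echo "$combo: $sum cores (fits)"
    else
        echo "$combo: $sum cores (exceeds $limit)"
    fi
done
```

To test another mix of concurrent cases, add it to the list in the for loop.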

The default M layout yields a throughput of 3.1 simulated years per wall-clock day.

3.2 Set up case

Change to the case directory with

 cd $HOME/CESM2.0/cases/QSIC5_f09_f09_test_01 

Execute the case-setup script with

 ./case.setup 

This will create build and job scripts under the case directory and also prepare the run-directory in /work/users/$USER/noresm/QSIC5_f09_f09_test_01/run.

3.3 Customize component namelist files

The case directory contains the four user namelist files user_nl_cam, user_nl_cice, user_nl_cpl and user_nl_docn, which can be customized e.g. to specify additional diagnostic output.
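As an illustration of additional diagnostic output, the following sketch appends a few history settings to user_nl_cam. It is meant to be run from the case directory. The variables nhtfrq, mfilt and fincl2 are standard CAM history namelist variables, but they are not named on this page, and the chosen fields (PRECT, TS) and frequencies are assumptions picked for the example only.

```shell
# Append illustrative history settings to user_nl_cam in the current
# (case) directory. Lines starting with "!" are namelist comments.
cat >> user_nl_cam <<'EOF'
! first history file (h0): monthly; second history file (h1): every 24 hours
nhtfrq = 0, -24
! time samples per file: 1 for h0, 30 for h1
mfilt  = 1, 30
! fields written to the second history file (example choice)
fincl2 = 'PRECT', 'TS'
EOF
```

The other user namelist files are edited in the same way, using the variables of the respective component.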

After changing the user namelist files, you can optionally execute

 ./preview_namelists  

This will update the namelist files ending in _in in the run directory so that the changes can be reviewed. However, running preview_namelists is not required, because the namelists are updated on job submission.
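One way to review a setting is to search the generated namelist for it. The sketch below looks up kdyn, the sea ice dynamics switch mentioned earlier. The helper name nl_lookup is hypothetical, and the file name ice_in for the sea ice namelist is an assumption based on the usual CESM naming.

```shell
# Print every line of a namelist file that sets the given variable.
# Usage: nl_lookup <variable> <namelist-file>
nl_lookup() {
    grep -i -E "^[[:space:]]*$1[[:space:]]*=" "$2"
}

rundir=/work/users/$USER/noresm/QSIC5_f09_f09_test_01/run
if [ -r "$rundir/ice_in" ]; then
    nl_lookup kdyn "$rundir/ice_in"
else
    echo "no ice_in in $rundir yet; run ./case.setup and ./preview_namelists first"
fi
```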

3.4 Build model

In the case directory, execute

 ./case.build 

This will build the model and perform other tasks in the run directory.

3.5 Run time options (run length, resubmission etc)

The length of the integration and related run-time settings can be customized in the file env_run.xml in the case directory.

One can either edit env_run.xml using an editor (e.g. vi) or from the command line using the xmlchange script available in the case directory (type "./xmlchange --help" for usage).

3.5.1 Initial versus continuation

At the beginning of a simulation, CONTINUE_RUN must be set to FALSE. If it is set to TRUE, the model attempts to restart from restart files produced by the same simulation, which must be present in the run directory.

To continue an existing simulation, make sure that CONTINUE_RUN is set to TRUE.
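Before switching to a continuation run, it can help to check that restart information is actually present in the run directory. The sketch below assumes that the model leaves restart pointer files named rpointer.* there, as CESM normally does. The helper name has_restart_pointers is hypothetical, and the script only prints a suggestion rather than changing the case.

```shell
# Succeeds if at least one rpointer.* file exists in directory $1.
has_restart_pointers() {
    for f in "$1"/rpointer.*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

rundir=/work/users/$USER/noresm/QSIC5_f09_f09_test_01/run
if has_restart_pointers "$rundir"; then
    echo "restart pointers found; to continue: ./xmlchange CONTINUE_RUN=TRUE"
else
    echo "no restart pointers in $rundir; keep CONTINUE_RUN=FALSE"
fi
```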

3.5.2 Run length

The run length is specified by STOP_OPTION and STOP_N. The default is 5 days. Set STOP_OPTION to nyears and STOP_N to 1 to specify a run length of 1 year.
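From the command line, the one-year example could be set as sketched below. The script only calls xmlchange when it is run from a case directory. The shell variable names are chosen for this example.

```shell
stop_option=nyears
stop_n=1
settings="STOP_OPTION=${stop_option},STOP_N=${stop_n}"

if [ -x ./xmlchange ]; then
    ./xmlchange "$settings"             # write both values to env_run.xml
    ./xmlquery STOP_OPTION,STOP_N       # print the values now in effect
else
    echo "not in a case directory; would run: ./xmlchange $settings"
fi
```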


3.5.3 Automatic resubmission and short term archiving

If RESUBMIT is set to n>0, the job is automatically resubmitted n times. By default, CONTINUE_RUN is automatically set to TRUE during resubmission.
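Since the initial submission is followed by n resubmissions, a simulation runs for n+1 segments of STOP_N each. The following sketch derives RESUBMIT for a target length. The 10-year target is a hypothetical example, and the script only prints the command to run in the case directory.

```shell
total_years=10   # hypothetical target length of the simulation
stop_n=1         # years per segment (STOP_OPTION=nyears, STOP_N=1)

# Number of segments, rounded up, minus the initial submission.
segments=$(( (total_years + stop_n - 1) / stop_n ))
resubmit=$(( segments - 1 ))

echo "./xmlchange RESUBMIT=$resubmit"
```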

After each integration, diagnostic output and restart information are moved from the run directory to the short-term archiving location /work/users/$USER/archive/QSIC5_f09_f09_test_01.

Note that the /work/users disk area is subject to automatic deletion (see https://docs.hpc.uib.no/wiki/Data_Handling_and_Storage_Policy#Scratch_area:_.2Fwork ). Once the simulation is completed it is therefore recommended to move the output to a different location.
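A minimal sketch of such a transfer is given below. It copies the archive directory and compares file counts before reporting success, so the source can be deleted afterwards with some confidence. The function name stage_out is hypothetical, and the destination must be supplied by the user, since this page does not name a permanent storage location.

```shell
# Copy directory $1 into directory $2 and verify the number of files.
# Usage: stage_out <archive-dir> <destination-dir>
stage_out() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp -a "$src" "$dest"/ || return 1
    n_src=$(find "$src" -type f | wc -l)
    n_dst=$(find "$dest/$(basename "$src")" -type f | wc -l)
    [ "$n_src" -eq "$n_dst" ]
}

# Example call (the destination is a placeholder to be replaced):
# stage_out /work/users/$USER/archive/QSIC5_f09_f09_test_01 <destination-dir>
```

Delete the copy under /work/users only after the function has returned successfully.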

3.6 Integration time and job submission

3.6.1 Setting integration time

The value of JOB_WALLCLOCK_TIME in env_batch.xml specifies the maximum wall-clock time of a job.

The default is 4 days on HEXAGON. If the machine load is high, specifying a shorter wall-clock time will usually result in a shorter queuing time.

The specified time corresponds to the limit for a single resubmission. For example, if the model resubmits itself after each simulation year then choose a wall-clock time sufficient to run one simulation year.
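As a worked example, the wall-clock time for one simulation year can be estimated from the throughput of 3.1 simulated years per day quoted for the M layout. The sketch assumes that this throughput holds for your case. The 20 % safety margin and the HH:MM:SS format are assumptions of this example, and the script only prints the command to run in the case directory.

```shell
throughput=3.1   # simulated years per wall-clock day (M layout, from the text)
years=1          # simulated years per submission
margin=1.2       # safety factor (assumption)

# Hours needed, including the margin, rounded up to a whole hour.
hours=$(awk -v t="$throughput" -v y="$years" -v m="$margin" \
    'BEGIN { h = 24 * y / t * m; c = int(h); if (c < h) c++; print c }')

wallclock=$(printf '%02d:00:00' "$hours")
echo "./xmlchange JOB_WALLCLOCK_TIME=$wallclock"
```

Without the margin, one simulated year takes 24/3.1, or roughly 7.7 hours.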

3.6.2 Job submission

To submit the job execute

 ./case.submit 

Once the job starts to run, the model will write log files and output to the run directory:

 /work/users/$USER/noresm/QSIC5_f09_f09_test_01/run 

To check the queuing status type

 squeue -u $USER 

To cancel a job use

 scancel <job id> 

where <job id> is obtained from squeue.

More information on the queuing system is found at https://docs.hpc.uib.no/wiki/Job_execution_(Hexagon)