
The job scheduler: Slurm


The cluster workload manager is Slurm.

Computations on the cluster are carried out via a job scheduler, which manages the queue and launches computations when the requested resources become available.

caution

Reminder: it is strictly forbidden to calculate directly on the frontend node (zeus)!

Instead of executing their programs directly, users submit scripts to the Slurm job scheduler. Those scripts specify the requested resources (which machines? for how long? etc.) and the commands to be executed on the allocated resources.

It is therefore very important that the requested resources match the requirements of the code to be executed: as long as the resources are unavailable the script will not start (so it is useless to request too many resources), and if the allocated resources are insufficient the program will fail to complete.

This page provides the necessary information to write and submit job scripts adapted to your needs.

Maximum walltime

Caution: the maximum allowed walltime depends on the number of cores requested:

  • below 48 cores, the limit is 384 hours (16 days);
  • above 48 cores, the limit decreases proportionally with the number of cores requested.

The amount of requested core-hours cannot exceed 48 x 384 = 18432 core-hours.

The table below presents some examples; the variation is linear.

 #cores    Walltime max (hours)    Walltime max (days)
 <=48      384                     16
 64        288                     12
 96        192                     8
 128       144                     6
 192       96                      4
 256       72                      3
 384       48                      2
 512       36                      1.5
 768       24                      1
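The linear rule above can be sketched in a few lines of shell. This is an illustration only, assuming the cap is exactly 48 x 384 = 18432 core-hours; the scheduler itself enforces the real limits.

```shell
# Print the maximum walltime (in hours) for a given core count,
# following the 48 x 384 = 18432 core-hour cap described above.
max_walltime() {
    cores=$1
    if [ "$cores" -le 48 ]; then
        echo 384                       # flat limit below 48 cores
    else
        echo $(( 48 * 384 / cores ))   # linear decrease above 48 cores
    fi
}

max_walltime 192   # -> 96 hours, matching the table
```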

Interactive jobs

To run short tests you may use the following interactive machines:

 Name           Type    CPU                                           Memory
 interactive-1  CPU     2x10 cores @2.20GHz, Intel Xeon Silver 4114   192 GB
 interactive-2  CPU     2x10 cores @2.20GHz, Intel Xeon Silver 4114   192 GB

From zeus these machines are accessible directly via ssh.

The command

ssh interactive

will connect you to either of the two machines interactive-[1,2].

As with the frontend, these nodes are shared among all users.

You should therefore avoid launching long-running jobs on these machines, and keep this in mind when interpreting performance measurements.

For longer tests you must submit a job via the job scheduler.

Job submission (sbatch)

The following command submits a script called job.slurm to the job scheduler:

 sbatch job.slurm

The job.slurm file contains instructions for Slurm as well as commands to execute on the granted resources. This command returns a unique identifier of the job (JOBID).

For example, the following script describes a job that reserves 1 node (--nodes=1) for a duration of at most 10 minutes and executes the commands hostname and sleep 60.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:10:00

hostname
sleep 60

Slurm instructions (more precisely, sbatch options) start with #SBATCH followed by an option. The main sbatch options are listed below.

Some sample submission scripts for frequently used codes can be found in the /share/doc/submission/ folder. If you would like to see an example script added, or spot a mistake, let us know (hpc@univ-lille.fr).


--time, -t

The maximum duration of the job. Beyond this limit the job will be killed by the system.

Valid formats include:

  • #SBATCH --time 24:00:00 → 24 hours, the default value
  • #SBATCH --time 2-12 → 2 days and 12 hours
  • #SBATCH --time 30 → 30 minutes
  • #SBATCH --time 30:00 → also 30 minutes

The equivalent short syntax is -t 24:00:00.

Any job that requests more than the maximum allowed walltime will remain in the waiting queue indefinitely.

--nodes, -N

The number of requested nodes. For example, to request 1 node (default value):

  • #SBATCH --nodes=1

To request at least 2 and at most 4 nodes:

  • #SBATCH --nodes=2-4

--ntasks-per-node

The number of cores requested per node (here 1 core, the default value). This option is to be used in conjunction with --nodes:

  • #SBATCH --ntasks-per-node=1
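For instance, the two options combine as in the following sketch, which requests 2 nodes with 16 tasks each, i.e. 32 tasks in total (my_app is a hypothetical placeholder for your MPI program):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --nodes=2              # 2 nodes ...
#SBATCH --ntasks-per-node=16   # ... with 16 tasks each = 32 tasks total
#SBATCH --time=01:00:00

# my_app is a placeholder for your MPI-enabled program
srun ./my_app
```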

--job-name, -J

The name of the job

  • #SBATCH --job-name=my_job

--mem

The maximum amount of memory required per node (here 1024 MB, i.e. 1 GB, which is the default).

  • #SBATCH --mem=1024M

Slurm accepts units other than M (the default), for example G. The following two options are therefore equivalent:

  • #SBATCH --mem=2048 → 2048 MB
  • #SBATCH --mem=2G → 2 GB → 2048 MB
caution

Be aware of the difference between --mem=128G and --mem=128000! A job that requests --mem=128G (131072 MB) will not be able to run on the 128 GB nodes (see this), as those are configured with MEMORY=128000 in Slurm.
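The unit arithmetic behind this caution can be checked directly:

```shell
# --mem uses binary units: 128G means 128 * 1024 = 131072 MB,
# which exceeds the 128000 MB configured on the 128 GB nodes.
requested_mb=$((128 * 1024))
echo "$requested_mb"   # -> 131072
```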

For further information on how to set this value appropriately, see this.

--mail-user

To receive email notifications concerning your job:

#SBATCH --mail-type=ALL
#SBATCH --mail-user=you@univ-lille.fr

--gres

gres stands for consumable generic resource. Often gres is used to reserve GPUs (see this for further information). For instance,

  • #SBATCH --gres=gpu:2

requests two GPUs per node (without specifying the GPU type). The command sbatch --gres=help lists the available gres options.
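Putting this together, a minimal GPU job could look like the following sketch (my_gpu_app is a hypothetical placeholder for your program; the gpu gres name matches the example above):

```shell
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --nodes=1
#SBATCH --gres=gpu:2       # two GPUs on the node, type unspecified
#SBATCH --time=02:00:00

# my_gpu_app is a placeholder for your GPU-enabled program
./my_gpu_app
```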

--output, --error

These options allow you to specify the output files.

By default, stdout and stderr streams are redirected to a file named slurm-%j.out (where %j is the SLURM_JOB_ID) inside the launch directory.

It is also possible to create one output file per task or per node. The Slurm documentation explains how to create output filenames based on job parameters.
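For instance, Slurm's %x (job name) and %j (job id) filename patterns can be combined as in this sketch:

```shell
#SBATCH --job-name=my_job
#SBATCH --output=%x-%j.out   # stdout -> my_job-<JOBID>.out
#SBATCH --error=%x-%j.err    # stderr -> my_job-<JOBID>.err
```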