Batch Jobs
These are jobs that are submitted to run in the background without requiring direct user interaction once they start. The user submits the job to a queue, and Slurm schedules the job to run when resources become available. This is ideal for large, non-interactive computational tasks, simulations, data analysis, etc.
Here, you write a job script (typically a shell script) that contains the commands to execute your task, along with Slurm directives to specify resource requirements (e.g., number of nodes, CPUs, memory, etc.). Once submitted, the job runs automatically without user interaction.
Submitting batch jobs
The command sbatch serves as the means to submit batch jobs to Slurm,
typically in the form of a shell script written in the bash command language.
Note
A shell script is essentially a text file containing instructions that are parsed and executed by a shell or command-line interpreter.
When submitting a job to Slurm, the shell script must adhere to the following requirements:
The first line of the script should specify the shell to be used. This is accomplished by including a shebang (
#!) followed by/bin/env, a space, and the name of the desired shell. For example, for a bash script:#!/bin/env bash. Alternatively, the path to the shell can be used directly:#!/bin/bash.The script should define a set of parameters in the form of comments (lines preceded by the
#symbol). These parameters are utilized by Slurm to establish job priority and resource requirements. Each parameter should begin with the wordSBATCHand should be placed on its own separate line. For instance:#!/bin/env bash #SBATCH --job-name=some_job_name #SBATCH --time=01:00
A list of common sbatch options is provided below for convenience:
Parameter |
Description |
Value |
Mandatory |
|---|---|---|---|
|
A user defined name assigned to the job |
Any combination of letters and numbers |
no |
|
Maximum time the job will be allowed to run |
Time given in format: DD-HH:MM:SS |
YES |
|
Amount of memory the job requires per node |
<number>[units] where units are K for kilobytes; M for megabytes; G for gigabytes or T for terabytes |
YES (for the |
|
Number of tasks of the job |
Number |
no (default 1) |
|
Number of tasks assigned to a node |
Number |
no (default 1) |
|
Number of nodes requested |
Number |
no (default 1) |
|
Number of threads per task |
Number |
no (default 1) |
|
Partition/queue where the job will be submited |
cpu384g, cpu1500g, cpu6000g, gpu1h100, gpu2h100 or hgxh200 |
YES |
|
File where the standard output will be written. You can use placeholders to control the file naming scheme (see https://slurm.schedmd.com/sbatch.html#SECTION_FILENAME-PATTERN) |
Any combination of letters and numbers |
no |
|
File where the standard error will be written. You can use placeholders to control the file naming scheme (see https://slurm.schedmd.com/sbatch.html#SECTION_FILENAME-PATTERN) |
Any combination of letters and numbers |
no |
|
The job cannot share nodes with other running jobs |
No values |
YES (for the |
|
Return an estimate of when a job would be run |
No values |
no |
For more comprehensive details, please consult Slurm’s sbatch manual.
Once the necessary parameters have been incorporated into the script,
users should proceed to configure the environment for the application they intend to execute.
If the job uses a software installed within the cluster,
users are encouraged to load the corresponding modulefiles within the script using
the module load <modulefile> command.
Alternatively, users can load their custom modulefiles or manually set
the required environmental variables.
Any additional tasks such as directory or file creation, among others,
should be performed at this stage. Note that the submission script
essentially functions as a shell script with additional comments at the beginning.
As such, any actions that a user would typically carry out in a regular shell session
can be executed within the script.
Finally, the script should encompass the core job tasks, which typically involve executing simulations, experiments, or other relevant operations. During this phase, users would typically initiate the necessary scientific software required for their specific needs.
Batch Job Example
Consider the following bash script:
#!/bin/bash
#SBATCH --job-name=example_batch_job
#SBATCH --error=/work/user/%x.%j.err
#SBATCH --output=/work/user/%x.%j.out
#SBATCH --time=05:30:00
#SBATCH --ntasks-per-node=32
#SBATCH --nodes=2
#SBATCH --partition=cpu384g
# print list of nodes assigned to the job.
# Example output:
# cpusm01
# cpusm02
scontrol show hostnames $SLURM_JOB_NODELIST
In this script, the job is assigned the name example_batch_job using the #SBATCH --job-name
directive. The maximum allowed running time is set to 5 hours and 30 minutes through the
#SBATCH --time directive. The job is configured to utilize 2 nodes (#SBATCH --nodes)
and 32 processors per node (#SBATCH --ntasks-per-node). It is intended to run in the cpu384g queue
(#SBATCH --partition). Any encountered error messages are to be stored in the file
/work/user/%x.%j.out, where %x is replaced with the job name specified with the
#SBATCH --job-name directive and %j is replaced with the job ID assigned by Slurm.
Similarly, non-error messages are directed to the file /work/user/%x.%j.err
for logging purposes. The environmental variable SLURM_JOB_NODELIST is passed
to the script by slurm (see section Slurm environmental variables).
Suppose this script is located at path: /work/user/example_batch_job.sh. Then,
the command below would submit the batch job to slurm:
sbatch /work/user/example_batch_job.sh
Job Arrays
Job arrays are a powerful feature in Slurm that allow you to submit and manage multiple similar jobs efficiently using a single job script. This is especially useful when you need to run the same program many times with different input parameters, files, or configurations.
Instead of writing and submitting dozens (or hundreds) of separate job scripts, you can use a single script and let Slurm handle the rest.
How Job Arrays Work
When you use the #SBATCH --array directive in your job script, Slurm creates multiple jobs, each with a unique index called SLURM_ARRAY_TASK_ID. This environment variable is automatically set by Slurm and can be used inside your script to control behavior for each job.
Each array index corresponds to a separate job instance. For example, #SBATCH --array=1-3
will submit three jobs, with SLURM_ARRAY_TASK_ID set to 1, 2, and 3 respectively. You can use this variable to select input files, set parameters, or control logic in your script.
Warning
Important: The order in which array jobs run is not guaranteed. For example, the job with SLURM_ARRAY_TASK_ID=2 may run before or after the one with ID 3.
Array Syntax Options
You can define array indices in several ways:
Description |
Example |
Example Outcome |
|
Explicit list |
Comma separated list of array IDs. |
#SBATCH --array=1,2,3,7,10
|
Runs 5 jobs with Array Task IDs 1, 2, 3, 7, 10. |
Range |
Interval of array IDs. Both endpoints are inclusive. |
#SBATCH --array=1-10
|
Runs 10 jobs with Array Task IDs 1, 2, 3, 4, …, 9, 10. |
Range with an offset to skip array indexes. |
Interval of array IDs. |
#SBATCH --array=1-10:2
|
Runs 5 jobs with Array Task IDs 1, 3, 5, 7, 9. |
Limit concurrent jobs |
Indicate how many arrays jobs are submitted to the queue at a given time. |
#SBATCH --array=0-49%20
|
Submit 50 jobs, with Array Task IDs 0, 1, 2, …, 48, 49, but only 20 run at a time. |
Example Job Array Script
#!/bin/bash
#SBATCH --partition=cpu384g
#SBATCH --array=1-10
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=job_%A_%a.out
#SBATCH --error=job_%A_%a.err
#SBATCH --job-name=array_example
module load my_program
# Use SLURM_ARRAY_TASK_ID to select input file
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.dat"
# Run the program with the selected input
my_program --input $INPUT_FILE --output output_${SLURM_ARRAY_TASK_ID}.dat
Explanation:
#SBATCH --array=1-10: Submits 10 jobs, each with a unique array index from 1 to 10.#SBATCH --ntasks=1: Each job runs a single task (i.e., spawns one process).#SBATCH --cpus-per-task=2: Each job requests 2 CPU cores. Since –ntasks=1, this likely means the process will use 2 threads.#SBATCH --mem=8G: Each job requests 8 GB of RAM.#SBATCH --time=01:00:00: Each job can run for up to 1 hour.#SBATCH --output=job_%A_%a.outand#SBATCH --error=job_%A_%a.err%A: Replaced with the job ID of the array.%a: Replaced with the array index (i.e., the value of SLURM_ARRAY_TASK_ID).
Each job runs independently with its own input and output.
Resource Considerations:
Since these jobs are submitted to the compute partition, we can estimate how they will be scheduled:
CPU usage: 10 jobs * 1 task * 2 cores = 20 cores total. Each node in the cpu384g queue has 32 cores, so all jobs can likely run on a single node.
Memory usage: 10 jobs * 8 GB = 80 GB total. Each node in the cpu384g queue has ~384 GB of RAM, so memory is not a limiting factor.
This means the scheduler may place all 10 jobs on the same node, depending on availability and other jobs in the queue.