Introduction
Arrays
The recommended way to submit many jobs (more than 100) is to use SLURM’s job array feature. Job arrays let you manage a large number of jobs more efficiently and quickly.
To specify a job array, use the --array option to tell SLURM which indices the array will contain:
--array=0-9. There are 10 jobs in the array. The first job has index 0, and the last job has index 9.
--array=5-8. There are 4 jobs in the array. The first job has index 5, and the last job has index 8.
--array=2,4,6. There are 3 jobs in the array, with indices 2, 4 and 6.
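To make the index specifications above concrete, here is a minimal sketch of a helper that expands a specification into the explicit indices it describes. The function name expand_array_spec is hypothetical and is not part of SLURM; it only handles the plain range and comma-list forms shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a SLURM command): expand an --array index
# specification such as "0-9", "5-8" or "2,4,6" into explicit indices.
# The body runs in a subshell, so IFS and the variables stay local.
expand_array_spec() (
    IFS=','
    out=""
    for part in $1; do
        case "$part" in
            *-*)
                # A range "first-last": emit every index in between.
                i="${part%-*}"
                last="${part#*-}"
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single index from a comma-separated list.
                out="$out $part"
                ;;
        esac
    done
    echo "${out# }"
)

expand_array_spec "0-9"     # 0 1 2 3 4 5 6 7 8 9
expand_array_spec "5-8"     # 5 6 7 8
expand_array_spec "2,4,6"   # 2 4 6
```

Each printed index corresponds to one job in the array.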
Now you can write a job submission script that looks like:
#!/usr/bin/env bash
#
#SBATCH --job-name=training_batch
#SBATCH --partition=training.q
#SBATCH --cpus-per-task=1
#SBATCH --time=1:00
#SBATCH --output=test_%A_%a.out
#SBATCH --array=1-3
echo $SLURM_JOB_NAME
echo $SLURM_ARRAY_TASK_ID
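In the script above, %A in the output file name is replaced by the job ID of the array and %a by the index of the individual job. A common use of SLURM_ARRAY_TASK_ID is to give each job in the array its own input. The following is a sketch only: the file inputs.txt (one input name per line) and the function input_for_task are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the input assigned to array index $1,
# taken as line number $1 of the list file $2 (so index 1 is line 1).
input_for_task() {
    sed -n "${1}p" "$2"
}

# SLURM sets SLURM_ARRAY_TASK_ID inside each array job. Defaulting to 1
# lets the script be tried outside SLURM as well.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# inputs.txt is an assumed file name; skip quietly if it does not exist.
if [ -f inputs.txt ]; then
    echo "Task $task_id processes: $(input_for_task "$task_id" inputs.txt)"
fi
```

With --array=1-3 and a three-line inputs.txt, each of the three jobs would pick a different line.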
Dependencies
We often develop pipelines in which a particular job must be launched only after previous jobs have completed successfully. SLURM provides a way to implement such pipelines with its --dependency option:
--dependency=afterok:<job_id>. The submitted job will be launched if and only if the job with identifier job_id completed successfully. If job_id is a job array, then all jobs in that job array must complete successfully.
--dependency=afternotok:<job_id>. The submitted job will be launched if and only if the job with identifier job_id failed. If job_id is a job array, then at least one job in that array must have failed. This option may be useful for a cleanup step.
--dependency=afterany:<job_id>. The submitted job will be launched after the job with identifier job_id has terminated, i.e. completed successfully or failed.
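In practice the job ID of the first job has to be captured before it can be used in --dependency. The sketch below does this with sbatch's --parsable option, which prints only the job ID (optionally followed by a semicolon and the cluster name). The function names and the SBATCH_CMD variable are hypothetical; SBATCH_CMD exists only so the submit command can be swapped out for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --dependency argument for one of the
# three conditions described above.
dependency_flag() {
    case "$1" in
        afterok|afternotok|afterany)
            printf '%s\n' "--dependency=$1:$2" ;;
        *)
            echo "unsupported condition: $1" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: submit script $1, then submit script $2 so that
# it starts only if the first job completed successfully.
submit_pipeline() (
    sbatch_cmd="${SBATCH_CMD:-sbatch}"   # SBATCH_CMD is an assumed override
    first_id="$("$sbatch_cmd" --parsable "$1")" || exit 1
    first_id="${first_id%%;*}"           # drop a trailing ";cluster" if present
    "$sbatch_cmd" --parsable "$(dependency_flag afterok "$first_id")" "$2"
)
```

It could be called as submit_pipeline first-step.sh second-step.sh, where both script names are placeholders. Swapping afterok for afternotok in the second submission would give the cleanup variant.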
Exercise
Copy and paste the above example into a file called “array-job.sh” in your $HOME. Submit the job:
cd $HOME; sbatch array-job.sh
Look for the output files and examine their content. Modify the script to run 6 array jobs and check their status in the queue (squeue -r or scontrol show job jobid).
Now submit the script to run 100 array jobs, but with only 5 running at a time.
sbatch --array=1-100%5 array-job.sh
Check the status of your job.
If some of your jobs are still queued, cancel the remainder. Check the output files to see how many of the array jobs ran.
scancel jobid
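One way to see which array jobs never ran is to look for the output files that are absent. The sketch below assumes the test_%A_%a.out naming from the example script and that the files are in the current directory; the function name missing_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the array indices between $2 and $3 for
# which no output file test_<job_id>_<index>.out exists in the current
# directory. $1 is the job ID reported by sbatch.
missing_outputs() (
    job_id=$1
    i=$2
    last=$3
    missing=""
    while [ "$i" -le "$last" ]; do
        [ -e "test_${job_id}_${i}.out" ] || missing="$missing $i"
        i=$((i + 1))
    done
    echo "${missing# }"
)
```

Running missing_outputs jobid 1 100 after cancelling would print the indices that produced no output, with jobid replaced by your actual job ID.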
Finally, submit a batch job to run an array with indices 1-10, then submit another batch job with array indices 11-20 that will run only if all jobs in the first array complete successfully.
sbatch --array=1-10 array-job.sh
sbatch --dependency=afterok:jobid --array=11-20 array-job.sh
Check the status of your job on the queue. Did the first array of jobs work and has the second array run?