Introduction
In this exercise we will run our simple R commands as a batch job. As with the interactive job in the previous exercise, we use the sbatch command. Full documentation on sbatch can be found in the system’s man pages (man sbatch) or in its online documentation.
The usual way to submit a batch job is to create a job script. Lines starting with “#SBATCH” are called directives; each one maps to an sbatch argument that could otherwise be given on the command line. Here is an example job script:-
#!/usr/bin/env bash
#
#SBATCH --job-name=training_batch
#SBATCH --partition=training.q
#SBATCH --ntasks=1
#SBATCH --time=1:00
echo "$SLURM_JOB_NAME"
echo "Current working directory is $(pwd)"
echo "Starting run at: $(date)"
module purge
module load system/intel64
module load R/3.6.3
srun R -f training/src/square.r
# output how and when job finished
echo "Program finished with exit code $? at: $(date)"
# end of jobscript
The job's output is written to a file in the directory from which the job was submitted. By default, SLURM sends both stdout and stderr to a single file named slurm-<jobid>.out.
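To see how directives map to command-line arguments, here is a minimal sketch that lists the options a job script sets. The helper name list_directives is our own invention, not part of SLURM. It assumes each directive starts with "#SBATCH " at the beginning of a line, and it stops at the first command because sbatch ignores directives that appear after that point.

```shell
#!/usr/bin/env bash
# list_directives is a hypothetical helper, not part of SLURM.
# It prints the sbatch options that a job script sets through #SBATCH directives.
list_directives() {
    local script=$1 line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '#SBATCH '*)    printf '%s\n' "${line#'#SBATCH '}" ;;  # a directive: print its option
            '#'*)           ;;                                     # shebang or ordinary comment
            *[![:space:]]*) break ;;                               # first command: stop scanning
        esac
    done < "$script"
}
```

Run against the example script above, list_directives should print its four options, one per line. Passing those same options on the command line (sbatch --job-name=training_batch --partition=training.q --ntasks=1 --time=1:00 followed by the script name) would have the same effect as the directives. Options given on the command line take precedence over directives in the script.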
Exercise
Copy and paste the above example into a file called “batch-job.sh” in your $HOME directory. Submit the job:-
cd $HOME; sbatch batch-job.sh
Use the squeue command to confirm the status of the job.
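When sbatch accepts a job it normally prints a confirmation line of the form "Submitted batch job <jobid>". The sketch below pulls the job ID out of that line so that it can be passed to squeue. The function name job_id_from_sbatch is hypothetical, and the sketch assumes the confirmation message has exactly this form on your system.

```shell
#!/usr/bin/env bash
# job_id_from_sbatch is a hypothetical helper, not part of SLURM.
# It extracts the numeric job ID from sbatch's confirmation message and
# returns a non-zero status if the message does not have the expected form.
job_id_from_sbatch() {
    local re='^Submitted batch job ([0-9]+)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Example use on a cluster (commented out because it needs SLURM):
# job_id=$(job_id_from_sbatch "$(sbatch batch-job.sh)")
# squeue -j "$job_id"
```

Restricting squeue to one job with -j keeps the listing short on a busy system. sbatch also offers a --parsable option intended for scripted use; see man sbatch for its exact output format.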
Look for the output file and examine its contents. Then modify your submission script so that SLURM writes the output and errors (technically, the stdout and stderr streams) to separate files and stores them in the training/logs folder.
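If the distinction between the two streams is unfamiliar, the following sketch shows it in plain bash, without SLURM. It sends stdout and stderr from one group of commands to two different files in a temporary directory. The file names job.out and job.err are placeholders chosen for this illustration. In a job script the same separation is requested through sbatch options rather than shell redirection, so this is not the answer to the exercise.

```shell
#!/usr/bin/env bash
# Temporary directory standing in for a log folder (illustration only).
log_dir=$(mktemp -d)

# Everything written to stdout goes to job.out, everything on stderr to job.err.
{
    echo "normal output"
    echo "an error message" >&2
} >"$log_dir/job.out" 2>"$log_dir/job.err"

# Show what ended up in each file.
echo "stdout file: $(cat "$log_dir/job.out")"
echo "stderr file: $(cat "$log_dir/job.err")"
```

As a hint, look for the --output and --error options in man sbatch. It is also worth creating the log folder before submitting, since SLURM is unlikely to create it for you.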
Look in the sbatch documentation to find out how to submit the job again, this time in a held state. The job will appear in the queue with status PD and the reason (JobHeldUser).
Release the job (hint: scontrol) and confirm it runs.
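To check the state of a held job from a script rather than by eye, squeue can be asked for just the state and reason columns. The sketch below interprets such a line. The helper name is_held_by_user is hypothetical, and it assumes the line was produced by squeue with the format string '%t %r' and no header, which should be checked against man squeue on your system.

```shell
#!/usr/bin/env bash
# is_held_by_user is a hypothetical helper, not part of SLURM.
# It expects one line containing the compact job state and the reason,
# e.g. "PD JobHeldUser", and succeeds only for a pending job held by the user.
is_held_by_user() {
    local state reason
    read -r state reason <<<"$1"
    # Tolerate the parenthesised form "(JobHeldUser)" seen in the default listing.
    reason=${reason#\(}
    reason=${reason%\)}
    [[ $state == PD && $reason == JobHeldUser ]]
}

# Example use on a cluster (commented out because it needs SLURM and a real job ID):
# is_held_by_user "$(squeue -h -j "$job_id" -o '%t %r')" && echo "job is held"
```

Once the job has been released, the same check should fail, because the job either starts running or waits in the queue for a different reason.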