Introduction
In this example we will compile an MPI executable in an interactive shell and then submit a job to the batch queue.
Exercise
Create an interactive shell with 6 cores allocated by executing the following command :-
sinteractive -p training.q --ntasks=6
You should see something similar to :-
[train1@login8(sciama) ~]$ sinteractive -p training.q --ntasks=6
Waiting for JOBID 1691162 to start ......
[train1@node247(sciama) ~]$
The allocated cores are not necessarily all on the same node as the shell you have been given. Try to find out whether other nodes are being used as well, and if so which ones (hint: squeue).
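One way to check is to query the queue for your own jobs, or to inspect the job record from inside the interactive shell (assuming, as is usual, that Slurm has set SLURM_JOB_ID in your session):-
squeue -u $USER
scontrol show job $SLURM_JOB_ID | grep NodeList
The NODELIST/NodeList fields show the node(s) on which your 6 tasks have been placed.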
Now select the modules you will need for this exercise :-
module purge; module load system/intel64 intel_comp/2019.2 openmpi/4.0.1
From the command line confirm that all the requested modules have been successfully loaded (hint: module list).
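The exact layout depends on the module system in use, but the output should list the three modules you just loaded, roughly like:-
Currently Loaded Modulefiles:
  1) system/intel64   2) intel_comp/2019.2   3) openmpi/4.0.1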
Now change into the “$HOME/training/src” directory and look at the file “mpi.c”. Try to understand the structure of the code.
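For orientation, a minimal MPI “hello world” program in C has the structure sketched below; this is only an illustration of the kind of code to expect, not necessarily the exact contents of mpi.c:-

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* start the MPI environment            */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which process am I (0 .. size-1)?    */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes are there in all? */

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut MPI down cleanly                */
    return 0;
}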
We will now compile this file and create an executable to be stored in the “bin” directory:-
mpicc mpi.c -o ../bin/mpi.exe
Ignore any warnings. Check that you now have an mpi.exe in the bin directory.
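For example, from the src directory:-
ls -l ../bin/mpi.exe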
We will now run the program using all 6 cores we have allocated:-
cd $HOME/training/bin; srun --mpi=pmi2 mpi.exe
On Slurm, srun should be used instead of mpirun, which usually does the bootstrapping of the processes across the allocated nodes. Also notice that we haven’t told srun explicitly to use our 6 cores: if no number of tasks is specified, it uses all allocated cores automatically.
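If you ever need fewer tasks than the allocation provides, you can override this with srun’s -n/--ntasks option, for example (the 3 here is just an arbitrary illustration):-
srun --mpi=pmi2 -n 3 mpi.exe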
The output should be similar to :-
[train25@login8(sciama) bin]$ srun --mpi=pmi2 mpi.exe
Hello world from process 0 of 6
Hello world from process 1 of 6
Hello world from process 2 of 6
Hello world from process 3 of 6
Hello world from process 4 of 6
Hello world from process 5 of 6
Now exit from the interactive shell back onto the login node. Check that you are in your home directory (cd).
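For example:-
exit
cd
hostname
The hostname should now report a login node again rather than a compute node.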
We will now submit the same program as a batch job. Again, using the knowledge from the previous exercises, try to write your own batch script to do so and store it in “training/scripts/” (hint: if you get stuck, you can find a commented solution in “training/scripts/mpi-job.sh.solution”). Then submit the job:-
sbatch training/scripts/mpi-job.sh
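For comparison, a batch script for this job might look roughly like the sketch below; the commented solution in mpi-job.sh.solution may differ in details such as the job name or an explicit output file:-

#!/bin/bash
#SBATCH --job-name=mpi-hello       # illustrative job name
#SBATCH --partition=training.q     # same partition as the interactive session
#SBATCH --ntasks=6                 # 6 MPI tasks, as before

module purge
module load system/intel64 intel_comp/2019.2 openmpi/4.0.1

cd $HOME/training/bin
srun --mpi=pmi2 mpi.exe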
Use squeue to check the status of the job. When the job has completed, locate and examine the output file.
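Unless your batch script redirects it, Slurm writes the job output to a file named slurm-<JOBID>.out in the directory you submitted from, so something like the following should find it (replace <JOBID> with your job’s ID):-
ls slurm-*.out
cat slurm-<JOBID>.out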
The contents of the output file should be similar to the interactive output.