MPI
In order to facilitate code development, TCM supports a number of different MPI systems. However, as there is no standard concerning the detailed implementation of MPI, any attempt to mix and match will fail. The include files used at compile time, the libraries used at link time, and the mpirun command used at run time must all match.
In order to cut down the number of combinations, we do not support the use of MPI between different machines. MPI in TCM is intended for use on multicore computers, or dedicated clusters, only.
This page describes MPI as found on TCM's desktops and s series machines. There is a separate page of brief benchmarks.
Compiler Support and MPI versions
Currently TCM has versions of both OpenMPI and MPICH. It is important to use the same version at compile, link and run time. In general, minor versions (the last digit of the version number) can be mixed, but mixing OpenMPI 1.6.x and 1.8.y, for instance, does not seem to be possible.
Because both OpenMPI and MPICH like to provide wrappers for compilers, one needs to have available the correct wrapper for the compiler one wishes to use. The names of the wrappers are mpifort, mpifx, mpgfortran, mpflang-amd, mpnvfortran and mpnagfor for Fortran; mpicc, mpicx, mpgcc, mpclang and mpnvc for C; and mpicpc, mpicpx, mpg++, mpclang and mpnvc++ for C++. It is not necessary to specify -lmpi on the command line: all required MPI libraries should be linked automatically. Shared MPI libraries will not be used by default. Not all compilers are installed for all MPI versions.
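For example, one might compile a Fortran source with one wrapper and a C source with another, without naming any MPI library explicitly (the file names here are purely illustrative):

mpifort solver.f90 -o solver
mpgcc helper.c -o helper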
Setting TCM_INTEL_VER to select the Intel compiler version works as usual.
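Something like the following should therefore work; the version number shown is purely illustrative and must be one that is actually installed:

export TCM_INTEL_VER=2021.1.1
mpifort test.f90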
Availability
This changes almost continuously. As of March 2024 we have (amongst others):
OpenMPI-5.0.2: Intel and Gnu compilers
OpenMPI-4.1.4: Intel, Gnu, NAG, AMD and Nvidia compilers
OpenMPI-4.0.5: Intel, Gnu and NAG compilers
OpenMPI-3.1.6: Intel, Gnu and NAG compilers
Intel-2021.1.1: Intel and Gnu compilers
MPICH-4.2.0: Intel and Gnu compilers
MPICH-4.1.3: Intel, Gnu and NAG compilers
MPICH-4.0.3: Intel, Gnu and NAG compilers
MPICH-3.2.1: Intel and Gnu compilers
Note that for the NAG compiler the mpi_f08 module is available only with MPICH 4.0 and 4.1, not with MPICH 4.2 or any of the OpenMPI versions.
The current situation can be determined by looking in the directory /usr/local/shared/MPI.
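For example:

ls /usr/local/shared/MPI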
To switch between these, set the environment variable MPI to one of the above, e.g.

export MPI=OpenMPI-3.1.6
mpifort test.f90
MPICH-3.2 with the Gnu compilers seems to require specifying one extra library: -lpthread.
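So a build with the Gnu Fortran wrapper might look something like this (the source file name is hypothetical):

export MPI=MPICH-3.2.1
mpgfortran test.f90 -lpthread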
Running the Result
If a non-default version of MPI was used, one can set the MPI environment variable appropriately, so that the correct mpirun or mpiexec command is used. If MPI is not set, mpirun and mpiexec attempt to choose the correct version automatically.
Note that MPICH uses exclusively mpiexec, whereas OpenMPI supports both.
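A typical sequence which keeps the compile and run steps consistent might therefore be (assuming the MPICH-4.2.0 installation listed above):

export MPI=MPICH-4.2.0
mpifort version.f90
mpiexec -n 4 ./a.out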
Documentation
There exist man pages, viewable with the mpiman command, which will display those pages relevant to the current setting of the MPI environment variable.
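So, assuming mpiman takes a page name in the same way as man, one might type:

export MPI=OpenMPI-5.0.2
mpiman MPI_Send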
Tutorials Etc.
I would recommend this course, but that is hardly surprising...
CMA
Does using Cross Memory Attach make your code go faster? In theory, yes. In practice, sometimes not. OpenMPI from 2.1.0 defaults to using it. To control it at runtime, use:
mpirun -mca btl_vader_single_copy_mechanism none ./a.out
mpirun -mca btl_vader_single_copy_mechanism cma ./a.out
Example
The code used below is:
program version
  use mpi
  integer major,minor,nproc,rank,ierr

  call mpi_init(ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)

  write(*,101) rank,nproc
101 format(' Hello F90 parallel world, I am process ', &
       I3,' out of ',I3)

  if (rank.eq.0) then
    call mpi_get_version(major,minor,ierr)
    write(*,102) major,minor
102 format(' Run time MPI version is ',I1,'.',I1)
  endif

  call mpi_finalize(ierr)
end
And the result of trying to compile and run it is:
m1:/tmp$ mpifort version.f90
m1:/tmp$ mpirun -np 2 ./a.out
 Hello F90 parallel world, I am process 0 out of 2
 Hello F90 parallel world, I am process 1 out of 2
 Run time MPI version is 2.1
m1:/tmp$ MPI=OpenMPI-1.10.1 mpifort version.f90
m1:/tmp$ mpirun -np 2 ./a.out
[m1:14717] [[3050,0],0] mca_oob_tcp_recv_handler: invalid message type: 15
[m1:14717] [[3050,0],0] mca_oob_tcp_recv_handler: invalid message type: 15
m1:/tmp$ MPI=OpenMPI-1.10.1 mpirun -np 2 ./a.out
 Hello F90 parallel world, I am process 0 out of 2
 Run time MPI version is 3.0
 Hello F90 parallel world, I am process 1 out of 2
m1:/tmp$ export MPI=MPICH-3.2.0
m1:/tmp$ mpifort version.f90
m1:/tmp$ mpiexec -n 3 ./a.out
 Hello F90 parallel world, I am process 1 out of 3
 Hello F90 parallel world, I am process 0 out of 3
 Run time MPI version is 3.1
 Hello F90 parallel world, I am process 2 out of 3