MPI & pingpong
A very simple, and traditional, benchmark for MPI is pingpong. Like all simple benchmarks, it has many flaws: it tests very few operations (just point-to-point transfers, with no communication occurring between other nodes at the same time). However, it is usually better than nothing.
The figures below come from running on a 3.2GHz Haswell (Xeon E3-1225 v3).
OpenMPI-1.8.2: latency 0.18us, peak bandwidth 9,600MB/s at 512KB,
asymptotic bandwidth 3,900MB/s
OpenMPI-2.1.4: latency 0.17us, peak bandwidth 9,700MB/s at 256KB,
asymptotic bandwidth 4,100MB/s
OpenMPI-3.1.6: latency 0.25us, peak bandwidth 9,600MB/s at 512KB,
asymptotic bandwidth 4,100MB/s
MPICH-3.2.1: latency 0.20us, peak bandwidth 11,700MB/s at 512KB,
asymptotic bandwidth 4,200MB/s
Intel 2021.1: latency 0.42us, peak bandwidth 11,600MB/s at 256KB,
asymptotic bandwidth 3,700MB/s
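
For readers who want to reduce their own pingpong timings to the same three figures, the following is a minimal sketch in Python. It rests on assumptions that the text above does not spell out: latency is taken as half the round-trip time of the smallest message, bandwidth as message size divided by the one-way time with 1MB counted as 10^6 bytes, peak bandwidth as the best value over all message sizes, and asymptotic bandwidth as the value at the largest size measured. The function name summarize and the (bytes, seconds) input format are hypothetical, and the benchmark that produced the figures above may define these quantities differently.

```python
def summarize(samples):
    """Reduce pingpong timings to latency, peak and asymptotic bandwidth.

    samples: list of (message_bytes, round_trip_seconds) pairs, one per
    message size. Names and conventions here are illustrative assumptions.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)  # ascending message size

    # Assumption: one-way time is half the measured round trip.
    _, smallest_rtt = ordered[0]
    latency_us = smallest_rtt / 2 * 1e6

    # Bandwidth in MB/s for every non-empty message (1 MB = 10**6 bytes).
    bandwidths = [(size, size / (rtt / 2) / 1e6)
                  for size, rtt in ordered if size > 0]
    if not bandwidths:
        raise ValueError("need at least one non-empty message")

    # Peak: best bandwidth seen at any size.
    peak_size, peak_bw = max(bandwidths, key=lambda pair: pair[1])
    # Asymptotic (assumed): bandwidth at the largest size measured.
    asymptotic_bw = bandwidths[-1][1]

    return {
        "latency_us": latency_us,
        "peak_bandwidth_MBps": peak_bw,
        "peak_at_bytes": peak_size,
        "asymptotic_bandwidth_MBps": asymptotic_bw,
    }
```

Calling summarize with one pair per message size returns a dictionary holding the latency in microseconds, the peak bandwidth together with the message size at which it occurred, and the asymptotic bandwidth. On this reading, a peak well above the asymptotic value, as in every result listed above, means that mid-sized messages are transferred faster than very large ones.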