Benchmarks
The following benchmarks should be taken with a pinch of salt: compilers differ, and latency measurements depend on everything from the phase of the moon onwards. However, for what they are worth:
Make | CPU | Speed | Memory | Streams | Latency | Linpack |
---|---|---|---|---|---|---|
Intel | P MMX | 233MHz | 2x??ns EDO DRAM | 150MB/s | 340ns | 70MFLOPS[2] |
DEC 3000/600 | 21064 | 175MHz | ?? DRAM | 100MB/s | 600ns | 88MFLOPS[2] |
Intel | P II | 350MHz | 100MHz SDRAM | 300MB/s | 300ns | 235MFLOPS[2] |
Indigo2 | R10000 | 195MHz | | 120MB/s | 900ns | 240MFLOPS[2] |
AS600/5/266 | 21164 | 266MHz | 8x60ns DRAM | 150MB/s | 450ns | 310MFLOPS[2] |
Intel | Celeron | 533MHz | 100MHz? SDRAM | 290MB/s | 300ns | 310MFLOPS[2] |
SB100 | US IIe | 500MHz | 100MHz SDRAM | 220MB/s | 360ns | 320MFLOPS[2] |
Octane | R12000 | 300MHz | | 320MB/s | 540ns | 470MFLOPS |
PW500au | 21164A | 500MHz | 2x83MHz SDRAM | 230MB/s | 280ns | 630MFLOPS |
Intel | P III | 1GHz | 133MHz SDRAM | 400MB/s | 170ns | 680MFLOPS |
XP900 | 21264 | 463MHz | 2x77MHz SDRAM | 770MB/s | 300ns | 725MFLOPS |
Octane | R12000 | 2x300MHz | | 450MB/s | | 910MFLOPS |
XP1000 | 21264A | 667MHz | 4x83MHz SDRAM | 1000MB/s | 280ns | 1060MFLOPS |
V240 | US IIIi | 1GHz | 2x222MHz DDR | 980MB/s | 135ns | 1180MFLOPS |
Intel | Pentium M | 1.7GHz | | 1000MB/s | 140ns | 1360MFLOPS |
V480 | US III Cu | 900MHz | 4x75MHz SDRAM | 1100MB/s | 210ns | 1430MFLOPS |
280R | US III Cu | 1.2GHz | 4x75MHz SDRAM | 1200MB/s | 150ns | 1730MFLOPS |
Intel | P 4 | 1.5GHz | 2xPC800 RDRAM | 2100MB/s | 210ns | 2020MFLOPS |
Intel | P 4 | 1.8GHz | 266MHz DDR | 1600MB/s | 200ns | 2340MFLOPS |
V480 | US III Cu | 2x900MHz | 4x75MHz SDRAM | 2000MB/s | | 2780MFLOPS |
Intel | Itanium 2 | 900MHz | 2x266MHz DDR | 1600MB/s | 210ns | 2900MFLOPS |
AMD | Turion64 | 1.8GHz | 333MHz DDR | 2000MB/s | 110ns | 3000MFLOPS |
Intel | P 4 | 2.4GHz | 2xPC800 RDRAM | 2200MB/s | 170ns | 3200MFLOPS |
Intel | P 4 | 2.4GHz | 2x266MHz DDR | 2600MB/s | 180ns | 3100MFLOPS |
280R | US III Cu | 2x1.2GHz | 4x75MHz SDRAM | 1600MB/s | | 3360MFLOPS |
Intel | P 4 | 2.67GHz | 2xPC1066 RDRAM | 3000MB/s | 140ns | 3700MFLOPS |
Intel | P 4 | 2.8GHz | 2x400MHz DDR | 4200MB/s | 110ns | 3900MFLOPS |
V40z | Opteron | 2.4GHz | 8x400MHz DDR | 3950MB/s | 120ns | 3970MFLOPS |
Intel | P D (1 core) | 3.0GHz | 2x533MHz DDR2 | 4600MB/s | 135ns | 4900MFLOPS |
Intel | P 4 EM64T | 3.4GHz | 2x533MHz DDR2 | 4500MB/s | 130ns | 5000MFLOPS |
Intel | P 4 | 2x2.4GHz | 2x266MHz DDR | 2600MB/s | | 5300MFLOPS |
V480 | US III Cu | 4x900MHz | 4x75MHz SDRAM | | | 5350MFLOPS |
Intel | Itanium 2 | 2x900MHz | 2x266MHz DDR | | | 5550MFLOPS |
Intel | P4 EM64T Xeon | 3.6GHz | 2x400MHz DDR2 | 3900MB/s | 140ns | 5600MFLOPS |
Intel | P 4 | 2x2.4GHz | 2x266MHz DDR | | | 5800MFLOPS[10] |
V480 | US III Cu | 4x900MHz | 4x75MHz SDRAM | 2900MB/s | | 5830MFLOPS[10] |
Intel | Itanium 2 | 2x900MHz | 2x266MHz DDR | | | 5850MFLOPS[10] |
Dell R710 | Nehalem | 2.4GHz | 6x800MHz DDR3 | 9500MB/s | 120ns | 6500MFLOPS |
AMD | Ryzen (Zen) | 3.2GHz | 2x2400MHz DDR4 | 25,300MB/s | 100ns | 6580MFLOPS[10,MKL] |
Dell 2970 | Opteron | 2.5GHz | 4x800MHz DDR2 | 7700MB/s | 100ns | 7600MFLOPS |
V40z | Opteron | 2x2.4GHz | 8x400MHz DDR | 7900MB/s | | 7900MFLOPS[10] |
Intel | Core2 | 2.4GHz | 2x667MHz DDR2 | 5000MB/s | 95ns | 8200MFLOPS |
AMD | Athlon II | 3.0GHz | 2x1333MHz DDR3 | 8500MB/s | 130ns | 9700MFLOPS[10] |
Intel | P D (2 cores) | (2x)3.0GHz | 2x533MHz DDR2 | 4500MB/s | | 10,000MFLOPS[10] |
5000P | Xeon 5160 | 3.0GHz | 8x667MHz DDR2 | 4000MB/s | 120ns | 10,000MFLOPS[10] |
Intel | P4 EM64T Xeon | 2x3.6GHz | 2x400MHz DDR2 | 3450MB/s | | 10,700MFLOPS[10] |
Intel | Nehalem QC | 2.8GHz | 3x1066MHz DDR3 | 12,200MB/s | 95ns | 10,900MFLOPS[10] |
V40z | Opteron | 4x2.4GHz | 8x400MHz DDR | 15,500MB/s | | 13,500MFLOPS[10] |
Intel | Core2 DC | (2x)2.4GHz | 2x667MHz DDR2 | 5000MB/s | | 15,300MFLOPS[10] |
AMD | Athlon II | 2x3.0GHz | 2x1333MHz DDR3 | 12,700MB/s | | 19,300MFLOPS[10] |
Intel | Sandybridge QC | 3.2GHz | 2x1333MHz DDR3 | 17,500MB/s | 85ns | 22,700MFLOPS[10] |
AMD | Ryzen (Zen) | 3.2GHz | 2x2400MHz DDR4 | 25,300MB/s | 100ns | 24,400MFLOPS[10,OpenBLAS] |
Intel | Core2 QC | (4x)2.4GHz | 2x1066MHz DDR2 | 5600MB/s | | 31,800MFLOPS[10] |
5000P | Xeon 5160 | 2x(2x)3.0GHz | 8x667MHz DDR2 | 6400MB/s | | 34,500MFLOPS[10] |
AMD | Ryzen (Zen) QC | (4x)3.2GHz | 2x2400MHz DDR4 | 15,900MB/s | | 35,900MFLOPS[10,MKL] |
Intel | Haswell QC | 3.1GHz | 2x1600MHz DDR3 | 20,000MB/s | 76ns | 39,500MFLOPS[10] |
Intel | Nehalem QC | (4x)2.8GHz | 3x1066MHz DDR3 | 19,200MB/s | | 42,000MFLOPS[10] |
AMD | EPYC (Zen 3) 16C | 3.0GHz | 8x3200MHz DDR4 | 43,000MB/s | 125ns | 43,000MFLOPS[10, OpenBLAS] |
Intel | Kaby Lake QC | 3.0GHz | 2x2400MHz DDR4 | 27,000MB/s | 73ns | 44,000MFLOPS[10] |
Dell 2970 | Opteron | 2x(4x)2.5GHz | 4x800MHz DDR2 | 20,800MB/s | | 45,000MFLOPS[10] |
AMD | Ryzen (Zen 2) HC | 3.6GHz | 2x2666MHz DDR4 | 32,400MB/s | 112ns | 49,500MFLOPS[10,OpenBLAS] |
Intel | Skylake QC | 3.5GHz | 2x2133MHz DDR4 | 26,500MB/s | 77ns | 50,000MFLOPS[10] |
AMD | Ryzen (Zen 3) HC | 3.7GHz | 2x3200MHz DDR4 | 41,000MB/s | 97ns | 52,800MFLOPS[10,OpenBLAS] |
Intel | Kaby Lake QC | 3.7GHz | 2x2400MHz DDR4 | 29,000MB/s | 87ns | 53,200MFLOPS[10] |
AMD | Ryzen (Zen 4) HC | 3.8GHz | 2x4800MHz DDR5 | 56,300MB/s | 116ns | 54,100MFLOPS[10,OpenBLAS] |
Intel | Xeon Gold 12C | 2.6GHz | 12x2666MHz DDR4 | 10,800MB/s | 115ns | 65,400MFLOPS[10] |
Dell R710 | Nehalem | 2x(4x)2.4GHz | 6x800MHz DDR3 | 26,000MB/s | | 68,500MFLOPS[10] |
Sun X2270 | Nehalem | 2x(4x)2.66GHz | 6x1066MHz DDR3 | 25,000MB/s | | 78,600MFLOPS[20] |
Intel | Xeon W 2145 | 3.7GHz | 4x2666MHz DDR4 | 18,900MB/s | 115ns | 82,600MFLOPS[10] |
Intel | Sandybridge QC | (4x)3.2GHz | 2x1333MHz DDR3 | 18,500MB/s | | 83,100MFLOPS[10] |
AMD | Ryzen (Zen) QC | (4x)3.2GHz | 2x2400MHz DDR4 | 31,700MB/s | | 88,000MFLOPS[10,OpenBLAS] |
Intel | Ivybridge QC | (4x)3.3GHz | 2x1600MHz DDR3 | 21,500MB/s | | 93,800MFLOPS[10] |
Intel | Ivybridge HC | (6x)3.5GHz | 4x1600MHz DDR3 | 42,700MB/s | | 117,000MFLOPS[10] |
Intel | Haswell QC | (4x)3.1GHz | 2x1600MHz DDR3 | 21,500MB/s | | 135,000MFLOPS[10] |
Intel | Ivybridge HC | (6x)3.5GHz | 4x1600MHz DDR3 | 42,700MB/s | | 146,000MFLOPS[20] |
Intel | Haswell QC | (4x)3.1GHz | 2x1600MHz DDR3 | 21,500MB/s | | 153,000MFLOPS[20] |
Intel | Kaby Lake QC | (4x)3.0GHz | 2x2400MHz DDR4 | 28,400MB/s | | 165,000MFLOPS[10] |
Intel | Kaby Lake QC | (4x)3.7GHz | 2x2400MHz DDR4 | 31,000MB/s | | 193,000MFLOPS[10] |
AMD | Ryzen (Zen 2) HC | (6x)3.6GHz | 2x2666MHz DDR4 | 34,700MB/s | | 267,000MFLOPS[10,OpenBLAS] |
Intel | 2xSandybridge OC | 2x(8x)2.6GHz | 8x1600MHz DDR3 | 70,000MB/s | | 280,000MFLOPS[20] |
AMD | Ryzen (Zen 3) HC | (6x)3.7GHz | 2x3200MHz DDR4 | 40,000MB/s | | 288,000MFLOPS[10,OpenBLAS] |
AMD | Ryzen (Zen 4) HC | (6x)3.8GHz | 2x4800MHz DDR5 | 56,200MB/s | | 303,000MFLOPS[10,OpenBLAS] |
Intel | 2xHaswell 10C | 2x(10x)2.6GHz | 8x2133MHz DDR4 | 108,000MB/s | 112ns | 475,000MFLOPS[20] |
Intel | Xeon W 2145 | (8x)3.7GHz | 4x2666MHz DDR4 | 56,800MB/s | | 530,000MFLOPS[20] |
Intel | 2xHaswell 10C | 2x(10x)2.6GHz | 8x2133MHz DDR4 | 108,000MB/s | | 595,000MFLOPS[40] |
AMD | EPYC (Zen 3) 16C | (16x)3.0GHz | 8x3200MHz DDR4 | 146,000MB/s | | 648,000MFLOPS[20, OpenBLAS] |
AMD | EPYC (Zen 3) 16C | (16x)3.0GHz | 8x3200MHz DDR4 | 146,000MB/s | | 680,000MFLOPS[40, OpenBLAS] |
Intel | 2xXeon Gold 12C | 2x(12x)2.6GHz | 12x2666MHz DDR4 | 160,000MB/s | | 1,150,000MFLOPS[40] |
Intel | 2xXeon Gold 16C | 2x(16x)2.1GHz | 12x2666MHz DDR4 | 170,000MB/s | | 1,220,000MFLOPS[40] |
Intel | 2xXeon Gold 16C | 2x(16x)2.1GHz | 12x2666MHz DDR4 | 170,000MB/s | | 1,250,000MFLOPS[60] |
Intel | 2xXeon Gold 18C | 2x(18x)2.3GHz | 12x2666MHz DDR4 | 170,000MB/s | | 1,310,000MFLOPS[40] |
Intel | 2xXeon Gold 18C | 2x(18x)2.3GHz | 12x2666MHz DDR4 | | | 1,460,000MFLOPS[60] |
In all cases the streams and latency results are for main memory. The linpack result is for a 5000x5000 matrix unless marked otherwise: [2] means 2000x2000, [10] means 10000x10000, and so on for [20], [40] and [60]. A library name (MKL, OpenBLAS) after the size indicates the library used for that entry.
Notes
All forms of hyperthreading and overclocking are disabled (save for the Zen 2 Ryzen which resisted).
Streams measures memory bandwidth out of cache. The test runs use around half of the machine's memory in most cases. The timer used on the earlier Intel platforms has a resolution of just 0.01s, so the error bars on some of these results are large. This benchmark is important for codes which do unit-stride processing of large arrays (Castep).
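As a rough illustration of what a streams-style measurement does, the sketch below times a bulk copy between two large buffers and counts both the bytes read and the bytes written. It is not the code used for the table above: the function name `copy_bandwidth_mb_s` and the best-of-N timing are assumptions made for this example, and only a copy kernel is shown.

```python
import time


def copy_bandwidth_mb_s(n_bytes, repeats=5):
    """Best-of-`repeats` copy bandwidth in MB/s (1 MB = 10**6 bytes).

    Hypothetical helper for illustration only.
    """
    # Fill the source with non-zero data so every page is really allocated.
    src = bytearray(b"\x01") * n_bytes
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy: n_bytes read plus n_bytes written
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Count traffic in both directions, as a copy kernel moves 2 * n_bytes.
    return 2 * n_bytes / best / 1e6
```

For a figure that reflects main memory and not cache, the buffers need to be far larger than the last-level cache, for example `copy_bandwidth_mb_s(256 * 1024 * 1024)`, and each timed copy should last much longer than the timer resolution.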
Latency is measured for random access over a 64MB array (1GB on later machines), and includes any TLB misses, except on Suns, where these can be avoided by using large pages. This benchmark is important for codes which jump randomly all over memory (M$ Office).
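One common way to structure a random-access latency test is pointer chasing: each load depends on the result of the previous one, so the loads cannot overlap. The sketch below shows that structure only, and whether the table above was produced this way is an assumption. The names `single_cycle_permutation` and `chase_ns_per_hop` are invented for the example. In Python the interpreter overhead swamps the memory latency, so the numbers it prints are not comparable with the table; the same loop in a compiled language would be needed for that.

```python
import random
import time


def single_cycle_permutation(n, seed=0):
    """Sattolo's algorithm: a random permutation forming one cycle of length n.

    Following nxt[i] repeatedly from any start visits every slot once
    before returning, so the chase cannot settle into a short loop.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase_ns_per_hop(nxt, hops):
    """Follow the chain for `hops` dependent lookups; return ns per hop."""
    idx = 0
    start = time.perf_counter()
    for _ in range(hops):
        idx = nxt[idx]  # the next index is unknown until this load completes
    elapsed = time.perf_counter() - start
    return elapsed / hops * 1e9
```

The single-cycle property matters because an ordinary random permutation usually contains several short cycles, and a chase trapped in one of those would run out of cache.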
Linpack measures the time taken to solve a 5000x5000 system of linear equations. It is pretty much the highest performance one can expect for a real problem, which in this case is the solution of a dense linear system by LU factorisation. Efficiency should increase with size (for large sizes), but the 5000x5000 case requires over 200MB, so not all computers can cope.
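The memory figure and the MFLOPS rating can be checked with a little arithmetic. The sketch below assumes 8-byte double precision values and the conventional Linpack operation count of 2n^3/3 + 2n^2; neither is stated explicitly above, and the function names are invented for the example.

```python
def matrix_bytes(n):
    """Storage for an n x n matrix of 8-byte doubles (assumed precision)."""
    return 8 * n * n


def linpack_flops(n):
    """Conventional Linpack operation count for an n x n solve (assumed)."""
    return 2 * n**3 / 3 + 2 * n**2


def seconds_at(n, mflops):
    """Run time implied by a given MFLOPS rating for an n x n solve."""
    return linpack_flops(n) / (mflops * 1e6)


print(matrix_bytes(5000))            # 200000000 bytes for the matrix alone
print(round(seconds_at(5000, 680)))  # 123 seconds at 680 MFLOPS
```

Under these assumptions the matrix alone occupies 200MB at 5000x5000, and four times that at 10000x10000, which is consistent with the larger sizes appearing only on the later machines.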
Compilers are an important part of the speed equation, and are barely tested by the above. Linpack is just a library call, the latency test is unoptimisable, and streams is either trivial to optimise or, on those systems which support software data prefetching (most of the above), perhaps rather hard. However, different compilers and libraries were used for different entries above, which does give an advantage to more recent entries.
The above is rather unfair to Alphas, as it does not show the different ages of the machines. The Intel PII/350 is newer than the PW500au, for instance. It also lacks modern Alphas (DS25/ES45) which use 1GHz+ CPUs and 125MHz memory. Modern SGIs are also surely faster than the examples above.
More balanced benchmark suites are produced by SPEC, most notably the CPU2000 benchmark.
Thanks to AAM for the Pentium M result.