TCM

Benchmarks

The following benchmarks should be taken with a pinch of salt: compilers differ, and latency measurements depend on everything from the phase of the moon onwards. However, for what they are worth:

| Make | CPU | Speed | Memory | Streams | Latency | Linpack |
|---|---|---|---|---|---|---|
| Intel | P MMX | 233MHz | 2x??ns EDO DRAM | 150MB/s | 340ns | 70MFLOPS[2] |
| DEC 3000/600 | 21064 | 175MHz | ?? DRAM | 100MB/s | 600ns | 88MFLOPS[2] |
| Intel | P II | 350MHz | 100MHz SDRAM | 300MB/s | 300ns | 235MFLOPS[2] |
| Indigo2 | R10000 | 195MHz | | 120MB/s | 900ns | 240MFLOPS[2] |
| AS600/5/266 | 21164 | 266MHz | 8x60ns DRAM | 150MB/s | 450ns | 310MFLOPS[2] |
| Intel | Celeron | 533MHz | 100MHz? SDRAM | 290MB/s | 300ns | 310MFLOPS[2] |
| SB100 | US IIe | 500MHz | 100MHz SDRAM | 220MB/s | 360ns | 320MFLOPS[2] |
| Octane | R12000 | 300MHz | | 320MB/s | 540ns | 470MFLOPS |
| PW500au | 21164A | 500MHz | 2x83MHz SDRAM | 230MB/s | 280ns | 630MFLOPS |
| Intel | P III | 1GHz | 133MHz SDRAM | 400MB/s | 170ns | 680MFLOPS |
| XP900 | 21264 | 463MHz | 2x77MHz SDRAM | 770MB/s | 300ns | 725MFLOPS |
| Octane | R12000 | 2x300MHz | | 450MB/s | | 910MFLOPS |
| XP1000 | 21264A | 667MHz | 4x83MHz SDRAM | 1000MB/s | 280ns | 1060MFLOPS |
| V240 | US IIIi | 1GHz | 2x222MHz DDR | 980MB/s | 135ns | 1180MFLOPS |
| Intel | Pentium M | 1.7GHz | | 1000MB/s | 140ns | 1360MFLOPS |
| V480 | US III Cu | 900MHz | 4x75MHz SDRAM | 1100MB/s | 210ns | 1430MFLOPS |
| 280R | US III Cu | 1.2GHz | 4x75MHz SDRAM | 1200MB/s | 150ns | 1730MFLOPS |
| Intel | P 4 | 1.5GHz | 2xPC800 RDRAM | 2100MB/s | 210ns | 2020MFLOPS |
| Intel | P 4 | 1.8GHz | 266MHz DDR | 1600MB/s | 200ns | 2340MFLOPS |
| V480 | US III Cu | 2x900MHz | 4x75MHz SDRAM | 2000MB/s | | 2780MFLOPS |
| Intel | Itanium 2 | 900MHz | 2x266MHz DDR | 1600MB/s | 210ns | 2900MFLOPS |
| AMD | Turion64 | 1.8GHz | 333MHz DDR | 2000MB/s | 110ns | 3000MFLOPS |
| Intel | P 4 | 2.4GHz | 2xPC800 RDRAM | 2200MB/s | 170ns | 3200MFLOPS |
| Intel | P 4 | 2.4GHz | 2x266MHz DDR | 2600MB/s | 180ns | 3100MFLOPS |
| 280R | US III Cu | 2x1.2GHz | 4x75MHz SDRAM | 1600MB/s | | 3360MFLOPS |
| Intel | P 4 | 2.67GHz | 2xPC1066 RDRAM | 3000MB/s | 140ns | 3700MFLOPS |
| Intel | P 4 | 2.8GHz | 2x400MHz DDR | 4200MB/s | 110ns | 3900MFLOPS |
| V40z | Opteron | 2.4GHz | 8x400MHz DDR | 3950MB/s | 120ns | 3970MFLOPS |
| Intel | P D (1 core) | 3.0GHz | 2x533MHz DDR2 | 4600MB/s | 135ns | 4900MFLOPS |
| Intel | P 4 EM64T | 3.4GHz | 2x533MHz DDR2 | 4500MB/s | 130ns | 5000MFLOPS |
| Intel | P 4 | 2x2.4GHz | 2x266MHz DDR | 2600MB/s | | 5300MFLOPS |
| V480 | US III Cu | 4x900MHz | 4x75MHz SDRAM | | | 5350MFLOPS |
| Intel | Itanium 2 | 2x900MHz | 2x266MHz DDR | | | 5550MFLOPS |
| Intel | P4 EM64T Xeon | 3.6GHz | 2x400MHz DDR2 | 3900MB/s | 140ns | 5600MFLOPS |
| Intel | P 4 | 2x2.4GHz | 2x266MHz DDR | | | 5800MFLOPS[10] |
| V480 | US III Cu | 4x900MHz | 4x75MHz SDRAM | 2900MB/s | | 5830MFLOPS[10] |
| Intel | Itanium 2 | 2x900MHz | 2x266MHz DDR | | | 5850MFLOPS[10] |
| Dell R710 | Nehalem | 2.4GHz | 6x800MHz DDR3 | 9500MB/s | 120ns | 6500MFLOPS |
| AMD | Ryzen (Zen) | 3.2GHz | 2x2400MHz DDR4 | 25,300MB/s | 100ns | 6580MFLOPS[10,MKL] |
| Dell 2970 | Opteron | 2.5GHz | 4x800MHz DDR2 | 7700MB/s | 100ns | 7600MFLOPS |
| V40z | Opteron | 2x2.4GHz | 8x400MHz DDR | 7900MB/s | | 7900MFLOPS[10] |
| Intel | Core2 | 2.4GHz | 2x667MHz DDR2 | 5000MB/s | 95ns | 8200MFLOPS |
| AMD | Athlon II | 3.0GHz | 2x1333MHz DDR3 | 8500MB/s | 130ns | 9700MFLOPS[10] |
| Intel | P D (2 cores) | (2x)3.0GHz | 2x533MHz DDR2 | 4500MB/s | | 10,000MFLOPS[10] |
| 5000P | Xeon 5160 | 3.0GHz | 8x667MHz DDR2 | 4000MB/s | 120ns | 10,000MFLOPS[10] |
| Intel | P4 EM64T Xeon | 2x3.6GHz | 2x400MHz DDR2 | 3450MB/s | | 10,700MFLOPS[10] |
| Intel | Nehalem QC | 2.8GHz | 3x1066MHz DDR3 | 12,200MB/s | 95ns | 10,900MFLOPS[10] |
| V40z | Opteron | 4x2.4GHz | 8x400MHz DDR | 15,500MB/s | | 13,500MFLOPS[10] |
| Intel | Core2 DC | (2x)2.4GHz | 2x667MHz DDR2 | 5000MB/s | | 15,300MFLOPS[10] |
| AMD | Athlon II | 2x3.0GHz | 2x1333MHz DDR3 | 12,700MB/s | | 19,300MFLOPS[10] |
| Intel | Sandybridge QC | 3.2GHz | 2x1333MHz DDR3 | 17,500MB/s | 85ns | 22,700MFLOPS[10] |
| AMD | Ryzen (Zen) | 3.2GHz | 2x2400MHz DDR4 | 25,300MB/s | 100ns | 24,400MFLOPS[10,OpenBLAS] |
| Intel | Core2 QC | (4x)2.4GHz | 2x1066MHz DDR2 | 5600MB/s | | 31,800MFLOPS[10] |
| 5000P | Xeon 5160 | 2x(2x)3.0GHz | 8x667MHz DDR2 | 6400MB/s | | 34,500MFLOPS[10] |
| AMD | Ryzen (Zen) QC | (4x)3.2GHz | 2x2400MHz DDR4 | 15,900MB/s | | 35,900MFLOPS[10,MKL] |
| Intel | Haswell QC | 3.1GHz | 2x1600MHz DDR3 | 20,000MB/s | 76ns | 39,500MFLOPS[10] |
| Intel | Nehalem QC | (4x)2.8GHz | 3x1066MHz DDR3 | 19,200MB/s | | 42,000MFLOPS[10] |
| AMD | EPYC (Zen 3) 16C | 3.0GHz | 8x3200MHz DDR4 | 43,000MB/s | 125ns | 43,000MFLOPS[10, OpenBLAS] |
| Intel | Kaby Lake QC | 3.0GHz | 2x2400MHz DDR4 | 27,000MB/s | 73ns | 44,000MFLOPS[10] |
| Dell 2970 | Opteron | 2x(4x)2.5GHz | 4x800MHz DDR2 | 20,800MB/s | | 45,000MFLOPS[10] |
| AMD | Ryzen (Zen 2) HC | 3.6GHz | 2x2666MHz DDR4 | 32,400MB/s | 112ns | 49,500MFLOPS[10,OpenBLAS] |
| Intel | Skylake QC | 3.5GHz | 2x2133MHz DDR4 | 26,500MB/s | 77ns | 50,000MFLOPS[10] |
| AMD | Ryzen (Zen 3) HC | 3.7GHz | 2x3200MHz DDR4 | 41,000MB/s | 97ns | 52,800MFLOPS[10,OpenBLAS] |
| Intel | Kaby Lake QC | 3.7GHz | 2x2400MHz DDR4 | 29,000MB/s | 87ns | 53,200MFLOPS[10] |
| AMD | Ryzen (Zen 4) HC | 3.8GHz | 2x4800MHz DDR5 | 56,300MB/s | 116ns | 54,100MFLOPS[10,OpenBLAS] |
| Intel | Xeon Gold 12C | 2.6GHz | 12x2666MHz DDR4 | 10,800MB/s | 115ns | 65,400MFLOPS[10] |
| Dell R710 | Nehalem | 2x(4x)2.4GHz | 6x800MHz DDR3 | 26,000MB/s | | 68,500MFLOPS[10] |
| Sun X2270 | Nehalem | 2x(4x)2.66GHz | 6x1066MHz DDR3 | 25,000MB/s | | 78,600MFLOPS[20] |
| Intel | Xeon W 2145 | 3.7GHz | 4x2666MHz DDR4 | 18,900MB/s | 115ns | 82,600MFLOPS[10] |
| Intel | Sandybridge QC | (4x)3.2GHz | 2x1333MHz DDR3 | 18,500MB/s | | 83,100MFLOPS[10] |
| AMD | Ryzen (Zen) QC | (4x)3.2GHz | 2x2400MHz DDR4 | 31,700MB/s | | 88,000MFLOPS[10,OpenBLAS] |
| Intel | Ivybridge QC | (4x)3.3GHz | 2x1600MHz DDR3 | 21,500MB/s | | 93,800MFLOPS[10] |
| Intel | Ivybridge HC | (6x)3.5GHz | 4x1600MHz DDR3 | 42,700MB/s | | 117,000MFLOPS[10] |
| Intel | Haswell QC | (4x)3.1GHz | 2x1600MHz DDR3 | 21,500MB/s | | 135,000MFLOPS[10] |
| Intel | Ivybridge HC | (6x)3.5GHz | 4x1600MHz DDR3 | 42,700MB/s | | 146,000MFLOPS[20] |
| Intel | Haswell QC | (4x)3.1GHz | 2x1600MHz DDR3 | 21,500MB/s | | 153,000MFLOPS[20] |
| Intel | Kaby Lake QC | (4x)3.0GHz | 2x2400MHz DDR4 | 28,400MB/s | | 165,000MFLOPS[10] |
| Intel | Kaby Lake QC | (4x)3.7GHz | 2x2400MHz DDR4 | 31,000MB/s | | 193,000MFLOPS[10] |
| AMD | Ryzen (Zen 2) HC | (6x)3.6GHz | 2x2666MHz DDR4 | 34,700MB/s | | 267,000MFLOPS[10,OpenBLAS] |
| Intel | 2xSandybridge OC | 2x(8x)2.6GHz | 8x1600MHz DDR3 | 70,000MB/s | | 280,000MFLOPS[20] |
| AMD | Ryzen (Zen 3) HC | (6x)3.7GHz | 2x3200MHz DDR4 | 40,000MB/s | | 288,000MFLOPS[10,OpenBLAS] |
| AMD | Ryzen (Zen 4) HC | (6x)3.8GHz | 2x4800MHz DDR5 | 56,200MB/s | | 303,000MFLOPS[10,OpenBLAS] |
| Intel | 2xHaswell 10C | 2x(10x)2.6GHz | 8x2133MHz DDR4 | 108,000MB/s | 112ns | 475,000MFLOPS[20] |
| Intel | Xeon W 2145 | (8x)3.7GHz | 4x2666MHz DDR4 | 56,800MB/s | | 530,000MFLOPS[20] |
| Intel | 2xHaswell 10C | 2x(10x)2.6GHz | 8x2133MHz DDR4 | 108,000MB/s | | 595,000MFLOPS[40] |
| AMD | EPYC (Zen 3) 16C | (16x)3.0GHz | 8x3200MHz DDR4 | 146,000MB/s | | 648,000MFLOPS[20, OpenBLAS] |
| AMD | EPYC (Zen 3) 16C | (16x)3.0GHz | 8x3200MHz DDR4 | 146,000MB/s | | 680,000MFLOPS[40, OpenBLAS] |
| Intel | 2xXeon Gold 12C | 2x(12x)2.6GHz | 12x2666MHz DDR4 | 160,000MB/s | | 1,150,000MFLOPS[40] |
| Intel | 2xXeon Gold 16C | 2x(16x)2.1GHz | 12x2666MHz DDR4 | 170,000MB/s | | 1,220,000MFLOPS[40] |
| Intel | 2xXeon Gold 16C | 2x(16x)2.1GHz | 12x2666MHz DDR4 | 170,000MB/s | | 1,250,000MFLOPS[60] |
| Intel | 2xXeon Gold 18C | 2x(18x)2.3GHz | 12x2666MHz DDR4 | 170,000MB/s | | 1,310,000MFLOPS[40] |
| Intel | 2xXeon Gold 18C | 2x(18x)2.3GHz | 12x2666MHz DDR4 | | | 1,460,000MFLOPS[60] |

In all cases the streams and latency results are for main memory. The linpack result is for a 5000x5000 system unless marked otherwise: [2] means 2000x2000, [10] means 10000x10000, and so on. A library name inside the brackets (MKL, OpenBLAS) identifies the linear algebra library used.

Notes

All forms of hyperthreading and overclocking are disabled (save for the Zen 2 Ryzen which resisted).

Streams measures the bandwidth to main memory, that is, for data sets far too large to fit in cache. The test runs use around half of the machine's memory in most cases. The timer used on the earlier Intel platforms has a resolution of just 0.01s, so the error bars on some of these results are large. This benchmark is important for codes which do unit-stride processing of large arrays (e.g. Castep).
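To see why a 0.01s timer matters, here is a minimal sketch of the arithmetic. The function names and the example run (32MB moved in a measured 0.10s) are hypothetical, not taken from the actual benchmark, and it assumes the true elapsed time lies within one timer tick of the measured value.

```python
def bandwidth_mb_per_s(bytes_moved, seconds):
    """Bandwidth in MB/s (1MB = 10**6 bytes) for a timed run."""
    return bytes_moved / seconds / 1e6


def bandwidth_bounds(bytes_moved, measured_seconds, resolution=0.01):
    """Range of bandwidths consistent with a coarse timer.

    Assumes the true time is within one tick (`resolution`) of the
    measured time, which must itself exceed one tick.
    """
    if measured_seconds <= resolution:
        raise ValueError("run is too short to time with this resolution")
    slowest = bandwidth_mb_per_s(bytes_moved, measured_seconds + resolution)
    fastest = bandwidth_mb_per_s(bytes_moved, measured_seconds - resolution)
    return slowest, fastest


# Hypothetical run: 32MB moved, timer reports 0.10s (nominally 320MB/s)
low, high = bandwidth_bounds(32e6, 0.10)
print(f"between {low:.0f} and {high:.0f} MB/s")
```

For this made-up run the uncertainty is roughly 10% either way. Longer runs shrink it in proportion, which is one reason to use a large fraction of the machine's memory.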

Latency is measured for random access over a 64MB (later machines, 1GB) array, and includes any TLB misses, except on Suns where these can be turned off by using large pages. This benchmark is important for codes which jump randomly all over memory (M$ Office).
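One common way to get random, dependent accesses is pointer chasing: fill an array with a random permutation that forms a single cycle, then follow it so each load depends on the previous one. The sketch below assumes that approach (the page does not say how its test is written), and all names in it are illustrative. In Python the timing is dominated by the interpreter, so treat it as a description of the access pattern rather than a usable benchmark; a real test would be in a compiled language.

```python
import random
import time
from array import array


def random_cycle(n, seed=0):
    """Return nxt such that following i -> nxt[i] visits all n slots once.

    Uses Sattolo's algorithm, which produces a permutation consisting of
    a single cycle, so the chase cannot get stuck in a short loop.
    """
    rng = random.Random(seed)
    nxt = array("q", range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Do `steps` dependent loads; return (final index, seconds per step)."""
    i = start
    t0 = time.perf_counter()
    for _ in range(steps):
        i = nxt[i]
    elapsed = time.perf_counter() - t0
    return i, elapsed / steps


n = 1 << 16  # small for illustration; the text uses 64MB or 1GB arrays
nxt = random_cycle(n)
end, per_step = chase(nxt, n)
print(f"{per_step * 1e9:.0f} ns per step (mostly interpreter overhead here)")
```

Because the permutation is one cycle of length n, exactly n steps return to the starting index, which gives a cheap check that every element was visited.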

Linpack measures the time taken to solve a 5000x5000 system of linear equations. It is pretty much the highest performance one can expect for a real problem, which in this case is the factorisation of a dense matrix. Efficiency should increase with size (for large sizes), but the 5000x5000 case requires over 200MB, so not all computers can cope.
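The memory figure and the conversion from time to MFLOPS can be sketched as follows. This assumes double-precision storage and the conventional Linpack operation count of 2n^3/3 + 2n^2; the page does not state which count was used for the table, and the function names are illustrative.

```python
def matrix_bytes(n, bytes_per_element=8):
    """Storage for an n x n matrix of double-precision numbers."""
    return n * n * bytes_per_element


def linpack_flops(n):
    """Conventional operation count for solving an n x n dense system."""
    return 2 * n**3 / 3 + 2 * n**2


def mflops(n, seconds):
    """MFLOPS figure for a solve of size n taking `seconds`."""
    return linpack_flops(n) / seconds / 1e6


for n in (2000, 5000, 10000):
    mb = matrix_bytes(n) / 1e6
    gflop = linpack_flops(n) / 1e9
    print(f"n={n}: matrix {mb:.0f}MB, about {gflop:.1f} GFLOP of work")
```

The 5000x5000 matrix alone is 200MB, with the right-hand side and any workspace on top, and the 10000x10000 case needs four times as much.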

Compilers are an important part of the speed equation, and are barely tested by the above. Linpack is just a library call, the latency test is unoptimisable, and streams is either trivial to optimise or, on those systems which support software data prefetching (most of the above), perhaps rather harder. However, different compilers and libraries were used with different entries above, which does give an advantage to more recent entries.

The above is rather unfair to Alphas, as it does not show the different ages of the machines. The Intel PII/350 is newer than the PW500au, for instance. It also lacks modern Alphas (DS25/ES45) which use 1GHz+ CPUs and 125MHz memory. Modern SGIs are also surely faster than the examples above.

More balanced benchmark suites are produced by SPEC, most notably the CPU2000 benchmark.

Thanks to AAM for the Pentium M result.