Processors in TCM

TCM has computers using processors of many different generations. What follows is a roughly chronological summary of their features for those who want more detail than that given by status and rbusy. We also maintain a page of crude benchmarks. Peak FLOPS/Hz and memory technology are not the only improvements between generations, and in general all code is expected to run faster on the newer generations.

Note that Intel's naming for consumer processors (Pentium, i3, i5, i7) does not readily distinguish between the different generations of CPU cores in use. For instance, the Core i3, i5 and i7 brands were introduced around 2009 with the Nehalem, but have also been used for all subsequent CPUs. A Kaby Lake based Core i3 will be considerably superior to the original Core i7.

Name                      Introduced  Features           Vector Length  FLOPS/Hz  Memory
                                                         (in doubles)   (peak)    (all have ECC)
Xeon Scalable Gold        2018        SSE3, AVX512, FMA  8              32        256 or 384 bit DDR4/2666
AMD Zen 4                 2022        SSE3, AVX512, FMA  8              16        128 bit DDR5/4800
AMD Zen 3                 2020        SSE3, AVX2, FMA    4              16        128 bit DDR4/3200
AMD Zen 2                 2019        SSE3, AVX2, FMA    4              16        128 bit DDR4/2666
Skylake, Kaby Lake        2015        SSE3, AVX2, FMA    4              16        128 bit DDR4/2133 to 2400
E5 v3                     2014        SSE3, AVX2, FMA    4              16        256 bit DDR4/2133
Haswell                   2013        SSE3, AVX2, FMA    4              16        128 bit DDR3/1600
AMD Zen                   2017        SSE3, AVX2, FMA    4              8         128 bit DDR4/2400
E5, E5 v2                 2012        SSE3, AVX          4              8         256 bit DDR3/1600
Sandy Bridge, Ivy Bridge  2011        SSE3, AVX          4              8         128 bit DDR3/1333 to 1600
Nehalem                   2009        SSE3               2              4         192 bit DDR3/1066
AMD Athlon II             2009        SSE3               2              4         128 bit DDR3/1333
Core2                     2006        SSE3               2              4         128 bit DDR2/667 to 1066

In general programs are compiled to run on any of the above (all are 64 bit!). Some maths libraries, including OpenBLAS and Intel's MKL, detect the processor at run time and execute different instructions depending on which one they are running on. Code which does not do this, and which must run on any of the above processors, will never use any four-element vector instructions, will never use any Fused Multiply Add instructions, and cannot achieve more than four double precision floating point operations per clock-cycle.
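
The idea of choosing a code path at run time can be sketched as follows. This is only an illustration, not how OpenBLAS or MKL are actually implemented: it assumes a Linux machine where /proc/cpuinfo lists feature flags (pni is Linux's name for SSE3, avx512f for the base AVX512 set), and the kernel names returned are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def best_kernel(flags):
    """Pick the widest code path the CPU supports (kernel names are hypothetical)."""
    if "avx512f" in flags and "fma" in flags:
        return "avx512_fma"   # 8 doubles per vector, with FMA
    if "avx2" in flags and "fma" in flags:
        return "avx2_fma"     # 4 doubles per vector, with FMA
    if "avx" in flags:
        return "avx"          # 4 doubles per vector, no FMA
    return "baseline"         # 2 doubles per vector: runs on every CPU listed


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(best_kernel(cpu_flags(f.read())))
    except OSError:
        print("no /proc/cpuinfo here; assuming the baseline code path")
```

A program compiled without any such dispatch is in effect always taking the last branch.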

AVX2 adds integer instructions which operate on 256-bit vectors. The first generation of AVX was mostly floating-point only.

Most of our desktops have four cores, some very old ones just two, and some recent ones six.

The Core2 uses a separate memory controller, which reduces its performance, especially as the link between the memory controller and the CPU is generally too slow. A Core2 with a 128 bit 667MT/s memory bus (10.6GB/s) usually has a 64 bit 1066MT/s bus (8.5GB/s) back to the CPU, and those with a 128 bit 1066MT/s memory bus (17GB/s) are throttled by the bus back to the CPU being 64 bits and 1333MT/s (10.6GB/s). The Core2 also shares its CPU to memory controller bus with all I/O, including disk and video activity. All the other CPUs in TCM have memory controllers integrated onto the CPU.
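
The bandwidth figures quoted above are simply the bus width in bytes multiplied by the transfer rate. A minimal sketch of that arithmetic, using the two Core2 configurations from the text (the function and variable names are illustrative only):

```python
def bandwidth_gb_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return (width_bits / 8) * mega_transfers_per_s / 1000


# (memory bus, bus back to the CPU) pairs from the text, as (width in bits, MT/s)
core2_configs = [
    ((128, 667), (64, 1066)),
    ((128, 1066), (64, 1333)),
]

for memory_bus, cpu_bus in core2_configs:
    mem = bandwidth_gb_s(*memory_bus)
    cpu = bandwidth_gb_s(*cpu_bus)
    # The slower of the two buses limits what the CPU can actually see.
    print(f"memory {mem:.1f} GB/s, link to CPU {cpu:.1f} GB/s, "
          f"usable at most {min(mem, cpu):.1f} GB/s")
```

The results agree with the figures in the text to within rounding, and in both cases the link back to the CPU is the bottleneck.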

The E5 v1 and v2 correspond to Sandy Bridge and Ivy Bridge respectively, but with twice the memory bus width. The E5 v3 is a Haswell with twice the memory bus width and DDR4 support.

AMD's original Zen architecture supports 256-bit vector instructions, but its functional units are only 128 bits wide, so a 256-bit instruction takes two clock cycles to issue. The Zen 2 and Zen 3 have full-width functional units. The Zen 4 supports 512-bit vector instructions, but its functional units are only 256 bits wide, so its peak FLOPS/Hz is no higher than that of the Zen 3.
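
The peak FLOPS/Hz column in the table can be reproduced with a simple model: the number of doubles that fit in a functional unit, times two if FMA is available, times the number of arithmetic pipes working in parallel. The sketch below assumes two such pipes per core throughout. That is an assumption of this illustration rather than something stated above, although it does reproduce the table's figures.

```python
def peak_flops_per_cycle(unit_width_bits, has_fma, pipes=2):
    """Peak double precision FLOPS per clock cycle per core, under a simple model.

    pipes=2 is an assumption of this sketch, not a figure from the table.
    """
    doubles_per_unit = unit_width_bits // 64
    flops_per_element = 2 if has_fma else 1   # an FMA counts as a multiply plus an add
    return doubles_per_unit * flops_per_element * pipes


# name: (functional unit width in bits, FMA available)
examples = {
    "Nehalem": (128, False),
    "Sandy Bridge": (256, False),
    "AMD Zen": (128, True),      # 256-bit instructions, 128-bit units
    "AMD Zen 3": (256, True),
    "AMD Zen 4": (256, True),    # 512-bit instructions, 256-bit units
    "Xeon Scalable Gold": (512, True),
}

for name, (width, fma) in examples.items():
    print(name, peak_flops_per_cycle(width, fma))
```

It is the width of the functional units, not the length of the vector instructions, that enters the calculation, which is why the Zen and Zen 4 score lower than their vector lengths alone would suggest.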