Processors in TCM
TCM has computers using processors of many different generations. Below is a summary of their features, ordered roughly by peak floating-point performance per clock cycle, for those who want more detail than that given by status and rbusy. We also maintain a page of crude benchmarks. Peak FLOPS/Hz and memory technology are not the only improvements between generations, and in general all code is expected to run faster on the newer generations.
Note that Intel's naming for consumer processors (Pentium, i3, i5, i7) does not readily distinguish between the different generations of CPU cores in use. For instance, the Core i3, i5 and i7 brands were introduced around 2009 with the Nehalem, but have also been used for all subsequent CPUs. A Kaby Lake based Core i3 will be considerably superior to the original Core i7.
Name | Introduced | Features | Vector Length (in doubles) | FLOPS/Hz (peak, per core) | Memory (all have ECC) |
---|---|---|---|---|---|
Xeon Scalable Gold | 2018 | SSE3, AVX512, FMA | 8 | 32 | 256 or 384 bit DDR4/2666 |
AMD Zen 4 | 2022 | SSE3, AVX512, FMA | 8 | 16 | 128 bit DDR5/4800 |
AMD Zen 3 | 2020 | SSE3, AVX2, FMA | 4 | 16 | 128 bit DDR4/3200 |
AMD Zen 2 | 2019 | SSE3, AVX2, FMA | 4 | 16 | 128 bit DDR4/2666 |
Skylake, Kaby Lake | 2015 | SSE3, AVX2, FMA | 4 | 16 | 128 bit DDR4/2133 to 2400 |
E5 v3 | 2014 | SSE3, AVX2, FMA | 4 | 16 | 256 bit DDR4/2133 |
Haswell | 2013 | SSE3, AVX2, FMA | 4 | 16 | 128 bit DDR3/1600 |
AMD Zen | 2017 | SSE3, AVX2, FMA | 4 | 8 | 128 bit DDR4/2400 |
E5, E5 v2 | 2012 | SSE3, AVX | 4 | 8 | 256 bit DDR3/1600 |
Sandy Bridge, Ivy Bridge | 2011 | SSE3, AVX | 4 | 8 | 128 bit DDR3/1333 to 1600 |
Nehalem | 2009 | SSE3 | 2 | 4 | 192 bit DDR3/1066 |
AMD Athlon II | 2009 | SSE3 | 2 | 4 | 128 bit DDR3/1333 |
Core2 | 2006 | SSE3 | 2 | 4 | 128 bit DDR2/667 to 1066 |
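The Memory column can be turned into a theoretical peak bandwidth: bytes per transfer (bus width divided by eight) times transfers per second. A minimal sketch in Python; the function name is hypothetical, the figures are theoretical peaks that real code will not sustain, and for the Xeon Scalable Gold we assume the 384 bit (six-channel) configuration.

```python
def peak_bandwidth_gbs(bus_bits: int, mega_transfers: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (10**9 bytes per second).

    bus_bits: total memory bus width in bits, e.g. 128 for two 64-bit channels.
    mega_transfers: transfer rate in MT/s, the number after the slash
    (e.g. 2400 for DDR4/2400).
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers / 1000  # MB/s -> GB/s


# Some rows of the table (Xeon Scalable Gold assumed to use the 384 bit bus).
rows = [
    ("Xeon Scalable Gold", 384, 2666),
    ("AMD Zen 4", 128, 4800),
    ("E5 v3", 256, 2133),
    ("Haswell", 128, 1600),
]
for name, bits, rate in rows:
    print(f"{name:20s} {peak_bandwidth_gbs(bits, rate):6.1f} GB/s")
```

On these rows it gives about 128, 76.8, 68.3 and 25.6 GB/s; this bandwidth is shared between all the cores of the machine.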
In general, programs are compiled to run on any of the above (all are 64 bit!). Some maths libraries, including OpenBLAS and Intel's MKL, detect which processor they are running on and execute different instructions accordingly. Code without such run-time selection, compiled to run on any of the above processors, will never use four-element vector instructions or fused multiply-add (FMA) instructions, and so cannot achieve more than four double-precision floating-point operations per clock cycle.
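What OpenBLAS and MKL do can be sketched as run-time dispatch: read the CPU's feature flags once and pick the widest code path it supports. A minimal illustration in Python, assuming Linux on x86, where the flags appear in /proc/cpuinfo (and where SSE3 is listed as `pni`); the kernel names returned are hypothetical labels, not the libraries' actual internals.

```python
from __future__ import annotations


def cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Instruction-set flags of the first CPU listed in /proc/cpuinfo (Linux, x86).

    Returns an empty set if the file or its 'flags' line is missing.
    """
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def choose_kernel(flags: set[str]) -> str:
    """Pick the widest code path the CPU supports (hypothetical kernel names)."""
    if "avx512f" in flags:
        return "avx512"      # 8 doubles per vector; AVX-512 includes FMA
    if "avx2" in flags and "fma" in flags:
        return "avx2_fma"    # 4 doubles per vector, fused multiply-add
    if "avx" in flags:
        return "avx"         # 4 doubles per vector, no FMA
    if "pni" in flags:       # Linux's name for SSE3
        return "sse3"
    return "generic"         # baseline 64-bit x86: SSE2, 2 doubles per vector


if __name__ == "__main__":
    print(choose_kernel(cpu_flags()))
```

Compilers offer the same idea at build time (for example GCC's `-march=native`), but a binary built that way may fail with an illegal instruction on the older machines in the table.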
AVX2 adds integer instructions which operate on 256-bit vectors. The first generation of AVX was mostly floating-point only.
Most of our desktops have four cores, some very old ones just two, and some recent ones six.
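Multiplying the per-core FLOPS/Hz figure by the number of cores and the clock speed gives a machine's theoretical peak. A sketch with a hypothetical function name; clock speeds are not given on this page, so the 3.5 GHz used below is an illustrative assumption, and real CPUs may run slower when all cores execute wide vector instructions.

```python
def peak_gflops(cores: int, flops_per_hz: int, clock_ghz: float) -> float:
    """Theoretical peak double-precision GFLOPS of a whole machine.

    flops_per_hz: per-core peak from the FLOPS/Hz column of the table.
    clock_ghz: clock speed in GHz (not listed in the table; supply your own).
    """
    return cores * flops_per_hz * clock_ghz


# Hypothetical machines, both assumed to run at 3.5 GHz.
print(peak_gflops(4, 16, 3.5))  # four-core Haswell: 224.0
print(peak_gflops(2, 4, 3.5))   # two-core Core2: 28.0
```

The factor of eight between these two comes from core count and FLOPS/Hz alone, before any difference in memory bandwidth is considered.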
The Core2 uses a separate memory controller (in the motherboard chipset), which reduces its performance, especially as the front-side bus linking the memory controller to the CPU is generally too slow. A Core2 with a 128 bit 667MT/s memory bus (10.6GB/s) usually has a 64 bit 1066MT/s front-side bus (8.5GB/s) back to the CPU, and those with a 128 bit 1066MT/s memory bus (17GB/s) are throttled by a 64 bit 1333MT/s front-side bus (10.6GB/s). The Core2 also shares this bus with all I/O, including disk and video activity. All the other CPUs in TCM have memory controllers integrated onto the CPU.
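The Core2 figures follow from treating the memory bus and the 64-bit bus back to the CPU (the front-side bus) as two links in series: usable bandwidth is that of the slower one. A sketch with hypothetical function names; it ignores the extra contention from I/O traffic sharing the front-side bus, so real figures are lower still.

```python
def bus_gbs(bus_bits: int, mega_transfers: float) -> float:
    """Peak bandwidth of a bus in GB/s: bytes per transfer times MT/s, over 1000."""
    return bus_bits / 8 * mega_transfers / 1000


def core2_effective_gbs(mem_bits: int, mem_mts: float, fsb_mts: float) -> float:
    """Bandwidth a Core2 can actually use: the slower of the memory bus and
    the 64-bit front-side bus (which in reality also carries all I/O)."""
    return min(bus_gbs(mem_bits, mem_mts), bus_gbs(64, fsb_mts))


print(core2_effective_gbs(128, 667, 1066))   # 8.528: FSB-limited, not 10.672
print(core2_effective_gbs(128, 1066, 1333))  # 10.664: FSB-limited, not 17.056
```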
The E5 (v1) and E5 v2 correspond to Sandy Bridge and Ivy Bridge, but with twice the memory bus width, and the E5 v3 is a Haswell with twice the memory bus width and DDR4 support.
The original AMD Zen supports 256-bit vector instructions, but its functional units are only 128 bits wide, so each 256-bit instruction is executed as two 128-bit halves and gives no more peak throughput than 128-bit instructions. The Zen 2 and Zen 3 have full-width (256-bit) functional units. The Zen 4 supports 512-bit vector instructions, but its functional units are only 256 bits wide, which is why its peak FLOPS/Hz is no higher than that of the Zen 3.
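The FLOPS/Hz column follows from the width of the functional units rather than of the instructions. A sketch that reproduces the column, assuming two floating-point units per core: two fused multiply-add units on the FMA-capable CPUs, and one add plus one multiply unit on the older ones (these unit counts are our assumption, not stated on this page). An FMA counts as two floating-point operations per double.

```python
def flops_per_hz(unit_bits: int, units: int, fma: bool) -> int:
    """Peak double-precision floating-point operations per clock per core.

    unit_bits: width of each functional unit in bits (not the instruction width).
    units: number of such units that can start an operation every cycle.
    fma: True if each unit does a fused multiply-add (2 FLOPs per double),
         False if each does a single add or multiply (1 FLOP per double).
    """
    doubles_per_unit = unit_bits // 64
    return units * doubles_per_unit * (2 if fma else 1)


# (name, unit width, number of units, FMA?); unit counts are assumptions.
cpus = [
    ("Xeon Scalable Gold", 512, 2, True),
    ("AMD Zen 4", 256, 2, True),          # 512-bit instructions, 256-bit units
    ("AMD Zen 2, Zen 3", 256, 2, True),
    ("Haswell, Skylake", 256, 2, True),
    ("AMD Zen", 128, 2, True),            # 256-bit instructions, 128-bit units
    ("Sandy Bridge, E5", 256, 2, False),  # one add unit and one multiply unit
    ("Nehalem, Core2", 128, 2, False),
]
for name, bits, units, fma in cpus:
    print(f"{name:20s} {flops_per_hz(bits, units, fma):3d}")
```

Using the instruction width instead of the unit width would wrongly give 32 for the Zen 4 and 16 for the original Zen.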