POWER9
POWER9 is a family of superscalar, multithreading, symmetric multiprocessors based on the Power ISA announced in August 2016 at the Hot Chips conference.
The POWER9-based processors are being manufactured using a 14 nm FinFET process, in 12- and 24-core versions, for scale out and scale up applications, and possibly other variations, since the POWER9 architecture is open for licensing and modification by the OpenPOWER Foundation members.
The second fastest supercomputer in the world, Summit, is based on POWER9, while also using Nvidia Tesla GPUs as accelerators.
Design
Core
The POWER9 core comes in two variants, a four-way multithreaded one called SMT4 and an eight-way one called SMT8. The SMT4 and SMT8 cores are similar in that they consist of a number of so-called slices fed by common schedulers. A slice is a rudimentary 64-bit single-threaded processing core with a load-store unit, an integer unit and a vector scalar unit. A super-slice is the combination of two slices. An SMT4 core consists of a 32 KB L1 instruction cache, a 32 KB L1 data cache, an instruction fetch unit (IFU) and an instruction sequencing unit (ISU), which feeds two super-slices. An SMT8 core has two sets of L1 caches, IFUs and ISUs to feed four super-slices. The result is that the 12-core and 24-core versions of POWER9 each consist of the same number of slices and the same amount of L1 cache.
A POWER9 core, whether SMT4 or SMT8, has a 12-stage pipeline while aiming to retain a clock frequency of around 4 GHz. It is the first to incorporate elements of the Power ISA v.3.0 that was released in December 2015, including the VSX-3 instructions. The POWER9 design is modular, so that it can be used in more processor variants and licensed for manufacture on fabrication processes other than IBM's. On chip are co-processors for compression and cryptography, as well as a large low-latency eDRAM L3 cache.
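The claim that the 12-core and 24-core versions contain the same number of slices and the same amount of L1 cache can be checked with a little arithmetic. The following is a minimal sketch, not an IBM tool: the names `CORE_TYPES` and `chip_totals` are made up for illustration, and it assumes that each set of L1 caches is one 32 KB instruction cache plus one 32 KB data cache.

```python
# Per-core building blocks as described in the text above.
SLICES_PER_SUPER_SLICE = 2
L1_SET_KB = 32 + 32  # assumption: one 32 KB instruction cache + one 32 KB data cache

CORE_TYPES = {
    # name: (super-slices per core, sets of L1 caches per core)
    "SMT4": (2, 1),
    "SMT8": (4, 2),
}


def chip_totals(core_type, cores):
    """Return (slices, total L1 cache in KB) for a chip with `cores` cores."""
    super_slices, l1_sets = CORE_TYPES[core_type]
    slices = cores * super_slices * SLICES_PER_SUPER_SLICE
    l1_kb = cores * l1_sets * L1_SET_KB
    return slices, l1_kb


smt4_chip = chip_totals("SMT4", 24)  # 24-core version
smt8_chip = chip_totals("SMT8", 12)  # 12-core version
print(smt4_chip, smt8_chip)  # (96, 1536) (96, 1536)
```

Under these assumptions both configurations come out to 96 slices and 1536 KB of L1 cache, which matches the statement in the text.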
Scale out / scale up
- IBM POWER9 SO scale-out variant, optimized for dual socket computers with up to 120 GB/s bandwidth to directly attached DDR4 memory
- IBM POWER9 SU scale-up variant, optimized for four sockets or more, for large NUMA machines with up to 230 GB/s bandwidth to buffered memory
I/O
Several on-chip facilities support high off-chip I/O performance:
- The SO variant has integrated DDR4 controllers for directly attached RAM, while the SU variant uses the off-chip Centaur architecture introduced with POWER8 to include high-performance eDRAM L4 cache and memory controllers for DDR4 RAM.
- The Bluelink interconnects for close attachment of graphics co-processors from Nvidia and OpenCAPI accelerators.
- General purpose PCIe v.4 connections for attaching regular ASICs, FPGAs and other peripherals as well as CAPI 2.0 and CAPI 1.0 devices designed for POWER8.
- Multiprocessor links to connect other POWER9 processors on the same motherboard, or in other closely attached enclosures.
Chip types
Variant | PowerNV | PowerVM |
Cores | 24 × SMT4 core | 12 × SMT8 core |
Scale Out | Nimbus | unknown |
Scale Up | Cumulus |
Modules
The IBM Portal for OpenPOWER lists the three available modules for the Nimbus chip, although the Scale-Out SMT8 variant for PowerVM also uses the LaGrange module/socket:
- Sforza – 50 mm × 50 mm, 4 DDR4, 48 PCIe lanes, 1 XBus 4B
- Monza – 68.5 mm × 68.5 mm, 8 DDR4, 34 PCIe lanes, 1 XBus 4B, 48 OpenCAPI lanes
- LaGrange – 68.5 mm × 68.5 mm, 8 DDR4, 42 PCIe lanes, 2 XBus 4B, 16 OpenCAPI lanes
Systems
Raptor Computing Systems / Raptor Engineering
Talos II – two-socket workstation/server platform using POWER9 SMT4 Sforza processors; available as 2U server, 4U server, tower, or EATX mainboard. Marketed as secure and owner-controllable with free and open-source software and firmware. Initially shipping with 4-core, 8-core, 18-core, and 22-core chip options until chips with more cores are available.
Talos II Lite – single-socket version of the Talos II mainboard, made using the same PCB.
Blackbird – single-socket microATX platform using SMT4 Sforza processors, 4–22 cores, 2 RAM slots
Google–Rackspace partnership
Barreleye G2 / Zaius – two-socket server platform using LaGrange processors; both the Barreleye G2 and Zaius chassis use the Zaius POWER9 motherboard
IBM
Power Systems AC922 – 2U, 2× POWER9 SMT4 Monza, with up to 6× Nvidia Volta GPUs, 2× CAPI 2.0 attached accelerators and 1 TB DDR4 RAM. AC here is an abbreviation for Accelerated Computing; this system is also known as "Witherspoon" or "Newell".
Power Systems L922 – 2U, 1–2× POWER9 SMT8, 8–12 cores per processor, up to 4 TB DDR4 RAM, PowerVM running Linux.
Power Systems S914 – 4U, 1× POWER9 SMT8, 4–8 cores, up to 1 TB DDR4 RAM, PowerVM running AIX/IBM i/Linux.
Power Systems S922 – 2U, 1–2× POWER9 SMT8, 4–10 cores per processor, up to 4 TB DDR4 RAM, PowerVM running AIX/IBM i/Linux.
Power Systems S924 – 4U, 2× POWER9 SMT8, 8–12 cores per processor, up to 4 TB DDR4 RAM, PowerVM running AIX/IBM i/Linux.
Power Systems H922 – 2U, 1–2× POWER9 SMT8, 4–10 cores per processor, up to 4 TB DDR4 RAM, PowerVM running SAP HANA with AIX/IBM i on up to 25% of the system.
Power Systems H924 – 4U, 2× POWER9 SMT8, 8–12 cores per processor, up to 4 TB DDR4 RAM, PowerVM running SAP HANA with AIX/IBM i on up to 25% of the system.
Power Systems E950 – 4U, 2–4× POWER9 SMT8, 8–12 cores per processor, up to 16 TB buffered DDR4 RAM
Power Systems E980 – 1–4× 4U, 4–16× POWER9 SMT8, 8–12 cores per processor, up to 64 TB buffered DDR4 RAM
Penguin Computing
Magna PE2112GTX – 2U, two-socket server for high performance computing using LaGrange processors. Manufactured by Wistron.
IBM Supercomputers
Summit and Sierra – The United States Department of Energy, together with Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, contracted IBM and Nvidia to build two supercomputers, Summit and Sierra, that are based on POWER9 processors coupled with Nvidia's Volta GPUs. These systems were slated to go online in 2017. Sierra is based on IBM's Power Systems AC922 compute node. The first racks of Summit were delivered to Oak Ridge National Laboratory on 31 July 2017.
MareNostrum 4 – One of the three clusters in the emerging technologies block of the fourth MareNostrum supercomputer is a POWER9 cluster with Nvidia Volta GPUs. This cluster is expected to provide more than 1.5 petaflops of computing capacity when installed. The emerging technologies block of the MareNostrum 4 exists to test if new developments might be "suitable for future versions of MareNostrum".
Operating system support
As with its predecessor, POWER9 is supported by FreeBSD, IBM AIX, IBM i, Linux, and OpenBSD. Implementation of POWER9 support in the Linux kernel began with version 4.6 in March 2016.
RHEL, SUSE, Debian GNU/Linux, and CentOS are supported.