# Supercomputer at CCS: Cygnus

## Multi-Hybrid Accelerated Computing Platform

### Combining the strengths of different types of accelerators: GPU + FPGA

- GPU is still an essential accelerator for simple workloads with a large degree of parallelism, providing ~10 TFLOPS peak performance
- FPGA is a new type of accelerator offering application-specific hardware with programmability, sped up by pipelining the calculation
- FPGAs are also good at external communication with each other, using an advanced high-speed interconnect of up to 100 Gbps x4 chan.

#### **Construction of "Cygnus"**

- Operation started in May 2019
- 2x Intel Xeon CPUs, 4x NVIDIA V100 GPUs, 2x Intel Stratix10 FPGAs
- Deneb: 49 CPU+GPU nodes
- Albireo: 32 CPU+GPU+FPGA nodes with a dedicated 2D-torus network for FPGAs (100 Gbps x4)
- **Target FPGA:** Nallatech 520N
- **Target GPU:** NVIDIA Tesla V100

#### FPGA design plan

- Router
  - For the dedicated network, this implementation is mandatory.
  - Forwards packets to their destinations.
- User Logic
  - OpenCL kernel runs here.
  - Inter-FPGA comm. can be controlled from the OpenCL kernel.
- SL3
  - SerialLite III: Intel FPGA IP
  - Includes transceiver modules for inter-FPGA data transfer.
  - Users don't need to care about it.

The 64 FPGAs on the Albireo nodes are connected directly in a 2D-torus configuration, without an Ethernet switch.

**Specification of Cygnus**

| Item | Specification |
|------------------------|---------------------------------------------------------------|
| Peak performance | […] CPU: 0.2 PFLOPS, FPGA: 0.6 PFLOPS (SP) […] mixed precision and variable […] |
| # of nodes | 81 (32 Albireo (GPU+FPGA) nodes, 49 Deneb (GPU-only) nodes) |
| Memory | 192 GiB DDR4-2666/node = **256 GB/s**, **32 GiB** x 4 for GPU/node = 3.6 TB/s |
| CPU / node | Intel Xeon Gold (SKL) x2 sockets |
| GPU / node | NVIDIA V100 x4 (PCIe) |
| FPGA / node | Intel Stratix10 x2 (each with 100 Gbps x4 links/FPGA and x8 links/node) |
| Global File System | Lustre, RAID6, 2.5 PB |
| Interconnection Network | |
| Programming Language | […] Fortran, OpenMP; GPU: OpenACC […] OpenCL, Verilog HDL |
| System Vendor | NEC |

Node diagrams: Deneb node (x48), Albireo node (x32).

The ordinary inter-node communication channel serves the CPU and GPU, but they can also request it to the FPGAs.

Check it now! "Cygnus Movie"
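
The 2D-torus wiring of the 64 Albireo FPGAs can be illustrated with a small addressing model. This is only a sketch under stated assumptions: the 8 x 8 shape, the row-major numbering, and the function names (`fpga_id`, `neighbors`, `hop_count`) are all hypothetical and do not come from the slides, which state only that 64 FPGAs form a 2D torus with four 100 Gbps channels each. It is not the router implementation.

```python
# Hypothetical model of a 2D torus of FPGAs.
# ASSUMPTION: an 8 x 8 layout with row-major IDs; the slides only say
# "64 FPGAs ... connected directly as 2D-Torus".
ROWS, COLS = 8, 8


def fpga_id(row, col):
    """Map (row, col) to a linear ID, wrapping around both dimensions."""
    return (row % ROWS) * COLS + (col % COLS)


def coords(fid):
    """Inverse of fpga_id for IDs in range(ROWS * COLS)."""
    return divmod(fid, COLS)


def neighbors(fid):
    """Return the four directly linked FPGAs (one per channel)."""
    r, c = coords(fid)
    return {
        "north": fpga_id(r - 1, c),
        "south": fpga_id(r + 1, c),
        "west": fpga_id(r, c - 1),
        "east": fpga_id(r, c + 1),
    }


def ring_distance(a, b, size):
    """Shortest distance between two positions on a ring of given size."""
    d = abs(a - b)
    return min(d, size - d)


def hop_count(src, dst):
    """Minimum number of link traversals between two FPGAs on the torus."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return ring_distance(r1, r2, ROWS) + ring_distance(c1, c2, COLS)


if __name__ == "__main__":
    print(neighbors(0))
    print(hop_count(0, 63))
```

Because of the wrap-around links, opposite corners of the grid are only two hops apart in this model, and no pair of FPGAs is more than eight hops apart.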
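
The per-node bandwidth figures in the specification table can be checked with a short calculation. The sketch below rests on assumptions that are not stated in the slides: six DDR4 channels per Xeon socket, 8 bytes per memory transfer, and 900 GB/s of HBM2 bandwidth per V100. The function names are hypothetical.

```python
# Back-of-the-envelope check of per-node bandwidth figures.
# ASSUMPTIONS (not in the slides): 6 DDR4 channels per socket,
# 8 bytes per transfer, 900 GB/s HBM2 bandwidth per V100.
DDR4_TRANSFERS_PER_S = 2666e6  # DDR4-2666
BYTES_PER_TRANSFER = 8
CHANNELS_PER_SOCKET = 6
SOCKETS = 2


def cpu_memory_bandwidth_gbs():
    """Aggregate DDR4 bandwidth per node in GB/s."""
    per_channel = DDR4_TRANSFERS_PER_S * BYTES_PER_TRANSFER
    return per_channel * CHANNELS_PER_SOCKET * SOCKETS / 1e9


def gpu_memory_bandwidth_tbs(gpus=4, per_gpu_gbs=900):
    """Aggregate GPU memory bandwidth per node in TB/s."""
    return gpus * per_gpu_gbs / 1000


def fpga_link_bandwidth_gbps(fpgas=2, channels=4, per_channel_gbps=100):
    """Aggregate inter-FPGA link bandwidth per Albireo node in Gbps."""
    return fpgas * channels * per_channel_gbps


if __name__ == "__main__":
    print(f"CPU memory: {cpu_memory_bandwidth_gbs():.1f} GB/s")
    print(f"GPU memory: {gpu_memory_bandwidth_tbs():.1f} TB/s")
    print(f"FPGA links: {fpga_link_bandwidth_gbps()} Gbps")
```

Under these assumptions the results are consistent with the 256 GB/s and 3.6 TB/s entries in the table, and the eight 100 Gbps links per node add up to 800 Gbps.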