← К разделам
HPC Supercomputing
🖥️ HPC Supercomputing Center (Exascale)
Модуль #319. Astana Nazarbayev University Computer Center + Almaty Институт ИИВТ. Reference: Frontier ORNL (1.7 ExaFlops Top500 #1 2024), Fugaku Japan (442 PFlops), LUMI Finland (380 PFlops). HPC for climate modeling + drug discovery + AI training + nuclear physics simulation. Architecture: ~10 000 nodes × CPU AMD EPYC 9954 96-core + GPU NVIDIA H100/B200 80GB HBM3 × 4-8 per node, InfiniBand HDR 200 Gbps interconnect, parallel storage Lustre 100 PB. Liquid cooling direct-to-chip + adiabatic outdoor chillers. ISO 27001 + TIA-942 Tier IV + Green Grid PUE <1.2 + СН РК 4.04-12.
1. Состав HPC 100 PFlops cluster
- Compute nodes — 2500 servers Dell PowerEdge XE9680 / HPE Cray EX235n + AMD EPYC 9954 96-core × 2 (192 cores/node);
- GPU acceleration — NVIDIA H100 80GB HBM3 × 4 per node = 10 000 GPUs total, 60 PFlops FP16 compute;
- Memory — 1.5 TB DDR5-4800 per node, 3.75 PB total;
- Storage — Lustre parallel filesystem 100 PB (50 PB usable raid-6) + flash tier 1 PB NVMe;
- Interconnect — InfiniBand NDR 400 Gbps fat-tree topology, Mellanox QM9700 switches;
- Cooling — direct-to-chip liquid cooling DLC 100% (water-cooled cold plate Coolant Distribution Unit CDU);
- Heat removal — outdoor adiabatic chiller Bell+Gossett 5 MW thermal capacity (PUE 1.05-1.15);
- Power — 5 MW IT + 1 MW cooling + 0.5 MW UPS losses = 6.5 MW total; ATS automatic transfer от 2x utility feeds + 4× 1.5 MW DG;
- UPS — Eaton 9395 1 MVA × 5 N+1 redundancy + 30-min battery + 5-min flywheel ride-through;
- Networking — 100 Gbps fiber backbone к Almaty Internet eXchange + dark fiber до universities;
- Security — TIA-942 Tier IV: biometric + mantrap + 24/7 NOC monitoring + cybersecurity Falcon Crowdstrike;
- Cleanroom — ISO 14644 cl.8 (data center grade, dust control), pressurised positive +25 Pa;
- Building — 5000 m² total, 2500 m² data hall raised-floor 1.5 m, ceiling 6 m hot aisle containment;
- Office + visitor centre + classroom для researchers + workshops;
- Green sustainability — PUE 1.10 target, 30% PV onsite + waste heat recovery to district heating;
- Compliance — ISO 27001 + SOC 2 Type II + Green Grid + LEED Platinum BD+C.
Упражнение 1 — Cooling technology
Упражнение 2 — FP16 peak (PFlops)
Упражнение 3 — Капекс 100 PFlops HPC
- Compute 2500 nodes EPYC + H100 GPUs × $25K avg = 30 млрд
- Interconnect InfiniBand NDR fat-tree + switches Mellanox = 6 млрд
- Storage Lustre 100 PB + flash tier = 8 млрд
- Liquid cooling DLC + CDU + chillers Bell+Gossett 5 MW = 6 млрд
- UPS Eaton 9395 5 MVA + flywheel + DG 6 MW = 5 млрд
- Building 5000 м² + raised floor + structural reinforce = 5 млрд
- Security TIA-942 Tier IV + cybersecurity + NOC = 2 млрд
- Networking + IT + проект 5% + commissioning + insurance = 3 млрд
Упражнение 4 — Power efficiency
Нормативная база
- TIA-942 — Data Center Standards
- Green Grid PUE
- ASHRAE TC9.9 — Mission Critical Facilities
- Open Compute Project OCP
- ISO 27001 + SOC 2 Type II
- LEED Platinum BD+C
- Top500 Standards
- СН РК 4.04-12 — Связь и информатика