Saturday, February 24, 2024

NVIDIA’s new AI supercomputer offers massive shared memory space

Cloud service providers (CSPs), hyperscalers, large research institutes, and other leading businesses that push the limits of AI need more memory than a single GPU, or even a large multi-GPU system, can offer. These users need a way to scale the memory and processing power of hundreds of GPUs and CPUs without performance bottlenecks while preserving the single-GPU programming model for simplicity.

NVIDIA has announced the DGX GH200, a new class of large-memory AI supercomputer powered by NVIDIA GH200 Grace Hopper Superchips and the NVIDIA NVLink Switch System.

The NVIDIA DGX GH200 is the first supercomputer to pair Grace Hopper Superchips with the NVIDIA NVLink Switch System, which allows up to 256 GPUs to be combined into a single data-center-sized GPU. It provides one exaflop of FP8 AI performance and 144 TB of shared memory, giving developers nearly 500 times more memory than the previous-generation DGX A100 to build larger models. The architecture offers 48 times more NVLink bandwidth than the previous generation, combining the power of a large AI supercomputer with the simplicity of programming a single GPU.

Each NVIDIA Grace Hopper Superchip in the NVIDIA DGX GH200 has 480 GB of LPDDR5 CPU memory, which draws an eighth of the power per GB of standard DDR5. The NVIDIA Grace CPU and Hopper GPU are interconnected with NVLink-C2C, which provides seven times more bandwidth than PCIe Gen5.
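As a rough sanity check, the headline memory and bandwidth figures can be approximately reproduced with simple arithmetic. The short Python sketch below is only illustrative. The 256-superchip count and 480 GB of CPU memory come from this article. The 96 GB of HBM3 GPU memory per superchip, the 320 GB DGX A100 baseline, and the 900 GB/s NVLink-C2C and 128 GB/s PCIe Gen5 x16 bidirectional figures are assumptions drawn from commonly quoted specifications, not from this article.

```python
# Back-of-the-envelope check of the DGX GH200 headline figures.
SUPERCHIPS = 256             # from the article
LPDDR5_GB_PER_CHIP = 480     # from the article (Grace CPU memory)
HBM3_GB_PER_CHIP = 96        # assumption: Hopper GPU memory per superchip
DGX_A100_GB = 320            # assumption: 8 x 40 GB GPUs in the original DGX A100
NVLINK_C2C_GBPS = 900        # assumption: NVLink-C2C total bidirectional GB/s
PCIE_GEN5_X16_GBPS = 128     # assumption: PCIe Gen5 x16 bidirectional GB/s


def total_shared_memory_gb(chips: int, cpu_gb: int, gpu_gb: int) -> int:
    """Sum CPU and GPU memory across every superchip in the system."""
    return chips * (cpu_gb + gpu_gb)


total_gb = total_shared_memory_gb(SUPERCHIPS, LPDDR5_GB_PER_CHIP, HBM3_GB_PER_CHIP)
total_tb = total_gb / 1024                 # binary terabytes
memory_ratio = total_gb / DGX_A100_GB      # versus the assumed DGX A100 baseline
c2c_ratio = NVLINK_C2C_GBPS / PCIE_GEN5_X16_GBPS

print(f"Shared memory: {total_gb} GB = {total_tb:.0f} TB")
print(f"Memory vs. DGX A100: {memory_ratio:.1f}x")
print(f"NVLink-C2C vs. PCIe Gen5 x16: {c2c_ratio:.2f}x")
```

Under these assumptions the total comes to 147,456 GB, or 144 TB, roughly 460 times the assumed DGX A100 baseline, and NVLink-C2C works out to about seven times the bandwidth of PCIe Gen5 x16. That is consistent with the rounded figures quoted in this article.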

The NVLink Switch System is powered by the fourth generation of NVLink technology, which extends NVLink connections across superchips to create a seamless, high-bandwidth, multi-GPU system.

The NVLink Switch System creates a two-layer, non-blocking, fat-tree NVLink fabric for fully connecting up to 256 Grace Hopper Superchips in a DGX GH200 system. Each GPU in the DGX GH200 can access the memory of other GPUs and the extended GPU memory of all NVIDIA Grace CPUs at 900 GBps GPU-to-GPU bandwidth.

Google Cloud, Meta, and Microsoft are expected to be the first to gain access to the DGX GH200 to explore its capabilities for generative AI workloads. NVIDIA intends to provide cloud service providers and other hyperscalers with the DGX GH200 design as a blueprint so they can further customize it for their infrastructure.

“Building advanced generative models requires innovative approaches to AI infrastructure,” said Mark Lohmeyer, vice president of Compute at Google Cloud. “The new NVLink scale and shared memory of Grace Hopper Superchips address key bottlenecks in large-scale AI, and we look forward to exploring its capabilities for Google Cloud and our generative AI initiatives.”

The DGX GH200 includes NVIDIA software such as NVIDIA AI Enterprise and NVIDIA Base Command.

The supercomputer also includes white-glove services ranging from installation and infrastructure management to expert advice on optimizing workloads.

The NVIDIA DGX GH200 could be a leap toward the future of supercomputing. It is designed to remove many of the memory and bandwidth limits of earlier systems and is expected to be available by the end of the year.