Wednesday, May 15, 2024

Microsoft’s new custom chips to power AI workloads on Azure

Microsoft has unveiled two custom-designed chips and integrated systems: the Microsoft Azure Maia AI Accelerator, optimized for artificial intelligence (AI) tasks and generative AI, and the Microsoft Azure Cobalt CPU, an Arm-based processor tailored to run general-purpose compute workloads on the Microsoft Cloud.

The introduction of the two chips represents Microsoft’s first foray into semiconductors. The move follows Microsoft’s public cloud rivals, Amazon’s AWS and Google Cloud, which run their own chips in their data centers alongside those provided by vendors such as Nvidia.

The Cobalt CPU is designed on Arm architecture, which is increasingly being deployed in cloud data centers as a more energy-efficient and cost-effective alternative to processors built on Intel’s x86 architecture. Microsoft claims that Cobalt will offer a 40% performance boost over the existing Arm-based CPUs on Azure, which Microsoft launched last year in partnership with Ampere Computing.

The chips will start to roll out early next year to Microsoft’s data centers, initially powering the company’s services such as Microsoft Copilot or Azure OpenAI Service.

The company’s new Maia 100 AI Accelerator will power some of the largest internal AI workloads running on Microsoft Azure, such as the Microsoft Copilot AI assistant and the Azure OpenAI Service.

The Microsoft Azure Maia 100 AI Accelerator. Credit: Microsoft

The Maia 100 AI Accelerator was also designed specifically for the Azure hardware stack. Aligning the chip design with the larger AI infrastructure, which is built with Microsoft’s workloads in mind, can yield huge gains in performance and efficiency, said Brian Harry, a Microsoft technical fellow leading the Azure Maia team.

“Azure Maia was specifically designed for AI and for achieving the absolute maximum utilization of the hardware,” he said.

Meanwhile, the Cobalt 100 CPU is built on Arm architecture, a type of energy-efficient chip design, and is optimized to deliver greater efficiency and performance in cloud-native offerings. Choosing Arm technology was a key element of Microsoft’s sustainability goal: the company aims to optimize “performance per watt” throughout its data centers, which essentially means getting more computing power for each unit of energy consumed.

“The architecture and implementation are designed with power efficiency in mind,” Harry said. “We’re making the most efficient use of the transistors on the silicon. Multiply those efficiency gains in servers across all our data centers; it adds up to a pretty big number.”
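To make the “performance per watt” idea concrete, here is a minimal Python sketch of the arithmetic. Every figure in it (throughput, power draw, fleet size) is a hypothetical placeholder chosen for illustration, not a number published by Microsoft, and the function names are made up for this example.

```python
def perf_per_watt(throughput_ops_per_s: float, power_watts: float) -> float:
    """Return work done per unit of energy: (ops/s) / W = ops per joule."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput_ops_per_s / power_watts


def fleet_power_saved_watts(
    baseline_ppw: float,
    improved_ppw: float,
    throughput_per_server: float,
    servers: int,
) -> float:
    """Power saved across a fleet when every server delivers the same
    throughput at a better performance-per-watt figure."""
    baseline_power = throughput_per_server / baseline_ppw
    improved_power = throughput_per_server / improved_ppw
    return (baseline_power - improved_power) * servers


# Hypothetical figures, for illustration only.
THROUGHPUT = 1_000_000.0  # operations per second per server
baseline = perf_per_watt(THROUGHPUT, 250.0)  # older server drawing 250 W
improved = perf_per_watt(THROUGHPUT, 200.0)  # newer server drawing 200 W
saved = fleet_power_saved_watts(baseline, improved, THROUGHPUT, servers=10_000)

print(f"baseline: {baseline:.0f} ops/J, improved: {improved:.0f} ops/J")
print(f"fleet-wide saving: {saved / 1000:.0f} kW")
```

Under these assumed numbers, a 50 W saving per server looks small, but across 10,000 servers it becomes a fleet-wide figure measured in hundreds of kilowatts, which is the effect Harry describes.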

The Microsoft Azure Cobalt 100 CPU. Credit: Microsoft

Microsoft has introduced a new silicon architecture that enables the company to improve cooling efficiency, optimize the use of its current data center assets, and maximize server capacity within its existing footprint.

The unique requirements of the Maia 100 server boards necessitated the creation of new, wider racks to accommodate them. These expanded racks provide enough space for both power and networking cables, which are essential for the demands of AI workloads.

The high-performance chips used for AI tasks run computationally intensive workloads and draw a lot of power, producing more heat than traditional air cooling can handle. As a result, liquid cooling has emerged as the preferred solution to these thermal challenges, using circulating fluids to dissipate heat so the chips can run efficiently without overheating.

A custom-built rack for the Maia 100 AI Accelerator and its “sidekick” inside a thermal chamber at a Microsoft lab in Redmond, Washington. Credit: John Brecher for Microsoft.

However, Microsoft’s current data centers weren’t designed for large liquid chillers. To address this, the company developed a “sidekick” that sits next to the Maia 100 rack. The sidekick works much like a car radiator: cold liquid flows from it to cold plates attached to the surface of the Maia 100 chips. Each plate has channels through which the liquid circulates, absorbing heat from the chip. The warmed liquid then flows back to the sidekick, which removes the heat and sends the cooled liquid back to the rack to absorb more.
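The closed loop described above follows the standard heat-balance relation Q = ṁ · c · ΔT: the heat carried away equals the coolant’s mass flow times its specific heat times its temperature rise. The Python sketch below is illustrative only. It assumes the coolant behaves like water (the article does not say what liquid is used), and the heat load and temperature rise are hypothetical values, not Maia 100 specifications.

```python
# Specific heat of liquid water, roughly 4186 J/(kg*K).
# Assumption: the article does not state what coolant the sidekick uses.
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0


def required_flow_kg_per_s(
    heat_load_watts: float,
    coolant_temp_rise_k: float,
    specific_heat_j_per_kg_k: float = WATER_SPECIFIC_HEAT_J_PER_KG_K,
) -> float:
    """Mass flow needed to carry away a heat load, from Q = m_dot * c * dT."""
    if coolant_temp_rise_k <= 0:
        raise ValueError("coolant_temp_rise_k must be positive")
    return heat_load_watts / (specific_heat_j_per_kg_k * coolant_temp_rise_k)


# Hypothetical example: 1,000 W of heat, coolant allowed to warm by 10 K.
flow = required_flow_kg_per_s(heat_load_watts=1000.0, coolant_temp_rise_k=10.0)
print(f"required coolant flow: {flow:.4f} kg/s")
```

The relation also shows the trade-off in such a loop: for a fixed heat load, allowing the liquid to warm more between the cold plate and the sidekick lowers the flow rate that has to be pumped.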

The tandem design of rack and sidekick underscores the value of a systems approach to infrastructure, said Wes McCullough, corporate vice president of hardware product development. By controlling every facet, Microsoft can orchestrate a harmonious interplay between each component, ensuring that the whole is indeed greater than the sum of its parts in reducing environmental impact.

“Microsoft is building the infrastructure to support AI innovation, and we are reimagining every aspect of our data centers to meet the needs of our customers,” said Scott Guthrie, executive vice president of Microsoft’s Cloud + AI Group. “At the scale we operate, it’s important for us to optimize and integrate every layer of the infrastructure stack to maximize performance, diversify our supply chain, and give customers infrastructure choice.”