Wednesday, April 10, 2024

Microsoft built a powerful supercomputer to train OpenAI’s large-scale AI models

This Tuesday (May 19), during Build 2020, which is being held virtually because of the coronavirus pandemic, Microsoft revealed that it had built a supercomputer that it says is among the most powerful in the world. Microsoft did not disclose specific performance figures for the machine, but states that it would rank among the top five publicly disclosed supercomputers in the world.

It was developed in collaboration with, and exclusively for, OpenAI, a company co-founded by Elon Musk in which Microsoft invested $1 billion last year. The supercomputer will be used specifically for training OpenAI’s large-scale artificial intelligence models and is one of the first results of that investment.

This Azure-based supercomputer has more than 285,000 CPU cores and 10,000 GPUs, with 400 gigabits per second of network connectivity for each GPU server. Because it is hosted in Azure, the supercomputer also benefits from the features of Microsoft’s cloud infrastructure, including rapid deployment, sustainable datacenters, and access to Azure services.

“The exciting thing about these models is the breadth of things they’re going to enable,” said Microsoft Chief Technical Officer Kevin Scott. “This is about being able to do a hundred exciting things in natural language processing at once and a hundred exciting things in computer vision, and when you start to see combinations of these perceptual domains, you’re going to have new applications that are hard to even imagine right now.”

This represents a first step towards the development of very large artificial intelligence models. The infrastructure needed to train them is also offered as a platform, so that other organizations and developers can use it as the basis for their own work.

The partnership between Microsoft and OpenAI, coupled with a computer of this size, aims to change the way machine learning systems operate. Currently, several of these systems work inefficiently: the data available for training is used only in a restricted way.

The idea is that, in the future, the training system will be able to access and read billions of texts publicly available on the internet. This could allow a system to learn something new on demand, based on a specific need. After analyzing code bases in repositories like GitHub, systems could even start writing code themselves.

Microsoft already has its own family of large AI models, called Microsoft Turing. They are mainly used to improve understanding of natural language in Bing, Office, and Dynamics. The Turing model for natural language is considered the largest publicly available AI language model worldwide.