AI data center: A look into the heart of Elon Musk's Tesla AI supercomputer Cortex

By Susanne Braun | Translated by AI | 3 min reading time

Cortex is one of several AI training clusters that Tesla CEO Elon Musk is having his companies build. On X, Musk shared a video from inside the Cortex cluster at Tesla's headquarters, where 50,000 Nvidia GPUs are to be installed in the first stage.

In June 2024, Musk claimed that Optimus would go into limited production in 2025, with plans for more than 1,000 units to be used in Tesla facilities and the possibility of production for other companies in 2026. Optimus is trained by the Tesla supercomputer.
(Image: Tesla)

Artificial intelligence, billed as the driver of the fourth industrial revolution, is supposedly doing wonderful things today and even more in the future; at least if you believe the speeches of entrepreneurs who have invested a great deal of money and time in AI. Yet artificial intelligence is hard to visualize. A few screens running algorithms, or a slightly uncanny picture from a less-than-high-end image generator: is that all there is?

The sheer cost of training and operating AI also rarely gets the spotlight in reporting. That a single AI superchip costs between 30,000 and 70,000 US dollars is already hard to grasp, though still understandable to industry insiders given the technology involved. So is it really so hard to picture tens of thousands of these AI accelerators running simultaneously in server racks? Or is it? You no longer have to imagine it: Elon Musk has paid a visit to Tesla's Cortex AI training cluster and shared a video of it on his social platform X (formerly Twitter).
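To put those per-chip prices into perspective, a quick back-of-the-envelope calculation helps (a sketch: the price range and the 50,000-GPU first stage come from this article; ignoring servers, networking, cooling, and buildings is a simplifying assumption):

```python
# Illustrative cost envelope for the accelerators alone, using the
# per-chip price range and the GPU count cited in the article.
gpu_count = 50_000                        # Cortex, first stage
price_low_usd, price_high_usd = 30_000, 70_000

cost_low = gpu_count * price_low_usd      # 1.5 billion USD
cost_high = gpu_count * price_high_usd    # 3.5 billion USD
print(f"GPUs alone: ${cost_low / 1e9:.1f}B to ${cost_high / 1e9:.1f}B")
# → GPUs alone: $1.5B to $3.5B
```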


Cortex supercluster in Austin

Musk describes Cortex as an AI training supercluster. It is currently being set up on the grounds of Tesla's headquarters in Austin, Texas, to implement "real-world AI". Cortex will work on Tesla's Full Self-Driving (FSD) system and on the system for the autonomous, humanoid Optimus robot, which is to be used in Tesla's production. When completed, Cortex will link 70,000 GPUs. In the first stage, 50,000 Nvidia H100s will be installed; 20,000 chips developed by Tesla itself are to follow later.

As a reminder: the H100 is a GPU developed specifically for AI and high-performance computing. The chip comprises 80 billion transistors, is paired with HBM3 memory delivering several terabytes per second of bandwidth, and has a TDP of up to 700 watts. Nvidia's NVLink interconnect links it to multiple other GPUs.
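Those per-chip figures also allow a rough sanity check of the cluster's overall power draw (a sketch: the roughly 700-watt TDP is Nvidia's public figure for the H100 SXM module; treating all 50,000 GPUs as running at full TDP simultaneously is an assumption):

```python
gpu_count = 50_000
tdp_watts = 700                           # approx. H100 SXM TDP

gpu_power_mw = gpu_count * tdp_watts / 1e6
print(f"GPU TDP alone: {gpu_power_mw:.0f} MW")   # → GPU TDP alone: 35 MW
# The gap up to the cluster's quoted 130 MW leaves room for CPUs,
# networking, storage, power-conversion losses, and cooling.
```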

How loud is a supercomputer?

Back to Cortex. Musk's video from August 26, 2024 shows the work in progress. Based on the video, the authors at Tom's Hardware estimate the following: "The racks appear to be arranged in an array of 16 per row, with about four non-GPU racks dividing the rows. Each computer rack holds eight servers. The 20-second clip shows between 16 and 20 rows of server racks. Roughly speaking, 2,000 GPU servers can be seen, which is less than three percent of the estimated total number." Even this visible fraction alone generates considerable noise.
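The figures in the quote can be cross-checked with simple arithmetic (a sketch: all counts are Tom's Hardware's visual estimates from the clip, not official numbers, and the comparison of visible units against the planned GPU total follows the quote's own framing):

```python
gpu_racks_per_row = 16        # per the quote; ~4 non-GPU racks sit between rows
servers_per_rack = 8
rows_low, rows_high = 16, 20  # rows visible in the 20-second clip

servers_low = rows_low * gpu_racks_per_row * servers_per_rack    # 2048
servers_high = rows_high * gpu_racks_per_row * servers_per_rack  # 2560
print(servers_low, servers_high)   # → 2048 2560, i.e. "roughly 2,000"

# Roughly 2,000 visible units against the planned 70,000 GPUs:
print(f"{2_000 / 70_000:.1%}")     # → 2.9%, "less than three percent"
```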

And in cooled operation the servers consume a great deal of energy, one of the most overlooked costs of AI. Once the first stage of Cortex is complete, the cluster will draw 130 megawatts, roughly the continuous power demand of a small city. When all 70,000 GPUs are in operation, the demand is expected to rise to around 500 megawatts.
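Megawatts measure power draw, not a quantity of energy, so the figures translate most naturally into household equivalents (a sketch: the average household demand of about 1.2 kW is an assumption for illustration, not a figure from the article):

```python
stage1_mw, full_mw = 130, 500
avg_household_kw = 1.2        # assumed average US household power draw

homes_stage1 = stage1_mw * 1_000 / avg_household_kw
homes_full = full_mw * 1_000 / avg_household_kw
print(f"~{homes_stage1:,.0f} homes now, ~{homes_full:,.0f} when complete")
# → ~108,333 homes now, ~416,667 when complete
```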

"Video of the inside of Cortex today, the giant new AI training supercluster being built at Tesla HQ in Austin to solve real-world AI pic.twitter.com/DwJVUWUrb5" — Elon Musk (@elonmusk), August 26, 2024

One supercluster alone is not enough ...

Tesla does at least use Supermicro's liquid cooling technology for Cortex. The company claims that direct liquid cooling can cut electricity costs for the cooling infrastructure by up to 89 percent compared to air cooling. Supermicro CEO Charles Liang drew a somewhat puzzling comparison in July 2024: he said 20 billion trees could be saved if liquid cooling were to catch on in large data centers, presumably referring to all giant data centers combined.
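What an 89 percent cut in cooling electricity could mean for a data center's power usage effectiveness (PUE) can be sketched as follows (the air-cooled baseline PUE of 1.5 is purely an illustrative assumption, not a Supermicro figure, and treating all non-IT overhead as cooling is a simplification):

```python
it_load_mw = 100              # arbitrary IT load for illustration
baseline_pue = 1.5            # assumed air-cooled baseline

cooling_mw = it_load_mw * (baseline_pue - 1)      # 50 MW of overhead
liquid_cooling_mw = cooling_mw * (1 - 0.89)       # 5.5 MW after the cut
liquid_pue = (it_load_mw + liquid_cooling_mw) / it_load_mw
print(f"PUE {baseline_pue} -> {liquid_pue:.3f}")  # → PUE 1.5 -> 1.055
```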

Speaking of other AI training clusters: Cortex, part of the supercomputer cluster at Tesla's Gigafactory, is, as mentioned, not the only supercomputer being built within Musk's ventures. The xAI supercomputer is somewhat better known, and somewhat bigger: there, 100,000 Nvidia H100 GPUs are to train the Grok AI for X Premium users. The xAI training cluster is also set to be expanded by 300,000 B200 GPUs in the coming year (2025). (sb)
