Up to 256 exaFLOPS across 2048 nodes
WSE-3: AI giant chip with 4 trillion transistors

By Michael Eckstein | Translated by AI | 4 min reading time

Modern 5-nm manufacturing makes it possible: on Cerebras' new giant chip WSE-3, 4 trillion transistors form 900,000 AI compute cores and 44 GB of on-chip memory. The plate-sized processor delivers 125 petaflops of compute, enough to train AI models with up to 24 trillion parameters.

The new WSE-3 chip significantly surpasses its predecessor in computing power for training generative AI models. (Image: Cerebras)

Already in 2021, Cerebras set a world record for monolithically integrated chips with the predecessor WSE-2 (Wafer Scale Engine): the giant processor combines 2.6 trillion transistors into 850,000 compute cores and accesses 40 GB of onboard memory. The new WSE-3 raises the bar even further: roughly the same area now holds 4 trillion transistors, giving the chip 50,000 more AI-optimized cores and 4 GB more onboard memory than its predecessor.

This is possible because Cerebras now manufactures its pizza-sized processor at TSMC in 5-nm process technology instead of the previous 7-nm node. That also benefits energy efficiency: at the same power consumption of around 20 kW, the WSE-3 delivers, according to Cerebras, double the performance of the previous record holder WSE-2, namely 125 petaflops. One thing is clear: the WSE-3 is not a candidate for home gaming PCs. Instead, the processor works inside the purpose-built supercomputer Cerebras CS-3, where, according to Cerebras, it trains "the industry's largest AI models" with up to 24 trillion parameters.
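A quick sanity check of these figures, as a back-of-the-envelope Python sketch; note that the WSE-2 throughput below is only inferred from the "double the performance" claim and is not stated explicitly by Cerebras:

```python
# Figures quoted in the article (approximate, vendor-supplied)
wse3_flops = 125e15        # 125 petaflops
power_watts = 20e3         # roughly 20 kW per wafer
wse3_transistors = 4e12
wse3_cores = 900_000

# Inferred, not stated: WSE-2 at the same ~20 kW delivers half the throughput
wse2_flops = wse3_flops / 2

print(f"WSE-3 efficiency: {wse3_flops / power_watts / 1e12:.2f} TFLOPS/W")             # ~6.25
print(f"WSE-2 efficiency (inferred): {wse2_flops / power_watts / 1e12:.2f} TFLOPS/W")  # ~3.13
print(f"Transistors per WSE-3 core: {wse3_transistors / wse3_cores / 1e6:.1f} million")  # ~4.4
```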

Cluster of up to 2048 CS-3 systems

Each CS-3 is capable of addressing up to 1.2 petabytes of external storage. The system is designed to train generative AI models that are 10 times larger than GPT-4 and Gemini. According to Cerebras, 24 trillion parameter models can be stored in a single logical memory space without the need for partitioning or refactoring, "which drastically simplifies the training workflow and accelerates developer productivity." Training a model with a trillion parameters on the CS-3 is thus as simple as training a model with a billion parameters on GPUs. And there's more: Up to 2048 CS-3 systems can be interconnected into a cluster.
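The headline figure of 256 exaFLOPS follows directly from these numbers: 2048 systems at 125 petaflops each. A minimal arithmetic sketch, ignoring any real-world scaling losses that the article does not address:

```python
# Aggregate AI compute of a maximally sized CS-3 cluster, per the article's figures
per_system_pflops = 125      # one WSE-3 per CS-3 system
max_systems = 2048

total_exaflops = per_system_pflops * max_systems / 1000
print(f"{max_systems} x {per_system_pflops} PFLOPS = {total_exaflops:.0f} exaFLOPS")  # 256
```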

According to the manufacturer, the CS-3 is designed for both enterprise and hyperscale requirements. A compact four-system configuration can fine-tune 70B models in a single day, while at full scale, with 2048 systems, Llama 70B can be trained from scratch in just one day, an unprecedented feat for generative AI.

Native support for PyTorch 2.0

The latest Cerebras software framework offers native support for PyTorch 2.0 and the latest AI models and techniques such as multimodal models, vision transformers, mixture of experts, and diffusion. Cerebras remains the only platform that offers native hardware acceleration for dynamic and unstructured sparsity, which accelerates training by up to 8 times. Sparsity here means zeroing out (pruning) a large share of a network's weights, without compromising accuracy, so that the corresponding computations can be skipped.
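To make the term concrete, the following minimal PyTorch sketch applies unstructured magnitude pruning to a single layer, zeroing out half of its weights. It uses generic PyTorch utilities purely for illustration; this is not Cerebras' hardware-accelerated sparsity implementation.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy linear layer standing in for one weight matrix of a large model
layer = nn.Linear(1024, 1024)

# Unstructured sparsity: zero out the 50% of weights with the smallest magnitude.
# The zeros land at irregular positions, which is why dedicated hardware support
# is needed to actually skip the corresponding multiply-accumulate operations.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = (layer.weight == 0).float().mean().item()
print(f"Fraction of zero weights: {sparsity:.2%}")  # roughly 50%
```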

"When we embarked on this journey eight years ago, everyone said that wafer-scale processors were a pipe dream. We couldn't be prouder to introduce the third generation of our groundbreaking AI chip," says Andrew Feldman, CEO and co-founder of Cerebras. WSE-3 is the fastest AI chip in the world, designed specifically for the latest AI tasks.

Standard implementation of GPT-3 with just 565 lines of code

According to Feldman, the CS-3 delivers more computing power in less space and with less energy than any other system. While the power consumption of GPUs doubles from one generation to the next, the CS-3 doubles performance while staying within the same power envelope. Cerebras also describes the CS-3 as very user-friendly: it requires 97 percent less code than GPUs for large language models (LLMs) and can train models from 1B to 24T parameters in pure data-parallel mode. A standard implementation of a GPT-3-sized model therefore takes only 565 lines of code on Cerebras, "an industry record."
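"Pure data-parallel mode" means that every worker holds a complete copy of the model and only the training batches are split across workers; on GPU clusters, models of this size additionally have to be partitioned across devices. The following generic PyTorch DistributedDataParallel sketch shows the data-parallel pattern with a tiny stand-in model; it does not use Cerebras' software stack.

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with e.g.:  torchrun --nproc_per_node=2 this_script.py
dist.init_process_group("gloo")  # rank and world size come from torchrun's env vars

# Every worker holds the full model (pure data parallelism);
# only the training batches differ between workers.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
ddp_model = DDP(model)
optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

for _ in range(10):
    batch = torch.randn(32, 512)             # this worker's shard of the data
    loss = ddp_model(batch).pow(2).mean()    # dummy loss for illustration
    loss.backward()                          # DDP all-reduces gradients across workers
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```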

According to its own statements, Cerebras already has a considerable backlog of orders for the CS-3 from companies, government agencies, and international clouds. "As a long-term partner of Cerebras, we are very interested in seeing what is possible with the further development of wafer-scale technology," says Rick Stevens, Associate Laboratory Director for Computing, Environment and Life Sciences at Argonne National Laboratory. Cerebras' boldness continues to pave the way for the future of AI.

New AI supercomputer under construction

"As part of our multi-year collaboration with Cerebras to develop AI models that improve patient outcomes and diagnoses, we are excited about the advancements in technological capabilities," says Dr. Matthew Callstrom, M.D., medical director for strategy and chair of radiology at the Mayo Clinic.

The CS-3 will also play a significant role in the collaboration between Cerebras and G42. This partnership has already delivered 8 exaFLOPS of AI supercomputer performance through Condor Galaxy 1 (CG-1) and Condor Galaxy 2 (CG-2). Both CG-1 and CG-2, which are deployed in California, rank among the largest AI supercomputers in the world.

AI supercomputer Condor Galaxy 3 with 64 CS-3 systems under construction

Condor Galaxy 3 is currently under construction. The supercomputer will be equipped with 64 CS-3 systems, delivering 8 exaFLOPS of AI computing power. Condor Galaxy 3 is the third installation in the Condor Galaxy network. Condor Galaxy has trained some of the leading open-source models in the industry, including Jais-30B, Med42, Crystal-Coder-7B, and BTLM-3B-8K.

"Our strategic partnership with Cerebras has significantly advanced innovation at G42 and will help accelerate the AI revolution on a global scale," is convinced Kiril Evtimov, Group CTO of G42. (me)