Computing power for AI, ASICs vs. GPUs: Is Nvidia's AI Dominance at Risk?

By Susanne Braun | Translated by AI | 4 min reading time

Discussions of large language models, their training clusters, and artificial intelligence in general rarely question whether GPUs are actually the right hardware for AI, or whether ASICs will outperform them in the long run. Etched wants to change that with a transformer ASIC.

Etched has introduced Sohu, the world's first transformer ASIC. The technology specialized for LLMs provides faster performance than the all-rounder GPUs widely used for AI.
(Image: Etched)

"If I were to work with ASICs specifically designed for AI tasks, I could potentially achieve faster performance and greater efficiency," GPT-4 suggests when asked whether it operates on GPUs and if it might be more effective for the AI model to run on ASICs.

When artificial intelligence comes up, the discussion usually revolves around which large language model (LLM) is best trained or fastest, or which GPU hardware the model runs on. Does Nvidia make the best AI chips, or is it AMD, or perhaps Intel?


Far less often is it questioned whether the combination of GPU and CPU used in many AI systems is actually the most sensible hardware architecture for artificial intelligence, or whether there might be a better technology. As the answer to the opening question already suggests, there is hardware that could be better suited to artificial intelligence: the ASIC.

Application-Specific Integrated Circuits

A transformer ASIC is a specialized chip that integrates the transformer architecture directly into the hardware. A transformer is a particular architecture for neural networks, used above all in natural language processing (NLP). Compared to general-purpose processors such as GPUs, a transformer ASIC allows models to run faster and more efficiently because it is optimized specifically for the computations this architecture requires. These chips can therefore efficiently support large models with billions of parameters.
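For readers unfamiliar with the architecture, the sketch below shows scaled dot-product attention, the matrix operation at the heart of every transformer layer, in plain NumPy. It is a minimal illustration of the kind of computation a transformer ASIC fixes in silicon, not a description of Etched's actual hardware; all names and shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, the core computation of every transformer layer."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled for numerical stability
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns the scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of the value vectors
    return weights @ V

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Because transformer inference is dominated by a small, fixed set of such matrix multiplications and softmaxes, a chip that implements only these operations can devote almost its entire die area to them.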

It is therefore not surprising that the relatively young ASIC manufacturer Etched is proud of its in-house product Sohu, the world's first transformer ASIC, which is said to be faster than the current Nvidia Blackwell generation—which is no wonder with a specialized chip, as we will explain below.

The technology from Etched integrates the transformer architecture directly into the chip, making AI models run much faster and cheaper than with GPUs. "We have spent the last two years developing Sohu, the world's first specialized chip (ASIC) for transformers (the 'T' in ChatGPT)," say the representatives from Etched.

"With a throughput of over 500,000 tokens per second in Llama 70B, Sohu enables the development of products that are impossible on GPUs. Sohu is faster and cheaper than even Nvidia's next-generation Blackwell GPUs. Today, every modern LLM is a transformer: ChatGPT, Sora, Gemini, Stable Diffusion 3 and others. If transformers are replaced by SSMs (Sparse Supervised Learning Models), RWKV (Receptive Field Weighted Kernel Virtualization) or some other new architecture, our chips are useless," explain the people behind Sohu further. The development of Sohu is a gamble into the uncertain future of whether AI will continue to be based on the transformer architecture.

Winning the race for computing capacity the smart way

If the developers and decision makers at Etched are right in their assumption that AI will continue to be based on the transformer architecture, then, according to them, Sohu could change the world.

Within five years, AI models have become smart enough to reliably outperform humans on standardized tests, which in turn is due to the fact that ever more computing power could be devoted to them. However, data centers cannot be scaled indefinitely, as we have already hinted at in this post.

Etched sees it similarly: "Scaling the next 1,000-fold will be costly. The next generation of data centers will cost more than the GDP of a small nation. At the current pace, our hardware, our power grids, and our wallets can't keep up."

Two and a half years passed between the availability of Nvidia's H100 and the first deliveries of the B200, and the performance gain is only 15 percent. Etched therefore believes that specializing in transformer chips was the right choice, since all major AI models, such as OpenAI's GPT family, Google's PaLM, Meta's Llama, and Tesla's FSD, are transformers.

"If models cost more than 1 billion USD for training and over 10 billion USD for inference, specialized chips are inevitable. At this scale, a 1 percent improvement would justify a project for a special chip worth 50 to 100 million USD," argues Etched.

An 8xSohu server replaces 160 H100 GPUs

In their Sohu announcement, the chip's inventors do not hold back with bold claims. "Since Sohu can only run one algorithm, most of the control flow logic can be removed, leaving many more computation blocks available. As a result, Sohu achieves a FLOPS utilization of over 90 percent (compared to roughly 30 percent for a GPU with TRT-LLM)," they explain. "The Nvidia H200 has 989 TFLOPS of FP16/BF16 compute power without sparsity. This is state of the art (even more than Google's new Trillium chip), and the GB200, coming to market in 2025, has only 25 percent more compute power (1,250 TFLOPS per die)."
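How much the utilization figure matters becomes clear with a short calculation. The sketch below combines only the peak and utilization numbers quoted in the announcement; Etched does not state Sohu's own peak FLOPS here, so no assumption about it is made.

```python
# Effective throughput = peak compute x achieved utilization, using the
# FP16/BF16 figures quoted above (without sparsity).
h200_peak_tflops = 989       # Nvidia H200 peak, per Etched's announcement
gb200_peak_tflops = 1250     # per Blackwell die, per Etched's announcement
gpu_utilization = 0.30       # "roughly 30 percent for a GPU with TRT-LLM"
sohu_utilization = 0.90      # "over 90 percent" claimed for Sohu

print(f"H200 effective:  {h200_peak_tflops * gpu_utilization:.0f} TFLOPS")
print(f"GB200 effective: {gb200_peak_tflops * gpu_utilization:.0f} TFLOPS")
print(f"Claimed utilization advantage: {sohu_utilization / gpu_utilization:.1f}x")

# Etched's point: tripling utilization multiplies whatever peak compute the
# silicon provides, which outweighs the GB200's 25 percent gain in raw peak FLOPS.
```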

GPUs were designed with flexibility in mind: they are programmable and can therefore be used for a wide variety of tasks, while ASICs are developed for one specific application and are far less flexible; in the case of the Sohu ASIC, purely for transformer AI models. Accordingly, the hardware supports today's models from OpenAI, Google, Meta, Microsoft and others, as well as the customized versions of them that will come in the future.

Yet developing a less flexible ASIC is expensive and time-consuming. GPUs, by contrast, are readily available, can be deployed quickly, and come with a mature software and hardware ecosystem, including widely used programming languages and libraries. Even in terms of scalability, GPUs have scored points so far. It is therefore no surprise that Nvidia is currently one of the most sought-after manufacturers of AI chips. The question is: for how much longer?

Indeed, the inventors of Sohu are not wrong to question the effectiveness, and especially the scalability, of GPUs for future, even more powerful models; more information on this can be found in Etched's report. It is not without reason that Bitcoin miners nowadays rely on ASICs or FPGAs. (sb)

Link: Sohu by Etched