Computing Power for AI: ASICs vs. GPUs. Is Nvidia's AI Dominance at Risk?
By Susanne Braun | Translated by AI
4 min reading time
When talking about large language models and their training clusters or about artificial intelligence in general, people often don't question whether GPUs are the right choice for the technology behind AI, or whether ASICs will outperform them in the long term. Etched wants to change that with a transformer ASIC.
Etched has introduced Sohu, the world's first transformer ASIC. The technology specialized for LLMs provides faster performance than the all-rounder GPUs widely used for AI.
(Image: Etched)
"If I were to work with ASICs specifically designed for AI tasks, I could potentially achieve faster performance and greater efficiency," GPT-4 suggests when asked whether it operates on GPUs and if it might be more effective for the AI model to run on ASICs.
When artificial intelligence is mentioned, the discussion often revolves around which large language model (LLM) is trained best or runs fastest, and which GPU hardware the model runs on. Does Nvidia make the best AI chips, or is it AMD, or perhaps Intel?
Far less often is it questioned whether the combination of GPU and CPU used in many AI systems is actually the most sensible hardware architecture for artificial intelligence, or whether there might be a better technology. As the answer to the opening question suggests, there is hardware that could be better suited to artificial intelligence: the ASIC.
Application-Specific Integrated Circuits
A transformer ASIC is a specialized chip that directly integrates the transformer architecture into the hardware. A transformer is a special architectural style for neural networks, which is particularly used in natural language processing (NLP). Compared to general processors like GPUs, a transformer ASIC allows for more efficient and faster execution of models as it is specifically optimized for the requirements and calculations. Thus, these chips can efficiently support large models with billions of parameters.
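The operation such a chip bakes into silicon is scaled dot-product attention, the core computation of every transformer layer. A minimal pure-Python sketch with toy dimensions (illustrative only, not how an ASIC or a real framework implements it):

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V,
    the matrix kernel a transformer chip must execute efficiently."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # Weighted sum of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# Toy example: 3 tokens, head dimension 2
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
print(len(out), len(out[0]))  # 3 2
```

Because this one kernel dominates transformer workloads, hardwiring it (rather than keeping a GPU's general-purpose control logic) is where the claimed efficiency gain comes from.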
It is therefore not surprising that the relatively young ASIC manufacturer Etched is proud of its in-house product Sohu, the world's first transformer ASIC, which is said to be faster than the current Nvidia Blackwell generation—which is no wonder with a specialized chip, as we will explain below.
The technology from Etched integrates the transformer architecture directly into the chip, making AI models run much faster and cheaper than with GPUs. "We have spent the last two years developing Sohu, the world's first specialized chip (ASIC) for transformers (the 'T' in ChatGPT)," say the representatives from Etched.
"With a throughput of over 500,000 tokens per second on Llama 70B, Sohu enables products that are impossible on GPUs. Sohu is faster and cheaper than even Nvidia's next-generation Blackwell GPUs. Today, every modern LLM is a transformer: ChatGPT, Sora, Gemini, Stable Diffusion 3, and others. If transformers are replaced by SSMs (state space models), RWKV (Receptance Weighted Key Value), or some other new architecture, our chips will be useless," the people behind Sohu explain. The development of Sohu is thus a bet on an uncertain future, namely that AI will continue to be based on the transformer architecture.
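Etched's throughput claim can be sanity-checked with a back-of-envelope estimate. Assuming roughly 2 FLOPs per parameter per generated token, a common rule of thumb for dense decoding that the article itself does not state, 500,000 tokens per second on a 70-billion-parameter model implies about 70 PFLOPS of effective compute for the server:

```python
# Back-of-envelope check (assumption: ~2 FLOPs per parameter per generated
# token; real workloads vary with batching, KV-cache costs, and precision).
params = 70e9           # Llama 70B parameters
tokens_per_s = 500_000  # Etched's claimed throughput for an 8x Sohu server
effective_flops = 2 * params * tokens_per_s
print(f"{effective_flops / 1e15:.0f} PFLOPS effective")  # 70 PFLOPS effective
```

That is an order-of-magnitude estimate, not a benchmark, but it shows the scale of sustained compute the claim implies.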
Winning the race for the greatest computing capacity intelligently
If the developers and decision makers at Etched are right in their assumption that AI will continue to be based on the transformer architecture, then, according to them, Sohu could change the world.
Within five years, AI models have become smart enough to reliably outperform humans on standardized tests, which in turn is owed to the ever greater computing power made available to them. However, data centers cannot be scaled indefinitely, as we have already hinted at in this post.
Etched sees it similarly: "Scaling the next 1,000-fold will be costly. The next generation of data centers will cost more than the GDP of a small nation. At the current pace, our hardware, our power grids, and our wallets can't keep up."
It took two and a half years from the availability of Nvidia's H100 to the first deliveries of the B200, and the performance gain, according to Etched, is only 15 percent. Etched therefore believes that specializing in transformer chips was the right choice, since all major AI models, such as OpenAI's GPT family, Google's PaLM, Meta's LLaMa, and Tesla's FSD, are transformers.
"If models cost more than 1 billion USD to train and over 10 billion USD for inference, specialized chips are inevitable. At this scale, a 1 percent improvement would justify a custom-chip project worth 50 to 100 million USD," argues Etched.
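The arithmetic behind that argument is straightforward: at the quoted inference scale, even a 1 percent efficiency gain covers the cost range Etched names for a custom-chip project:

```python
# Etched's own figures: >10 billion USD inference spend, 1 % improvement.
inference_cost = 10e9  # USD, the quoted inference scale
improvement = 0.01     # a 1 percent efficiency gain
savings = inference_cost * improvement
print(f"${savings / 1e6:.0f}M saved")  # $100M saved
```

One hundred million USD in savings sits at the top of the 50 to 100 million USD project cost Etched cites, which is exactly the break-even logic of the quote.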
An 8xSohu server replaces 160 H100 GPUs
In their Sohu announcement, the chip's inventors don't hold back. "Since Sohu can only run one algorithm, most of the control flow logic can be removed, leaving many more computation blocks available. As a result, Sohu achieves a FLOPS utilization of over 90% (compared to ~30% for a GPU with TRT-LLM)," they explain. "The Nvidia H200 has 989 TFLOPS of FP16/BF16 compute power without sparsity. This is state of the art (even more than Google's new Trillium chip), and the GB200, coming to market in 2025, has only 25% more compute power (1,250 TFLOPS per die)."
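Taken together, the quoted figures let us reconstruct what the "8x Sohu replaces 160 H100" headline implies per chip. All inputs below are Etched's numbers, not independently verified, and the H200 peak serves as a stand-in for the GPU baseline:

```python
# Utilization arithmetic behind the "8x Sohu replaces 160 H100" claim,
# using only the figures Etched quotes in its announcement.
gpu_peak_tflops = 989   # H200 FP16/BF16 peak, no sparsity (per Etched)
gpu_utilization = 0.30  # ~30 % FLOPS utilization with TRT-LLM (per Etched)
gpu_effective = gpu_peak_tflops * gpu_utilization  # ~297 TFLOPS usable

# For 8 Sohu chips to match 160 GPUs, each chip must sustain 20x the
# effective throughput of one GPU:
required_per_sohu = 160 * gpu_effective / 8
print(f"{required_per_sohu:.0f} TFLOPS effective per Sohu chip")
```

In other words, the claim rests on each Sohu delivering roughly 5,900 effective TFLOPS, a figure that 90 percent utilization alone cannot explain; the chip would also need far more raw compute blocks than a GPU die, which is precisely what removing the control-flow logic is supposed to free up.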
Date: 08.12.2025
GPUs were designed with flexibility in mind: they are programmable and can therefore be used for a variety of tasks, while ASICs are developed for one particular application and are less flexible. The Sohu ASIC runs transformer AI models only. Accordingly, the hardware supports today's models from OpenAI, Google, Meta, Microsoft, and others, as well as the customized versions of them that will come in the future.
Yet developing a less flexible ASIC is expensive and time-consuming. GPUs are readily available, offer quick deployment, and come with a mature software and hardware ecosystem, including widely used programming languages and libraries. Even in terms of scalability, GPUs have so far come out ahead. So it is no surprise that Nvidia is currently one of the most sought-after manufacturers of AI chips. The question is: for how much longer?
Indeed, the inventors of Sohu are not wrong to question the effectiveness and, above all, the scalability of GPUs for future, even more powerful models; more information on this can be found in Etched's report. It is not without reason that Bitcoin miners switched to ASICs and FPGAs long ago. (sb)