Flash Memory: How the AI Boom Is Changing the Hardware Design of SSDs

By Manuel Christa | Translated by AI


The immense computational demand of modern AI applications is not only advancing the development of GPUs but also forcing the storage industry to rethink its architecture. New form factors, extreme packing densities, and unconventional cooling concepts are becoming the standard.

Cold-plate cooling in detail: For completely fanless server environments, Solidigm has developed a liquid-cooled SSD design that dissipates the extreme heat of densely packed AI storage systems via direct contact (direct-to-chip). (Image: Solidigm)

Anyone planning AI data centers or high-performance edge infrastructure today quickly runs into hard physical limits. Space, thermal load, and, above all, the available electrical power severely constrain expansion. In this context, the hardware design of storage media plays an unexpectedly central role, as Scott Shadley, Director of Leadership Narrative & Evangelist at storage manufacturer Solidigm, emphasized in mid-June at A3 Communication's "Technology Live" storage event in Munich. Solidigm grew out of Intel's NAND storage business, which SK Hynix acquired in 2021 and set up as a standalone U.S. subsidiary.

New Key Currencies: Energy Efficiency and Quality of Service

Historically, the price per storage capacity was the most important metric in procurement. Those days are over in high-performance computing. "It used to be dollars per gigabyte. Nobody really cares about that anymore," explains Shadley. "Today, everything is primarily dictated by a power consumption figure."


In automation and AI infrastructures, the power available at the site limits the number of compute accelerators (GPUs) that can be deployed. Every watt saved on storage can potentially be reallocated to additional computing power. Solidigm's answer is high-density solid-state drives (SSDs) with capacities of up to 122 terabytes. By the company's calculation, a conventional storage cluster of nine racks drawing 54 kW can be consolidated into a single rack drawing only 1.7 kW.
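The consolidation figures translate into a freed power budget with simple arithmetic. The sketch below uses the 54 kW and 1.7 kW values reported above; the per-accelerator budget `ACCELERATOR_POWER_W` is a hypothetical assumption for illustration only, not a figure from Solidigm or the article.

```python
# Solidigm's consolidation example as reported in the article (watts).
LEGACY_RACKS = 9
LEGACY_POWER_W = 54_000   # nine-rack conventional storage cluster
DENSE_POWER_W = 1_700     # single rack of 122 TB SSDs

# Hypothetical assumption, not from the article: power budget per accelerator.
ACCELERATOR_POWER_W = 1_000


def storage_power_savings(legacy_w: int, dense_w: int, legacy_racks: int) -> dict:
    """Freed power budget, consolidation ratio, and legacy power per rack."""
    return {
        "saved_w": legacy_w - dense_w,
        "power_ratio": legacy_w / dense_w,
        "legacy_w_per_rack": legacy_w / legacy_racks,
    }


def extra_accelerators(saved_w: int, per_accelerator_w: int) -> int:
    """Whole accelerators the freed power could feed (rounded down)."""
    return saved_w // per_accelerator_w


result = storage_power_savings(LEGACY_POWER_W, DENSE_POWER_W, LEGACY_RACKS)
print(result["saved_w"])                 # 52300
print(round(result["power_ratio"], 1))   # 31.8
print(extra_accelerators(result["saved_w"], ACCELERATOR_POWER_W))  # 52
```

Integer watts keep the subtraction and floor division exact; real accelerator budgets vary widely, so the last number only illustrates the mechanism.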

Dense SSDs save space and power: A single rack with 122 TB SSDs replaces nine conventional HDD racks at a fraction of the energy consumption. (Image: Solidigm)

Performance evaluation is undergoing a similarly fundamental shift. Manufacturers used to advertise peak IOPS (input/output operations per second), but for AI architectures that is no longer the key metric. What matters today is predictability, known as Quality of Service (QoS): SSDs must deliver data within a tight, consistent latency bound for 99.999 percent of requests. If data arrives in irregular bursts, the GPU stalls while it waits for the dataset. This extremely expensive processor idle time must be avoided at all costs by keeping the data stream smooth and continuous.
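Why the tail matters more than the average can be shown with a latency percentile. The following is a minimal sketch assuming the common nearest-rank percentile definition; the latency trace is synthetic and hypothetical: 100,000 reads at 100 µs, of which 50 stall at 5,000 µs.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


# Hypothetical trace: 100,000 reads at 100 us, 50 of which stall at 5,000 us.
latencies_us = [100] * 99_950 + [5_000] * 50

mean_us = sum(latencies_us) / len(latencies_us)
p999 = percentile_nearest_rank(latencies_us, 99.9)
p99999 = percentile_nearest_rank(latencies_us, 99.999)
print(mean_us, p999, p99999)  # 102.45 100 5000
```

The mean and even the 99.9th percentile look healthy, while the 99.999th percentile exposes the stalls that would leave a GPU waiting.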

"Disposable Data" and the End of Wear Concerns

This need for a fast, continuous data stream disrupts the traditional storage pyramid. The industry is currently defining an entirely new intermediate layer in the server rack, often referred to as the "G3.5 layer." It is not meant for traditional, persistent long-term storage: this SSD layer exists solely to read and write massive data blocks for AI inference at high speed, such as the KV cache of large language models. Because persistence is secondary here, this is referred to as ephemeral storage or "throwaway data." If something is lost, the system simply rebuilds the cache.
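The "rebuild instead of restore" behavior of such an ephemeral tier can be sketched as a cache that treats a lost entry like an ordinary miss. The class `EphemeralKVCache` and the `prefill` function are hypothetical names for illustration; they do not correspond to any specific inference framework's API.

```python
from typing import Callable, Dict, Hashable


class EphemeralKVCache:
    """Hypothetical ephemeral tier: entries may vanish at any time and are
    recomputed on demand instead of being restored from durable storage."""

    def __init__(self, rebuild: Callable[[Hashable], bytes]):
        self._store: Dict[Hashable, bytes] = {}
        self._rebuild = rebuild
        self.rebuilds = 0  # how often data had to be recomputed

    def get(self, key: Hashable) -> bytes:
        value = self._store.get(key)
        if value is None:  # never cached or lost: recompute instead of failing
            value = self._rebuild(key)
            self._store[key] = value
            self.rebuilds += 1
        return value

    def drop_all(self) -> None:
        """Simulate losing the whole tier; no recovery step is needed."""
        self._store.clear()


def prefill(prompt_id: Hashable) -> bytes:
    """Hypothetical stand-in for recomputing a KV cache from the prompt."""
    return f"kv-for-{prompt_id}".encode()


cache = EphemeralKVCache(prefill)
cache.get("req-1")   # miss: computed once
cache.get("req-1")   # hit: served from the tier
cache.drop_all()     # tier contents lost
cache.get("req-1")   # rebuilt transparently
print(cache.rebuilds)  # 2
```

Losing the tier costs recomputation time, not correctness, which is why durability features can be traded for speed and density here.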

The new storage hierarchy: Complex AI workloads are breaking traditional architectural concepts and forcing the introduction of new intermediary layers (like the G3.5 level) for fast, volatile storage. (Image: Solidigm)

Continuous write workloads like these immediately raise concerns among hardware developers about flash lifespan, since NAND cells tolerate only a limited number of write cycles. Yet the industry draws a surprising conclusion: even though these hyper-dense drives use cost-effective QLC (quad-level cell) NAND, which stores four bits per cell and endures fewer write cycles than TLC or SLC NAND, flash wear has become irrelevant by the numbers. Shadley calculates that 122-terabyte drives cannot simply be "written to death" within a typical five-year server lifecycle: today's interfaces cannot deliver data fast enough to overwrite every cell up to its limit within that timeframe.
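The argument boils down to comparing two rates: the sustained write rate needed to use up the NAND's rated program/erase (P/E) cycles within the service life, and the write rate the drive and interface can actually sustain. The sketch below is a simplified model under stated assumptions; `PE_CYCLES` and `MAX_RATE` are hypothetical placeholders, not Solidigm specifications, so real conclusions require datasheet values.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def write_rate_to_exhaust(capacity_bytes, pe_cycles, lifetime_years, waf=1.0):
    """Sustained host write rate (bytes/s) needed to use up the rated P/E
    cycles within the lifetime. waf is the write amplification factor
    (bytes written to NAND per byte written by the host); waf > 1 lowers
    the required host rate, i.e. makes wear-out easier."""
    host_write_budget = capacity_bytes * pe_cycles / waf
    return host_write_budget / (lifetime_years * SECONDS_PER_YEAR)


def can_be_written_to_death(capacity_bytes, pe_cycles, lifetime_years,
                            max_host_write_rate, waf=1.0):
    """True if the drive could wear out before the end of its lifetime."""
    needed = write_rate_to_exhaust(capacity_bytes, pe_cycles, lifetime_years, waf)
    return max_host_write_rate >= needed


# Hypothetical placeholders, NOT vendor specifications:
CAPACITY = 122e12     # 122 TB (decimal bytes)
PE_CYCLES = 1_000     # assumed QLC endurance rating
MAX_RATE = 0.5e9      # assumed sustained write rate in bytes/s
LIFETIME_YEARS = 5

needed = write_rate_to_exhaust(CAPACITY, PE_CYCLES, LIFETIME_YEARS)
print(f"{needed / 1e9:.2f} GB/s needed to wear out the drive")
print(can_be_written_to_death(CAPACITY, PE_CYCLES, LIFETIME_YEARS, MAX_RATE))
```

Because the required rate scales with capacity, doubling the drive size doubles the write rate needed to wear it out, which is the core of the argument for very large drives.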

The End of Old Form Factors and the Challenge of Signal Integrity

To achieve these enormous packing densities and meet the associated thermal requirements at the circuit-board level, the mechanics of SSDs must change radically, and familiar standards are on their way out. "The U.2 form factor was derived from a 2.5-inch hard drive enclosure," Shadley recalls. "This is the last hard drive-based SSD product that still exists today."

Storage architecture for AI: A typical division into extremely fast, liquid-cooled storage layers (hot/DAS) at the compute node and network-connected, high-density storage racks (warm/NAS). (Image: Solidigm)

Future designs are built solely around the requirements of flash technology. The industry is therefore shifting to dedicated form factors such as E1.S and E3.S. These give developers more board space for components and have a slimmer profile, enabling more efficient cooling and higher packing density in servers.

Another technical driver for this shift is signal integrity at ever-higher data rates. The upcoming generations of the PCI Express standard leave no room for compromise at the connector. "When we move specifically to PCIe Gen 6 and Gen 7, we will need an edge connector due to signal integrity, not a physical standard connector. That's why U.2 will gradually disappear with PCIe Gen 6," says Shadley.
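The pressure on connectors follows from how the per-lane transfer rate scales. Each PCIe generation doubles it, and PCIe 6.0 and 7.0 switch from two-level NRZ signaling to four-level PAM4, which carries two bits per symbol but with a much smaller voltage margin between levels. The sketch below computes only raw one-direction bandwidth for an x4 SSD link and ignores encoding and FLIT-mode overhead, so real throughput is somewhat lower.

```python
# Per-lane transfer rates in GT/s (PCIe 3.0 through 7.0).
GT_PER_S = {3: 8, 4: 16, 5: 32, 6: 64, 7: 128}
# Gen 3-5 use NRZ (1 bit per symbol) with 128b/130b encoding; Gen 6-7 use
# PAM4 (2 bits per symbol) with FLIT-mode framing. Overheads are ignored here.
BITS_PER_SYMBOL = {3: 1, 4: 1, 5: 1, 6: 2, 7: 2}


def raw_bandwidth_gb_s(gen: int, lanes: int = 4) -> float:
    """Raw one-direction bandwidth in GB/s, before encoding/protocol overhead."""
    return GT_PER_S[gen] * lanes / 8


def symbol_rate_gbaud(gen: int) -> float:
    """Symbols per second per lane that the channel and connector must carry."""
    return GT_PER_S[gen] / BITS_PER_SYMBOL[gen]


for gen in sorted(GT_PER_S):
    print(f"Gen {gen}: x4 ~{raw_bandwidth_gb_s(gen):.0f} GB/s raw, "
          f"{symbol_rate_gbaud(gen):.0f} GBaud per lane")
```

PAM4 keeps the Gen 6 symbol rate at the Gen 5 level, but each symbol has to resolve four levels instead of two, so reflections and losses at a cabled or adapter-based connector become far more critical.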


Cooling Concepts at the Limit: Immersion Cooling in Practice

Thermal management remains one of the greatest challenges. Traditional air cooling is reaching its physical limits, and the high-speed fans it depends on make server rooms deafeningly loud. In addition, air cooling backed by chiller systems consumes far more water than liquid cooling.
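A basic heat balance shows why air runs out of headroom. The coolant flow needed to carry away a heat load P with a temperature rise ΔT is V̇ = P / (ρ · c_p · ΔT). The sketch below uses approximate room-temperature properties of air and water; the 1 kW load and 10 K rise are hypothetical example values, and dielectric immersion fluids have properties different from water.

```python
# Approximate room-temperature fluid properties (SI units).
AIR = {"density": 1.2, "cp": 1005.0}       # kg/m^3, J/(kg*K)
WATER = {"density": 998.0, "cp": 4186.0}   # kg/m^3, J/(kg*K)


def volumetric_flow_m3_s(heat_w: float, fluid: dict, delta_t_k: float) -> float:
    """Coolant flow needed to remove heat_w with a temperature rise of
    delta_t_k, from the heat balance P = rho * V_dot * c_p * dT."""
    return heat_w / (fluid["density"] * fluid["cp"] * delta_t_k)


heat_w = 1_000.0   # hypothetical 1 kW heat load
delta_t = 10.0     # hypothetical 10 K coolant temperature rise

air = volumetric_flow_m3_s(heat_w, AIR, delta_t)
water = volumetric_flow_m3_s(heat_w, WATER, delta_t)
print(f"air:   {air * 1000:.1f} L/s")
print(f"water: {water * 1000:.3f} L/s")
print(f"air needs ~{air / water:.0f}x the volume flow of water")
```

Moving thousands of times more volume is what forces air-cooled systems toward ever faster, louder fans as power density rises.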

Cooling concepts at the limit: Beyond traditional air cooling, SSDs are increasingly being designed for cold-plate systems and complete immersion cooling in non-conductive liquids. (Image: Solidigm)

The industry is therefore moving towards cold-plate and other forms of liquid cooling. Because storage drives usually need to remain hot-pluggable, that is, replaceable during operation, this calls for special designs. For fanless server environments, liquid-cooled SSDs have already been developed that connect directly to the cooling circuit.

The next logical step is "immersion cooling," the complete submersion of electronics in non-conductive liquids (e.g., special oils). While semiconductor components can easily withstand this immersion today, the challenges for engineers often lie in unexpected details.

Shadley shared an anecdote from everyday lab work: "One of the most interesting things we encountered in the immersion cooling environment had nothing to do with the components or the functional aspects of the drive. It was actually about keeping the label on the drive. Apparently, this liquid dissolves the adhesive very effectively." The hardware developers' unconventional but pragmatic fix: they coated the labels with clear nail polish so they would not peel off in the cooling bath.

Soon Water-Cooled for Edge IT?

These profound architectural changes are by no means limited to hyperscalers' massive data centers. Compact, flash-optimized form factors and fanless or liquid-based cooling concepts are precisely the building blocks needed for rugged edge-inference environments. When AI models run directly at telecommunications towers, in industrial manufacturing, or in connected automation systems, compact design, maximum energy efficiency, and reliability under harsh climatic conditions will be crucial. The days of the hard drive enclosure are thus numbered in industrial hardware as well.