Memory Architectures for AI Accelerators: SanDisk Patent Stacks Processor Directly on NAND Tile

By Sebastian Gerstl

To address the long-term memory shortage in data centers and HPC systems, SanDisk is exploring approaches beyond High-Bandwidth Memory (HBM) and High-Bandwidth Flash (HBF): in a patent, the flash memory specialist describes a 3D stack consisting of compute dies and NAND CBA tiles. HBM remains on the interposer but is used primarily for latency-critical tasks.

SanDisk goes beyond HBM: A U.S. patent reveals the direct connection of a processor to a NAND stack array, with HBM stacks on a shared interposer. (Image: U.S. Patent and Trademark Office / SanDisk)

In addition to High Bandwidth Flash (HBF), SanDisk appears to be working on another approach to address memory bottlenecks in AI and HPC systems. A U.S. patent numbered US 12,430,274 B2 describes an architecture in which a multicore processor is directly connected to a non-volatile NAND memory tile.

According to the description, the compute die can be implemented as a GPU or an AI accelerator, for example. Beneath it lies a CBA (CMOS-bonded-to-array) tile, which combines a large NAND flash array with a CMOS logic layer. The entire assembly is placed on an interposer.

HBM does not disappear in this concept. The HBM stacks remain on the same interposer, arranged around the combined compute and NAND stack. Their role shifts, however: they primarily serve immediate, very fast memory accesses, while the NAND layer is intended to handle larger data volumes and read- or write-intensive tasks.
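This division of labor can be illustrated with a small placement sketch. It is not taken from the patent: the `Buffer` type, the `place` function, the buffer names, and all sizes are hypothetical and only show the idea of reserving HBM for latency-critical data while bulk data goes to the NAND layer.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size_gb: int
    latency_critical: bool


def place(buffers, hbm_capacity_gb):
    """Greedy sketch: latency-critical buffers go to HBM while they fit,
    everything else (or whatever no longer fits) goes to the NAND tile."""
    placement = {}
    hbm_free = hbm_capacity_gb
    for buf in buffers:
        if buf.latency_critical and buf.size_gb <= hbm_free:
            placement[buf.name] = "HBM"
            hbm_free -= buf.size_gb
        else:
            placement[buf.name] = "NAND"
    return placement


# Hypothetical workload; names and sizes are assumptions for illustration.
buffers = [
    Buffer("kv_cache", 48, latency_critical=True),
    Buffer("activations", 12, latency_critical=True),
    Buffer("model_weights", 900, latency_critical=False),
]

placement = place(buffers, hbm_capacity_gb=64)
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

A real runtime would weigh access frequency, bandwidth, and write endurance as well; the greedy rule here is only the simplest policy that matches the roles described above.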

New Division of Labor in the Memory Hierarchy

This approach addresses the known limitations of today’s memory hierarchies. HBM offers high bandwidth and close proximity to the compute core, but it is expensive, in short supply, and limited in capacity per stack. The cited sources put typical HBM stack capacities at 32 to 64 GB. For large models and datasets, this capacity can become a limiting factor.

HBF is intended to partially close this gap. Here, too, SanDisk relies on vertical stacking, but uses NAND flash instead of DRAM. Multiple layers of NAND are connected via TSVs to form a single memory stack. According to the available information, HBF aims for capacities of up to 4 TB per stack and is expected to achieve significantly higher capacity than HBM at comparable costs.
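To put the per-stack figures side by side, the following back-of-the-envelope sketch counts how many stacks would be needed to hold a given payload. The capacities are the ones quoted above; the 1 TB payload and the helper `stacks_needed` are assumptions chosen purely for illustration, and the calculation ignores bandwidth, latency, and packaging limits.

```python
GB = 10**9
TB = 10**12

# Per-stack capacities as quoted in the article.
HBM_STACK_MIN = 32 * GB
HBM_STACK_MAX = 64 * GB
HBF_STACK_MAX = 4 * TB  # stated upper target for HBF


def stacks_needed(payload_bytes: int, stack_bytes: int) -> int:
    """Smallest number of stacks whose combined capacity holds the payload
    (integer ceiling division, so no floating-point rounding)."""
    return -(-payload_bytes // stack_bytes)


# Hypothetical payload: 1 TB of model weights (assumption, not from the source).
payload = 1 * TB

hbm_32 = stacks_needed(payload, HBM_STACK_MIN)
hbm_64 = stacks_needed(payload, HBM_STACK_MAX)
hbf = stacks_needed(payload, HBF_STACK_MAX)

print(f"HBM at 32 GB/stack: {hbm_32} stacks")
print(f"HBM at 64 GB/stack: {hbm_64} stacks")
print(f"HBF at 4 TB/stack:  {hbf} stack(s)")
```

Under these assumptions the payload needs 16 to 32 HBM stacks but fits in a single HBF stack, which is the capacity gap the paragraph above describes.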

The drawback remains NAND’s fundamental position in the memory hierarchy. NAND is cheaper and has higher density, but it is slower than DRAM and, in traditional systems, farther away from the compute die. The patent aims to reduce this distance through 3D integration. What is particularly interesting for developers is that not only the memory technology but also the packaging and data paths are being reimagined.

At this point, it is still a patent, not an announced product. Questions regarding power consumption, thermal coupling, yield, cost, and manufacturability of such a composite remain unanswered. Strategically, however, the design shows where memory architectures for AI accelerators might be headed: away from clearly separated memory classes and toward more tightly integrated compute-memory stacks with functionally tiered memory roles. (sg)
