High Bandwidth Flash (HBF): SK Hynix Introduces Hybrid Memory Architecture for Improved AI Inference

By Sebastian Gerstl | Translated by AI | 3 min reading time


SK Hynix wants to address the growing demands of memory-intensive AI applications with a new approach to memory architecture. The concept, called H3, combines High Bandwidth Memory (HBM) and High Bandwidth Flash (HBF) on a shared interposer. According to the accompanying IEEE study, simulations show significant efficiency gains.

Left: High Bandwidth Flash (HBF) stacks multiple layers of NAND chips to significantly increase storage capacity; right: concept of the hybrid H3 architecture presented in the IEEE study.
(Image: Sandisk (left) / SK Hynix (right))

In an IEEE paper, SK Hynix describes H3, a hybrid memory concept for AI accelerators. The aim is to match bandwidth and capacity more closely to the requirements of large language models during the inference phase.

The core of the idea is the combination of High Bandwidth Memory (HBM) and High Bandwidth Flash (HBF) on a common interposer next to the GPU. While current designs, such as those based on Nvidia's Blackwell B200, connect only HBM directly, H3 supplements the DRAM stacks with stacked NAND flash offering high parallelism.

HBF as a Capacity Tier Alongside HBM

HBF stacks several 3D NAND dies in an HBM-like package structure. Unlike classic SSD architectures, the concept relies on a highly parallelized sub-array structure with independent read and write channels. This shortens internal data paths and increases the effective I/O parallelism.

Compared to HBM, HBF offers significantly higher capacity, but at the cost of higher access latency and limited write endurance of typically around 100,000 cycles. Its bandwidth is significantly higher than that of NVMe SSDs but remains below that of DRAM.

In H3, the HBM and HBF stacks are connected in cascade. Access takes place via a shared address space; the GPU can use both memory regions as main memory. A prefetch and latency-hiding buffer integrated into the HBM base die is intended to mask the higher NAND latencies.
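The latency-hiding idea can be illustrated with a minimal software sketch: while the consumer works on one block from a small fast buffer, the next block is already being fetched from the slow tier in the background. The function and the block abstraction below are illustrative assumptions, not from the paper:

```python
import queue
import threading

def stream_with_prefetch(read_block, n_blocks, depth=2):
    """Yield blocks 0..n_blocks-1, overlapping slow reads with consumption."""
    buf = queue.Queue(maxsize=depth)  # stand-in for the small HBM-side buffer

    def producer():
        for i in range(n_blocks):
            buf.put(read_block(i))  # slow HBF-style read runs ahead of the consumer
        buf.put(None)               # sentinel: no more blocks

    threading.Thread(target=producer, daemon=True).start()
    while (block := buf.get()) is not None:
        yield block
```

As long as consuming a block takes at least as long as fetching the next one, the slow tier's latency is hidden behind useful work; only the first access pays the full cost.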

Focus on KV Cache in Inference

The concept is driven by the growing memory requirements of large language models during inference. In particular, the key-value cache (KV cache), which holds context information, scales sharply with sequence length and batch size.

Sequences in the million-token range can require caches in the terabyte range. In today's systems, the limited HBM capacity means that data has to be offloaded to local SSDs or additional GPUs have to be added. Both increase latency and energy consumption.
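A rough back-of-the-envelope calculation shows why million-token contexts reach the terabyte range. The model configuration below (80 layers, 8 KV heads with grouped-query attention, head dimension 128, FP16) is an illustrative assumption, not taken from the article:

```python
def kv_cache_bytes(seq_len, batch_size, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2):
    # One K and one V tensor per layer -> factor 2; FP16 -> 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch_size

# Assumed config resembling a large GQA model:
size = kv_cache_bytes(seq_len=10_000_000, batch_size=1,
                      n_layers=80, n_kv_heads=8, head_dim=128)
print(f"{size / 1e12:.2f} TB")  # ~3.28 TB for a single 10M-token sequence
```

Even with aggressive KV-head sharing, a single 10-million-token sequence lands in the multi-terabyte range, far beyond the capacity of today's HBM stacks.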

H3 provides for read-only data, such as model weights or precomputed, shared KV caches, to be stored in HBF, while dynamic data remains in HBM. This relieves HBM of capacity pressure and focuses it on bandwidth-critical operations.
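The tiering rule described above can be sketched as a simple placement policy. The names and the binary decision are illustrative assumptions, not an API from the paper:

```python
from enum import Enum

class Tier(Enum):
    HBM = "hbm"  # low latency, high bandwidth, limited capacity
    HBF = "hbf"  # high capacity, higher latency, limited write endurance

def place(is_read_only: bool, is_shared: bool) -> Tier:
    # Read-mostly data (model weights, precomputed shared KV-cache blocks)
    # goes to HBF; frequently rewritten per-request data stays in HBM,
    # which also protects the NAND from write-endurance wear.
    if is_read_only or is_shared:
        return Tier.HBF
    return Tier.HBM
```

Steering writes away from the flash tier matters twice over here: it keeps HBM bandwidth for latency-critical traffic and spares the NAND its roughly 100,000-cycle endurance budget.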

According to SK Hynix, simulations with eight HBM3E stacks and eight HBF stacks in combination with a Blackwell B200 GPU show an up to 2.69-fold increase in performance per watt compared to HBM-only configurations. With a KV cache of 10 million tokens, the possible batch size increased by a factor of 18.8.

Technical Hurdles and Standardization

The integration of NAND into HBM-related packaging poses considerable challenges. In addition to latency, controller design, wear leveling and the management of block-based addressing are particularly critical. Write performance is also becoming increasingly important for KV cache applications.

The energy requirement per access is also higher than that of HBM. The architecture therefore assumes that workloads are clearly read-intensive or are optimized accordingly by software. Cache-augmented generation is a possible application scenario here.

At the same time, several vendors are pushing standardization forward. Samsung Electronics and SK Hynix are working with SanDisk in a consortium on specifications for HBF. The goal is commercialization starting in 2027.

In the competition for memory-centric inference architectures, H3 thus positions itself as a complement to HBM, not a replacement for it. Whether the concept prevails will depend largely on packaging complexity, cost structure and the software ecosystem. (sg)
