
Memcon 2024: HBM is Poppin’

Rick Vasquez
Memcon 2024 - This article is part of a series.
Part 2: This Article

When 2 of the 3 companies on the planet that make a product say publicly that it is sold out through 2024 and one says it’s sold out through 2025, you may start to think, “hmmm, should they have made more of that?”

What’s Driving HBM Bit Growth?

What’s interesting about High Bandwidth Memory (HBM)¹, beyond its incredible outpacing of raw bit growth in the memory market, is that it seemingly came out of nowhere. In their keynote, Samsung effectively put that notion to bed. They were clear that there is a 3-year product planning cycle, that they committed a significant amount of manufacturing capability to HBM starting 3 years ago, and that they have continued to repurpose capacity that was used for other memory technologies (looking at you, NAND) to produce more HBM. What’s also interesting about HBM is its rapid progression from a “highly specialized, seemingly gap-filling memory” to a “key enabler” in the Generative AI/huge neural network world, where companies are strapping tens of thousands of GPUs together.

HBM has made apparent, through a slew of use cases, that its presence makes a significant impact on system architectures and memory tiering discussions. There was once a time when, from the CPU, you had an L1 and maybe an L2 cache. Now we have L1 -> L2 -> L3 -> Shared L4 (mostly HBM) -> DRAM -> CXL DRAM Expansion -> PMEM -> CXL Far Memory Pooling, and we aren’t even out of the DRAM space yet. When you layer in the need for each accelerator or processor to have its own massive memory pool, you enter a world where CPUs are shipping with GIGABYTES of cache on them and where GPUs are coming with 3-digit GB capacities (100+ GB). This all sounds great, and now that we have seen the products that have been released we can easily tell why HBM is effectively sold out in the industry. But we’ve also effectively run out of actual physical space (silicon “real estate”) to shove in any more memory, regardless of the price. It’s just not feasible to put any more on these packages, and so there is demand for higher density, more stacking and even more throughput/bandwidth. That sounds a lot like the journey NAND was on in its early days, but unlike NAND (which did reduce the footprint of storage devices), HBM must be tightly coupled and baked directly into product designs, which limits the amount of space it can occupy before it cuts into GPU compute real estate.

It’s also no secret that NVIDIA² effectively sold out of its products upon announcement. There is a Venn diagram of HBM consumption and product sold that aligns pretty closely to only 3-5 end customers, with the rest going into embedded and specialized use cases. When you put all of this together, it becomes clear that the conversations the memory vendors are having with their key customers are helping them shape the HBM output of their precious manufacturing facilities and optimize the balance of memory production. An unexpected twist in this journey was how quickly the orders came and how big they were once GenAI kicked off the race to acquire H100/H200/GH200 etc. With product on order but not yet produced, it became easy to sell product that hadn’t even been fabbed yet, leading to a huge pull-in of revenue and long-term capacity contracts. This is great for industry stability, but the memory vendors and the analysts are both being transparent: this spike and “come up” for HBM is exactly that. HBM is joining the party as a first-class citizen in the new memory hierarchy, but it isn’t going to gobble up SRAM or DRAM use cases; it’s simply a new capability that will now grow at a healthy rate alongside, and in proportion to, the other tiers of memory.

What Is Keeping HBM From Overtaking DRAM?

There is near consensus among the vendors and analysts alike that HBM has already begun to level out and will continue to do so, mainly because of constraints on building more GPUs and accelerators. If there were infinite capacity to build CPUs, GPUs and other advanced processing silicon, there would no doubt be pressure on the memory vendors to supply the maximum amount of HBM for them. What we are seeing instead is the need for advances in semiconductor interconnects and manufacturing to be able to attach more HBM to these systems. In the final post of this series we dive a bit deeper into this topic, but at a high level, the work that companies like Eliyan (who just closed a massive $60M round with Samsung and Tiger Global Management) are doing is key to attaching even more HBM modules to compute, and will surely drive another bump in HBM demand. That will be the time to watch for either new fabs coming online or a re-mixing of current output away from some types of memory and into HBM (NAND, looking at you again).

The HBM Roadmap

There is demand for 3 key features of HBM that are focus areas for all the manufacturers and opportunities for them to differentiate from each other with their unique intellectual property and manufacturing capability.

  1. Bandwidth - There is a need to transfer more data faster as these models get larger, but also as inference demand rises from actual end users leveraging the pre-trained models in production. HBM plays a key role in the low latency responses that users want when they are interacting with their favorite models.
  2. Capacity - If CPUs/GPUs could put more HBM on, they would, but they can’t. This leaves an obvious desire for more, but there is a sane limit that we haven’t quite reached yet with the sizes of these models.
  3. Density - This is separate from Capacity because it’s more of a packaging exercise: how can you stack, or otherwise fit, more HBM into the same physical space it occupied in previous generations? The two go hand in hand, but are distinct from one another.

All 3 of these features are critical for the continued bit shipment growth and market share expansion of HBM, which will ultimately require a significant amount of capital investment and take a relatively long time (the normal 2-3 year cycle) before we see the effects of decisions being made today.
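
To make the difference between bandwidth, capacity and density a little more concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the `HbmStack` class and `package_totals` helper are hypothetical names, the 1024-bit interface, 6.4 Gbit/s per pin and 16 Gbit dies are HBM3-class inputs rather than any vendor's actual spec sheet, and the six stacks per package is just a round number. It only computes peak per-stack figures and ignores real-world efficiency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HbmStack:
    """One HBM stack; every field is an illustrative input, not vendor data."""
    bus_width_bits: int      # interface width of the stack
    pin_rate_gbit_s: float   # per-pin data rate in Gbit/s
    dies_per_stack: int      # stack height (the "density" lever)
    die_density_gbit: int    # capacity of a single DRAM die in Gbit

    def bandwidth_gb_s(self) -> float:
        # Peak bandwidth: width * per-pin rate, converted from bits to bytes.
        return self.bus_width_bits * self.pin_rate_gbit_s / 8

    def capacity_gb(self) -> float:
        # Capacity: number of dies * bits per die, converted to gigabytes.
        return self.dies_per_stack * self.die_density_gbit / 8


def package_totals(stack: HbmStack, stacks_on_package: int) -> tuple[float, float]:
    """Return (total GB/s, total GB) for identical stacks on one package."""
    return (stack.bandwidth_gb_s() * stacks_on_package,
            stack.capacity_gb() * stacks_on_package)


# Assumed HBM3-class inputs: 1024-bit interface, 6.4 Gbit/s per pin, 16 Gbit dies.
eight_high = HbmStack(1024, 6.4, dies_per_stack=8, die_density_gbit=16)
# Same footprint, taller stack: capacity rises, bandwidth per stack does not.
twelve_high = HbmStack(1024, 6.4, dies_per_stack=12, die_density_gbit=16)

if __name__ == "__main__":
    for name, stack in (("8-high", eight_high), ("12-high", twelve_high)):
        bw, cap = package_totals(stack, stacks_on_package=6)
        print(f"{name}: {bw:.1f} GB/s and {cap:.0f} GB across 6 stacks")
```

With these assumed inputs, going from an 8-high to a 12-high stack lifts capacity from 16 GB to 24 GB per stack while per-stack bandwidth stays at roughly 819 GB/s, which is one way to see why density and bandwidth are separate levers even though they ship in the same package.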

  1. ↩︎

  2. the data center GPU vendor with 95% market share, for those who may not have been paying attention or have been living under a rock ↩︎
