Understanding HBM design challenges

This entry was posted on Thursday, April 19th, 2018.

HBM2 @ 256GB/s

As Semiconductor Engineering’s Ann Steffora Mutschler observes, high-bandwidth memory (HBM) enables lower power consumption per I/O, as well as higher bandwidth memory access with a more condensed form factor. This is accomplished by stacking memory dies directly on top of each other – and sharing the same package as an SoC or GPU core using a silicon interposer. Each pin drives a much shorter trace and runs at a lower frequency – resulting in significantly lower switching power.

“High-bandwidth performance gains are achieved by a very wide I/O parallel interface,” Mutschler explains.

“HBM1 can deliver 128GB/s, while HBM2 offers 256GB/s maximum bandwidth. Memory capacity is easily scaled by adding more dies to the stack – or adding more stacks to the system-in-package.”
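The quoted figures follow from simple arithmetic on the interface width. Here is a minimal sketch in Python, assuming the commonly cited per-stack parameters of a 1024-bit interface running at 1 Gbps per pin for HBM1 and 2 Gbps per pin for HBM2. Those parameters are not stated in the article, and the function name is purely illustrative.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: pin count times per-pin rate, over 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

# Assumed per-stack parameters (not taken from the article).
HBM_BUS_WIDTH_BITS = 1024
hbm1 = peak_bandwidth_gbytes(HBM_BUS_WIDTH_BITS, 1.0)  # 1 Gbps per pin
hbm2 = peak_bandwidth_gbytes(HBM_BUS_WIDTH_BITS, 2.0)  # 2 Gbps per pin

print(f"HBM1: {hbm1:.0f} GB/s, HBM2: {hbm2:.0f} GB/s")
# prints "HBM1: 128 GB/s, HBM2: 256 GB/s"
```

Under these assumptions, doubling the per-pin rate at a fixed width accounts for the step from 128GB/s to 256GB/s. The width, rather than a high clock, does most of the work.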

As Frank Ferro, a senior director of product management at Rambus, notes, HBM takes existing DRAM and, using 2.5D technology, moves it closer to the processor over a wider data pipe. This approach accelerates data throughput while reducing the power required to drive a signal and cutting RC delay.

“Originally, high-bandwidth memory was seen by the graphics companies as a clear step in the evolutionary direction,” Ferro tells Semiconductor Engineering. “But then the networking and data center community realized HBM could add a new tier of memory in their memory hierarchy for more bandwidth and all the things that are driving the datacenter: lower latency access, faster access, less delay and lower power. As a result, HBM design activities have picked up pace in these market segments.”

HBM design challenges

Perhaps not surprisingly, a number of HBM design factors must be taken into consideration. From a system design perspective, the challenge often revolves around fitting more bandwidth into a reasonable area on the chip, within a reasonable power profile. In general, HBM is considered efficient in both power and area because it exploits 3D stacking technology.

“[However], the tradeoff here is cost, as HBM2 is more expensive. So far, the primary applications of this technology are tied to some form of advanced packaging – 2.5D or high-end fan-outs – which have been developed with high-performance in mind rather than cost,” Mutschler elaborates. “The alternative is some combination of DDRs and/or GDDRs, which can be combined to achieve more performance than a traditional DRAM solution. [However, this configuration would] require a larger area and more chips.”

Indeed, says Ferro, companies have to decide if they want to use 5 or 10 DDRs, multiple GDDRs or a single HBM stack.

“What are the system tradeoffs, power tradeoffs and performance tradeoffs? [These are important questions to consider], because SerDes has been driving up to 56 or 112 [Gbps],” he tells the publication. “So, there are all of these very high-speed links in the system and now you’re moving data very rapidly, but now you have to start to store it and process it very rapidly, too. As a result, we continue to see in the networking market and in the enterprise market [engineering teams asking] how to get more memory bandwidth to go with all this moving of data around.”
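To make the device-count tradeoff concrete, the sketch below estimates how many interfaces of each type would be needed to reach a 256GB/s peak. The widths and per-pin rates are illustrative assumptions rather than figures from the article, and peak rates overstate what a real system sustains.

```python
# Assumed peak figures (illustrative, not from the article):
# name -> (interface width in bits, per-pin data rate in Mbps)
INTERFACES = {
    "DDR4-3200, 64-bit channel": (64, 3200),
    "GDDR6, 32-bit device at 16 Gbps": (32, 16000),
    "HBM2, 1024-bit stack at 2 Gbps": (1024, 2000),
}

def units_needed(target_gbytes_per_s: int, width_bits: int, pin_rate_mbps: int) -> int:
    """Smallest number of interfaces whose combined peak meets the target."""
    per_unit_mbytes = width_bits * pin_rate_mbps // 8  # peak MB/s per interface
    target_mbytes = target_gbytes_per_s * 1000
    return -(-target_mbytes // per_unit_mbytes)  # integer ceiling division

TARGET_GBYTES_PER_S = 256
counts = {
    name: units_needed(TARGET_GBYTES_PER_S, width, rate)
    for name, (width, rate) in INTERFACES.items()
}
for name, count in counts.items():
    print(f"{name}: {count}")
# prints 10 for DDR4, 4 for GDDR6 and 1 for HBM2
```

With these assumed rates, the result lines up with the choice Ferro describes: roughly ten DDR channels, a handful of GDDR devices, or a single HBM stack.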

HBM, GDDR6 and automotive memory

Ferro also notes that while power is always an important consideration, the automotive sector places more emphasis on cost versus bandwidth. Put simply, performance and price are neck and neck, with power coming in third.

“Chipmakers are looking at memory systems that can handle bandwidth greater than 100 Gbps and higher as you get into different levels of driver assisted cars – and ultimately self-driving cars,” Ferro tells Semiconductor Engineering. “In order to do that, the number of memory choices starts to narrow down quite a bit in terms of what can provide you with the necessary bandwidth to process all that data that is coming in.”

As Ferro points out, some early advanced driver-assistance system (ADAS) designs included both DDR4 and LPDDR4 because that was what was available at the time, although each has its advantages and disadvantages.

“DDR4 is obviously the cheapest option available and those are in the highest-volume productions,” he adds. “They are certainly very cost-effective and very well understood. Doing error correction on DDR4 is simpler and well understood. LPDDR4 was also an option that was used, as well.”

Moving forward, says Ferro, the automotive industry can expect a range of memory types to coexist in different systems.

“If [systems] are heavily cost-driven, then they are going to be looking at something like DDR or maybe even LPDDR4,” he states. “But if they are heavily bandwidth-driven, [system designers] will be looking at something like HBM or GDDR. It’s really a function of where you are in your architecture stage.”

In addition, says Ferro, there are various levels of ADAS capability to consider, as well as the system’s requirements and its shipping timeframe.

“If you are getting a system shipping this year, it would have a different solution than systems being developed for next year or the year after that,” he continues. “Those are all the things that we are seeing on the continuum of time-to-market versus cost.”

On the high-performance side, says Ferro, the bandwidth-power tradeoff is the key challenge from a system-design standpoint. For example, how does an engineer get more bandwidth to fit in a reasonable area on a chip with reasonable power consumption?

“If you have an HBM [design], it is very efficient from a power and area standpoint because it uses 3D stacking technology, so from a power efficiency point of view, HBM is fantastic,” he concludes. “And from an area standpoint, one HBM stack takes up a relatively small amount of space, so that’s a really nice-looking solution from a power-performance perspective. You get great density, you get great power, low power, within a small area.”