Memory Systems for AI: Part 2

In part one of this series, we discussed how the world’s digital data is growing exponentially, doubling approximately every two years. In fact, there’s so much digital data in the world that artificial intelligence (AI) is practically the only way to begin to make sense of it all in a timely fashion. Insights gleaned from digital data are becoming more valuable, and one side effect is the need for greater security to protect the data, the AI models, and the infrastructure as well. Not surprisingly, the increasing value of the data and insights is causing AI developers to want to create more sophisticated algorithms, larger datasets, and new use cases and applications.

The challenge? Everyone wants more performance. However, the semiconductor industry can no longer fully rely on two important tools, Moore’s Law and Dennard (power) scaling, that have powered successive generations of silicon for the past several decades. Moore’s Law is slowing, while Dennard scaling broke down around 2005. Nevertheless, the explosion of data and the advent of new AI applications are challenging the semiconductor industry to find new ways to provide better performance and better power efficiency.

At the same time, the architecture of the internet is also steadily evolving. We’re all familiar with the cloud-based model, in which endpoints like our phones capture data. The data is transmitted to the cloud, where servers process and interpret it and send results back to the devices, for example to tell us how to navigate or to help us complete other tasks. Looking beyond the cloud, the upcoming rollout of 5G infrastructure will enable processing in a new location: the edge. The edge encompasses a range of locations, including base stations and more geographically distributed processing sites, that sit between cloud data centers and the endpoints. Edge locations offer the ability to process data closer to where it’s being created and hold promise for applications like autonomous driving that rely on low latency. So, in addition to processing endpoint data in the cloud, we’re going to have another location, the edge, which will change the dynamic.

However, so much data is now being captured that it can’t all be moved efficiently to the cloud, because the world’s digital data is growing faster than networks are improving. So, the question is: what do you do when you need to process all this data? Processing at the edge provides a convenient answer: take the growing amount of digital data, process part of it in locations closer to the endpoints, and then send a smaller amount of higher-value data to cloud data centers, easing the demand on network bandwidth. Processing data in this way offers the potential to improve performance as well as power efficiency.
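To make the idea of sending "a smaller amount of higher-value data" concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular edge platform: the function name `summarize_at_edge`, the window size, and the synthetic sensor readings are all assumptions made for the example. It condenses raw endpoint readings into per-window summaries before anything would be sent upstream.

```python
from statistics import mean


def summarize_at_edge(readings, window=100):
    """Condense raw readings into one (min, mean, max) summary per window.

    Hypothetical helper: a real edge node might filter, compress, or run
    inference instead. Simple aggregation is used here only as an example.
    """
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append((min(chunk), mean(chunk), max(chunk)))
    return summaries


# Synthetic endpoint data (assumed values, not measurements).
raw = [20.0 + (i % 7) * 0.5 for i in range(1000)]

summaries = summarize_at_edge(raw, window=100)
values_sent = len(summaries) * 3  # three numbers per summary
print(f"raw values captured: {len(raw)}")
print(f"values sent to the cloud: {values_sent}")
```

In this toy case, 1,000 raw values shrink to 30 summary values. The actual reduction in a real deployment depends entirely on the workload and on what the cloud-side analysis needs to retain.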

Of course, the cloud will still be an important element of the data economy even as edge locations become more popular with the advent of 5G. We expect the cloud will be used for some of the toughest interpretation and analysis tasks. In some cases, data captured by the endpoints will be sent to the edge, where it can be partially processed before being sent on to cloud data centers; in other cases, the data will be sent directly to the cloud to be aggregated with other data captured across wide geographies.

We expect to see AI deployed across the evolving internet in data centers, at endpoints, and at edge locations in between, although the requirements and implementation will differ in each place. For example, we expect the highest-performance solutions to remain in the cloud, where they are plugged into wall power. In terms of memory requirements, these deployments will require the highest-performance, most power-efficient solutions to support training as well as inferencing. The solutions of choice that we expect to see here are on-chip memory, as well as very high-performance discrete DRAM solutions like HBM and GDDR. Endpoints will still consist mostly of battery-powered devices. Much of what we see today, where such devices are typically used for inferencing, is going to be the same in the future. Low-power inferencing can be addressed by on-chip memories as well as low-power mobile memories.

It should also be noted that the definition of the edge has bifurcated and now comprises the near edge and the far edge. The near edge is closer to the cloud, while the far edge is closer to the endpoints. We expect to see a full range of memory solutions spanning the near and far edge. Specifically, at the near edge, closest to the cloud, we expect to see solutions and memory systems that look more like what you see in the cloud, such as on-chip memory, HBM and GDDR. At the far edge, we expect to see memory solutions that are similar to those deployed in endpoint devices, including on-chip memories, LPDDR, and DDR.
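The expected memory choices described above can be summarized as a simple lookup. The sketch below only restates the article's expectations as data; the names `MEMORY_BY_TIER` and `tiers_using` are made up for illustration, and "low-power mobile memory" is kept as the article's own generic term for endpoints.

```python
# Expected memory solutions per deployment tier, as described in the text.
MEMORY_BY_TIER = {
    "cloud": ["on-chip memory", "HBM", "GDDR"],
    "near edge": ["on-chip memory", "HBM", "GDDR"],
    "far edge": ["on-chip memory", "LPDDR", "DDR"],
    "endpoint": ["on-chip memory", "low-power mobile memory"],
}


def tiers_using(memory):
    """Return the tiers where the given memory type is expected to appear."""
    return [tier for tier, options in MEMORY_BY_TIER.items() if memory in options]


print(tiers_using("HBM"))
print(tiers_using("on-chip memory"))
```

Laid out this way, the pattern in the text is easy to see: on-chip memory appears at every tier, while HBM and GDDR are expected only in the cloud and at the near edge.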


About Steven Woo
