Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced a programming language extension known as Milk that allows app developers to manage memory more efficiently in programs with scattered data points in large data sets.
Fetching data from memory banks is currently a major performance bottleneck. Cores grab entire blocks of data at a time, based on the principle of locality: the assumption that a program needing one item will soon need its neighbors. That assumption fails for many modern workloads, especially those with frequent random, indirect memory accesses such as graph analytics, key-value stores and machine learning, so much of each fetched block goes unused and program execution slows.
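To put a rough number on that waste, the short Python sketch below compares how much of the fetched data is actually used under sequential versus scattered access. The 64-byte block and 8-byte item sizes are illustrative assumptions, not figures from the researchers, and the model optimistically assumes each block is fetched only once.

```python
import random

BLOCK_BYTES = 64  # assumed block size, for illustration only
ITEM_BYTES = 8    # assumed size of one data item, for illustration only


def blocks_fetched(indices):
    """Count the distinct blocks touched (each assumed to be fetched once)."""
    items_per_block = BLOCK_BYTES // ITEM_BYTES
    return len({i // items_per_block for i in indices})


def useful_fraction(indices):
    """Fraction of the fetched bytes that the program actually uses."""
    used = len(set(indices)) * ITEM_BYTES
    fetched = blocks_fetched(indices) * BLOCK_BYTES
    return used / fetched


# 1,024 neighboring items versus 1,024 items scattered over 10 million.
sequential = list(range(1024))
scattered = random.Random(0).sample(range(10_000_000), 1024)

print(useful_fraction(sequential))  # 1.0: every fetched byte is used
print(useful_fraction(scattered))   # close to ITEM_BYTES / BLOCK_BYTES
```

Under these assumptions, scattered access uses only about one-eighth of what it pulls in, which is the kind of inefficiency the article describes.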
According to MIT News, programs written using these new language extensions were four times as fast as those coded without them – although the researchers believe additional progress with Milk will yield even more significant gains.
As Saman Amarasinghe, an MIT professor of Electrical Engineering and Computer Science, explains, big data sets pose problems for existing memory management techniques because they are sparse. Put simply, the scale of the solution does not necessarily increase proportionally with the scale of the problem.
“In social settings, we used to look at smaller problems,” Amarasinghe told the publication. “If you look at the people in this [CSAIL] building, we’re all connected. But if you look at the planet scale, I don’t scale my number of friends. The planet has billions of people, but I still have only hundreds of friends. Suddenly you have a very sparse problem.”
Vladimir Kiriansky, a PhD student in electrical engineering and computer science and first author on the paper introducing Milk, offered an analogy for how wasteful fetching whole blocks becomes when the data is sparse.
“It’s as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton and put it back in the fridge,” he said.
With Milk, chip cores refrain from grabbing entire blocks of data at a time. Instead, a core that needs a data item adds the item’s address to a list of locally stored addresses. Ultimately, the cores ‘pool’ their respective lists, group together the addresses that lie in close proximity, and redistribute them. This allows each core to request only the data items it requires, in an order that can be retrieved efficiently.
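The defer-then-group idea can be illustrated with a minimal, single-threaded Python sketch. It is not Milk's implementation and does not use Milk's syntax: the function names, the block size of eight items, and the "block switch" count used as a stand-in for memory traffic are all assumptions made for illustration, and the pooling of lists across cores is left out.

```python
import random
from collections import defaultdict

ITEMS_PER_BLOCK = 8  # assumed number of items per block, for illustration


def count_block_switches(indices):
    """Count how often consecutive accesses land in a different block."""
    switches, last_block = 0, None
    for i in indices:
        block = i // ITEMS_PER_BLOCK
        if block != last_block:
            switches += 1
            last_block = block
    return switches


def increment_direct(table, indices):
    """Apply each update immediately, in arrival order."""
    for i in indices:
        table[i] += 1


def increment_batched(table, indices):
    """Defer updates, group them by block, then apply block by block.

    Returns the order in which items were finally accessed.
    """
    pending = defaultdict(list)  # block number -> deferred indices
    for i in indices:
        pending[i // ITEMS_PER_BLOCK].append(i)
    order = []
    for block in sorted(pending):
        for i in pending[block]:
            table[i] += 1
            order.append(i)
    return order


rng = random.Random(0)
table_size = 4096
indices = [rng.randrange(table_size) for _ in range(20000)]

direct = [0] * table_size
increment_direct(direct, indices)

batched = [0] * table_size
order = increment_batched(batched, indices)

print(direct == batched)  # True: same result either way
print(count_block_switches(indices), count_block_switches(order))
```

Both versions compute the same table, but the batched version touches each block in a single run rather than bouncing between blocks. The trade-off in this toy model is the extra memory and work needed to hold and sort the deferred list.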
Commenting on the introduction of Milk, Steven Woo, VP of Systems and Solutions at Rambus, told us that modern applications such as in-memory databases, analytics and machine learning are increasingly being bottlenecked by accesses to the memory system.
“Limitations in delivered bandwidth, latency and capacity are causing CPUs to be heavily underutilized, driving the need for change in how applications interact with hardware,” he explained. “Making more efficient use of memory resources and minimizing data movement throughout the memory hierarchy are critical issues that must be addressed in order to improve the performance and power efficiency of emerging workloads. Approaches like Milk demonstrate the potential benefits to end users of solving these key issues.”