Rick Merritt of EE Times recently reported that Uber is using banks of Nvidia GPUs for deep learning applications. The transportation network company (TNC) is also evaluating AI accelerators from startups such as Eyeris, Graphcore and Wave Computing. In addition, Uber maintains a dedicated AI research team that oversees more than a dozen deep-learning models, including recommendation engines for Uber Eats, fraud-detection services and estimates of driver arrival times.
“The algorithms span a half-dozen varieties, implemented across a laundry list of mainly open-source frameworks and libraries,” writes Merritt. “The underlying AI hardware today consumes as much as 40 kW in a rack of systems — twice the power that standard servers use — and can require flows of more than 100 petabytes of data.”
According to Merritt, Uber has begun designing its own compute and storage servers, as well as a variant of a 19-inch rack to house them along with vertically mounted switches. More specifically, Uber’s “super-hot” storage server utilizes two Intel Cascade Lake processors, with each 1U system packing more than 16 TB of NAND flash. Meanwhile, Uber’s warm storage system is built around Broadwell processors, with up to seventy 8-TB SATA hard-disk drives in a 4U design that puts 6 petabytes in a rack.
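For a rough sense of how the 6-petabyte figure follows from those drive counts, the sketch below simply multiplies the numbers cited above; the count of chassis per rack is an assumption for illustration, not something Uber has disclosed.

```python
# Back-of-the-envelope check of the warm-storage density described above.
# Assumed value: ~10 x 4U chassis per rack (not stated in the article).

TB_PER_DRIVE = 8          # 8-TB SATA hard-disk drives
DRIVES_PER_4U = 70        # up to 70 drives per 4U warm-storage chassis
CHASSIS_PER_RACK = 10     # assumption: roughly ten 4U chassis in a 42U-class rack

tb_per_chassis = TB_PER_DRIVE * DRIVES_PER_4U            # 560 TB per 4U chassis
pb_per_rack = tb_per_chassis * CHASSIS_PER_RACK / 1000   # ~5.6 PB, close to the ~6 PB cited

print(f"{tb_per_chassis} TB per 4U chassis, ~{pb_per_rack:.1f} PB per rack")
```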
“[Uber’s] systems have served more than 10 billion Uber rides to date at a rate of about 15 million a day across 600 cities in 65 countries,” he adds.
Commenting on the above, Steven Woo, Fellow and Distinguished Inventor in Rambus Labs, told Rambus Press that neural network, machine learning and autonomous vehicle processors typically demand silicon with the highest memory bandwidth and power efficiency for inference and training. The latter, says Woo, can also require large memory capacities, especially for data center applications.
“Some companies use on-chip memory to power their silicon, while others deploy HBM2 or GDDR6,” he explains. “On-chip memory provides the highest bandwidths and power efficiencies but sacrifices capacity to achieve this.”
HBM2, says Woo, is being used in many high-end designs today, as the memory standard provides high bandwidth and is power efficient — although it comes with higher costs and is more complex to design with. As noted above, GDDR6 is also seeing interest, with the memory standard providing high bandwidth as well as high capacity. However, GDDR6 consumes more power and poses more challenging signal-integrity constraints.
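To make the bandwidth trade-off concrete, here is a minimal sketch comparing peak per-device bandwidth for HBM2 and GDDR6 using representative per-pin data rates from the public JEDEC specifications; the data rates chosen are typical values for illustration, not figures from the article or from any specific design.

```python
# Ballpark peak-bandwidth comparison of the memory options discussed above.
# Data rates are representative spec values, not tied to any particular design.

def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth = per-pin data rate x interface width, converted to GB/s."""
    return data_rate_gbps * bus_width_bits / 8

# HBM2: ~2.0 Gbps per pin across a 1024-bit stack interface
hbm2_stack = peak_bandwidth_gb_s(2.0, 1024)    # ~256 GB/s per stack

# GDDR6: ~16 Gbps per pin on a 32-bit device
gddr6_device = peak_bandwidth_gb_s(16.0, 32)   # ~64 GB/s per device

print(f"HBM2 stack:   ~{hbm2_stack:.0f} GB/s")
print(f"GDDR6 device: ~{gddr6_device:.0f} GB/s")
# A high-end accelerator typically pairs several HBM2 stacks, or many GDDR6
# devices, to reach its total memory bandwidth.
```

The wide, slow HBM2 interface is what makes it power efficient but costly (2.5D packaging), while GDDR6 reaches comparable system bandwidth by running many narrower devices at much higher per-pin rates, which is where its power and signal-integrity challenges come from.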
“As with other markets that use high-performance memories, the minimum expectation is to at least follow the historic trend of doubling memory bandwidth and improving power efficiency with each succeeding DRAM generation,” he adds.
Interested in learning more about AI and machine learning? You can check out our article archive on the subject here.