The proliferation of connected devices has significantly increased the amount of data being captured, moved and analyzed. This trend is expected to continue as the Internet of Things (IoT) ramps up. Perhaps not surprisingly, the rapid increase in data has created a number of new bottlenecks in data centers, prompting the industry to examine fresh approaches to system architecture.
Currently, data centers aggregate numerous individual servers into a pool of processing units. Large, data-intensive tasks are distributed across multiple racks of servers. However, this one-size-fits-all approach, typically characterized by a relatively fixed amount of compute, memory, storage and I/O resources in each server, frequently leads to acute under-utilization of resources, because a given task may require a different mix of each resource in real time. Simply put, the legacy server architecture contributes to low CPU utilization rates, high data access latencies, reduced power efficiency and increased total cost of ownership (TCO).
According to Steven Woo, VP of Systems and Solutions at Rambus, two of the most important issues facing systems today are the impact of moving data over long distances to CPUs, and the inherent difficulty of optimizing the performance and power efficiency of data processing.
“This is why we launched our Smart Data Acceleration (SDA) Research Program. We want to address these and other issues by rethinking how systems should be architected in the future,” Woo told Rambus Press during a recent interview in Sunnyvale. “As part of this program, we’ve created the SDA engine – which pairs an FPGA with large capacities of DRAM.”
Essentially, says Woo, the FPGA provides flexible acceleration and offload capabilities, while the platform’s significant memory capacity enables low latency access to large amounts of data. Coupling the FPGA with high memory capacity minimizes data movement by bringing processing resources to the data, allowing applications to benefit effectively from near data processing.
As Woo confirms, the SDA program is currently focused on optimizing the performance and power efficiency of data-intensive workloads for servers and data centers.
“The HPC community – in particular – has identified a number of challenges related to accelerating performance and improving power efficiency. Initiatives to address these issues include the Exascale Computing Project, FastForward and DesignForward,” he said. “With a focus on dramatic improvements in these critical metrics, increasing emphasis is being placed on memory and storage hierarchies to optimize future systems for evolving workloads and tasks. Of course, such improvements will ultimately benefit standard data center workloads as well.”
Woo describes the FPGA and software architecture of the SDA engine as a flexible environment that allows engineers to experiment with near data processing while exploring the interaction between application software, drivers, firmware, FPGA bitfiles and memory. The software layer enables the SDA engine to present itself to the rest of the system in various configurations, including as an ultra-fast solid-state disk, a key-value store and a large pool of memory.
“This means the SDA engine can be used across a wide range of applications that require high memory capacity, including transaction processing, in-memory databases, financial services, real-time analytics and risk analysis, imaging and transcoding,” Woo continued. “The versatility of the SDA platform also facilitates a continuum of integration strategies that balance ease of integration with performance improvement.”
For example, says Woo, acting as an ultra-fast solid-state disk, the SDA engine can integrate with existing systems in a matter of minutes by simply loading a driver and mounting the device. Applications can also be modified to take full advantage of the acceleration and offload capabilities of the SDA engine to achieve higher performance gains.
“Testing of the SDA platform configured as an ultra-fast solid-state disk confirms higher IOPS rates at much lower latencies – with significantly better latency under load – compared to state-of-the-art enterprise NVMe SSDs,” Woo concluded. “Across a range of 4KB workloads, the SDA engine can deliver 1M IOPS at latencies of 30 μs. Coupled with PCIe-based switches, multiple SDA engines can work together to provide scalable performance in a compact form factor.”
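To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It is not a Rambus benchmark. It simply multiplies out the quoted numbers under a few assumptions that the article does not state: that "4KB" means 4,096 bytes, that 30 μs is an average per-request latency, and that the 1M IOPS rate is sustained. Under those assumptions, Little's law (requests in flight = arrival rate × time in system) gives an estimate of how many I/Os would need to be outstanding at once.

```python
# Back-of-the-envelope arithmetic on the quoted figures.
# Assumptions (not stated in the article): 4KB = 4096 bytes,
# 30 microseconds is an average latency, 1M IOPS is sustained.
IOPS = 1_000_000
BLOCK_BYTES = 4 * 1024
LATENCY_S = 30e-6


def throughput_bytes_per_s(iops: int, block_bytes: int) -> int:
    """Data moved per second at a given I/O rate and block size."""
    return iops * block_bytes


def outstanding_requests(iops: float, latency_s: float) -> float:
    """Little's law: requests in flight = arrival rate x time in system."""
    return iops * latency_s


bw = throughput_bytes_per_s(IOPS, BLOCK_BYTES)
qd = outstanding_requests(IOPS, LATENCY_S)

print(f"Implied throughput: {bw / 1e9:.2f} GB/s")
print(f"Implied requests in flight: {qd:.0f}")
```

In other words, the quoted figures imply roughly 4 GB/s of data movement with about 30 requests in flight at any moment, if the assumptions above hold. The actual queue depths and access patterns used in the testing are not given in the article.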
Interested in learning more about our SDA platform? You can check out our research program page here.