In this blog post, we take an in-depth look at Compute Express Link™ (CXL) 2.0, an open standard cache-coherent interconnect for processors, accelerators, smart NICs, and memory devices.
- We explore how CXL is helping data centers more efficiently handle the enormous volumes of data generated by artificial intelligence (AI) and machine learning (ML) applications.
- We discuss how CXL technology maintains memory coherency between the CPU memory space and memory on attached devices to enable resource sharing (or pooling).
- We also detail how CXL builds upon the physical and electrical interfaces of PCI Express® (PCIe) 5.0 with protocols that establish coherency, simplify the software stack, and maintain compatibility with existing standards.
- Lastly, we review Rambus CXL solutions, which include the Rambus CXL 2.0 Interconnect Subsystem comprising a CXL 2.0 Controller and CXL 2.0 SerDes PHY. These solutions are now available with integrated Integrity and Data Encryption (IDE) modules which monitor and protect against cyber and physical attacks on CXL and PCIe links.
Let’s get started.
- Industry Landscape: Why Is a New Class of Interconnect Needed?
- An Introduction to CXL: What Is Compute Express Link?
- CXL Protocols & Standards
- Compute Express Link vs PCIe 5: How Are These Two Related?
- CXL Features and Benefits
- CXL 2.0 Spec: What’s New?
- What Is the CXL Consortium?
- Rambus CXL Solutions
- Final Thoughts
1. Industry Landscape: Why Is a New Class of Interconnect Needed?
Exponential data growth is prompting the semiconductor industry to embark on a groundbreaking architectural shift to fundamentally change the performance, efficiency, and cost of data centers.
Server architecture—which has remained largely unaltered for decades—is now taking a revolutionary step forward to address the enormous volumes of data generated by AI/ML applications. Specifically, the data center is shifting from a model where each server has dedicated processing and memory—as well as networking devices and accelerators—to a disaggregated “pooling” paradigm that intelligently matches resources and workloads.
This approach offers a wide range of benefits for data centers including higher performance, increased efficiency, and lower total cost of ownership (TCO). Although the concept of disaggregation (or rack-scale architectures) and universal interfaces have circulated for some time, the industry is decisively converging on Compute Express Link (CXL) as a cache-coherent interconnect for processors, memory, and accelerators. Indeed, new server architectures and designs with CXL interfaces are soon to reach the market.
2. An Introduction to CXL: What Is Compute Express Link?
Compute Express Link (CXL) is an open standard industry-supported cache-coherent interconnect for processors, memory expansion, and accelerators. Essentially, CXL technology maintains memory coherency between the CPU memory space and memory on attached devices. This enables resource sharing (or pooling) for higher performance, reduces software stack complexity, and lowers overall system cost. The CXL Consortium has identified three primary classes of devices that benefit from the new interconnect:
- Type 1 Devices: Accelerators such as smart NICs typically lack local memory. However, they can leverage the CXL.io and CXL.cache protocols to communicate with the host processor’s DDR memory.
- Type 2 Devices: GPUs, ASICs, and FPGAs are all equipped with DDR or HBM memory and can use the CXL.memory protocol, along with CXL.io and CXL.cache, to make the host processor’s memory locally available to the accelerator—and the accelerator’s memory locally available to the CPU. Host and accelerator reside in the same cache-coherent domain, which helps boost heterogeneous workloads.
- Type 3 Devices: The CXL.io and CXL.memory protocols can be leveraged for memory expansion and pooling. For example, a buffer attached to the CXL bus could be used to expand DRAM capacity, augment memory bandwidth, or add persistent memory without the loss of DRAM slots. In real-world terms, this means the high-speed, low-latency storage devices that would previously have displaced DRAM can instead complement it with CXL-enabled devices. These could include non-volatile technologies in various form factors such as add-in cards, U.2, and EDSFF.
3. CXL Protocols & Standards
The Compute Express Link (CXL) standard supports a variety of use cases via three protocols: CXL.io, CXL.cache, and CXL.memory.
- CXL.io: This protocol is functionally equivalent to the PCIe 5.0 protocol—and utilizes the broad industry adoption and familiarity of PCIe. As the foundational communication protocol, CXL.io is versatile and addresses a wide range of use cases.
- CXL.cache: This protocol, which is designed for more specific applications, enables accelerators to efficiently access and cache host memory for optimized performance.
- CXL.memory: This protocol enables a host, such as a processor, to access device-attached memory using load/store commands.
Together, these three protocols facilitate the coherent sharing of memory resources between computing devices, e.g., a CPU host and an AI accelerator. Essentially, this simplifies programming by enabling communication through shared memory.
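To summarize the device classes and their protocol combinations described above, here is a short, illustrative sketch (the class and dictionary names are mine, not from the CXL specification):

```python
from enum import Flag, auto

class CxlProtocol(Flag):
    """The three CXL protocols multiplexed over a single link."""
    IO = auto()      # CXL.io: PCIe-style discovery, configuration, DMA
    CACHE = auto()   # CXL.cache: device coherently caches host memory
    MEM = auto()     # CXL.memory: host load/store access to device-attached memory

# Protocol mix per CXL Consortium device class (CXL.io is always present).
DEVICE_CLASSES = {
    "Type 1 (e.g. smart NIC, no local memory)": CxlProtocol.IO | CxlProtocol.CACHE,
    "Type 2 (e.g. GPU/FPGA with DDR or HBM)": CxlProtocol.IO | CxlProtocol.CACHE | CxlProtocol.MEM,
    "Type 3 (memory expander / pooling buffer)": CxlProtocol.IO | CxlProtocol.MEM,
}

for name, protocols in DEVICE_CLASSES.items():
    print(f"{name}: {protocols}")
```

Note that only Type 2 devices use all three protocols, which is what places host and accelerator memory in one cache-coherent domain.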
4. Compute Express Link vs PCIe 5: How Are These Two Related?
CXL 2.0 builds upon the physical and electrical interfaces of PCIe 5.0 with protocols that establish coherency, simplify the software stack, and maintain compatibility with existing standards. Specifically, CXL leverages a PCIe 5 feature that allows alternate protocols to use the physical PCIe layer. When a CXL-enabled accelerator is plugged into a x16 slot, the device negotiates with the host processor’s port at default PCI Express 1.0 transfer rates (2.5 GT/s). Compute Express Link transaction protocols are activated only if both sides support CXL. Otherwise, they operate as PCIe devices.
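The fallback behavior described above can be modeled with a toy sketch (the function and field names are hypothetical; real negotiation happens in the PCIe link training state machine):

```python
def negotiate_link(host_supports_cxl: bool, device_supports_cxl: bool) -> dict:
    """Illustrative model of CXL/PCIe link bring-up, not the actual LTSSM.

    Training always begins at PCIe 1.0 speed (2.5 GT/s). CXL transaction
    protocols are enabled only when both partners advertise CXL support;
    otherwise the link simply operates as PCIe.
    """
    link = {"initial_rate_gts": 2.5, "protocol": "PCIe"}
    if host_supports_cxl and device_supports_cxl:
        link["protocol"] = "CXL"       # alternate protocol negotiated
    link["final_rate_gts"] = 32.0      # link may then train up to PCIe 5.0 speed
    return link

print(negotiate_link(True, True))    # both sides support CXL
print(negotiate_link(True, False))   # device is PCIe-only: plain PCIe link
```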
According to Chris Angelini of VentureBeat, the alignment of CXL and PCIe 5 means both device classes can transfer data at 32 GT/s (gigatransfers per second), or up to 64 GB/s in each direction over a 16-lane link. Angelini also notes that the performance demands of CXL are likely to be a driver for the adoption of PCIe 6.0.
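As a back-of-the-envelope check of the figures above (assuming PCIe 5.0's 128b/130b line encoding), the per-direction bandwidth of a x16 link works out as follows:

```python
# Per-direction bandwidth of a PCIe 5.0 / CXL 2.0 x16 link.
rate_gts = 32   # 32 GT/s per lane; each transfer carries one bit per lane
lanes = 16

raw_gbps = rate_gts * lanes / 8          # raw bytes: 32 * 16 / 8 = 64 GB/s
effective_gbps = raw_gbps * 128 / 130    # PCIe 5.0 uses 128b/130b encoding

print(f"raw: {raw_gbps:.0f} GB/s, effective: {effective_gbps:.1f} GB/s")
```

So "up to 64 GB/s" refers to the raw signaling rate; usable payload bandwidth after line encoding is slightly lower, about 63 GB/s, before accounting for protocol overhead.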
5. CXL Features and Benefits
Streamlining and improving low-latency connectivity and memory coherency significantly bolsters computing performance and efficiency while lowering TCO. Moreover, CXL memory expansion capabilities enable additional capacity and bandwidth above and beyond the tightly bound DIMM slots in today’s servers. CXL makes it possible to add more memory to a CPU host processor through a CXL-attached device. When paired with persistent memory, the low-latency CXL link allows the CPU host to use this additional memory in conjunction with DRAM memory. High-capacity workloads such as AI depend on large memory capacities. Considering that these are the types of workloads most businesses and data-center operators are investing in, the advantages of CXL are clear.
6. CXL 2.0 Spec: What’s New?
CXL 2.0 supports switching to enable memory pooling. With a CXL 2.0 switch, a host can access one or more devices from the pool. Although the hosts must be CXL 2.0-enabled to leverage this capability, the memory devices can be a mix of CXL 1.0, 1.1, and 2.0-enabled hardware. At 1.0/1.1, a device is limited to behaving as a single logical device accessible by only one host at a time. However, a 2.0 level device can be partitioned as multiple logical devices, allowing up to 16 hosts to simultaneously access different portions of the memory.
As an example, host 1 (H1) can use half the memory in device 1 (D1) and a quarter of the memory in device 2 (D2) to finely match the memory requirements of its workload to the available capacity in the memory pool. The remaining capacity in devices D1 and D2 can be used by one or more of the other hosts, up to a maximum of 16. Devices D3 and D4, CXL 1.0 and 1.1-enabled respectively, can be used by only one host at a time.
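The H1/D1/D2 example above can be sketched in a short, illustrative model (device names, capacities, and the class interface are hypothetical, not from the CXL specification):

```python
# Illustrative sketch of CXL 2.0 memory pooling rules.
MAX_HOSTS = 16  # a CXL 2.0 multi-logical device serves up to 16 hosts

class PooledDevice:
    def __init__(self, name: str, capacity_gb: int, cxl_version: str):
        self.name = name
        self.capacity_gb = capacity_gb
        self.cxl_version = cxl_version  # "1.0"/"1.1" devices: single host only
        self.allocations: dict[str, int] = {}  # host -> GB allocated

    def allocate(self, host: str, gb: int) -> None:
        if self.cxl_version.startswith("1") and self.allocations and host not in self.allocations:
            raise ValueError(f"{self.name}: a CXL 1.x device serves a single host")
        if len(self.allocations) >= MAX_HOSTS and host not in self.allocations:
            raise ValueError(f"{self.name}: at most {MAX_HOSTS} hosts")
        free = self.capacity_gb - sum(self.allocations.values())
        if gb > free:
            raise ValueError(f"{self.name}: only {free} GB free")
        self.allocations[host] = self.allocations.get(host, 0) + gb

# H1 takes half of D1 and a quarter of D2, as in the example above.
d1 = PooledDevice("D1", 512, "2.0")
d2 = PooledDevice("D2", 512, "2.0")
d3 = PooledDevice("D3", 256, "1.0")
d1.allocate("H1", 256)   # half of D1
d2.allocate("H1", 128)   # quarter of D2
d2.allocate("H2", 128)   # leftover capacity goes to another host
d3.allocate("H3", 256)   # CXL 1.x device: one host takes the whole device
```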
By moving to a CXL 2.0 direct-connect architecture, data centers can achieve the performance benefits of main memory expansion—and the efficiency and total cost of ownership (TCO) benefits of pooled memory. Assuming all hosts and devices are CXL 2.0-enabled, “switching” is incorporated into the memory devices via a crossbar in the CXL memory pooling chip. This keeps latency low but requires a more powerful chip, since it is now responsible for the control plane functionality otherwise performed by an external switch. With low-latency direct connections, attached memory devices can employ DDR DRAM to provide expansion of host main memory. This can be done on a very flexible basis, as a host is able to access all—or portions of—the capacity of as many devices as needed to tackle a specific workload.
The “As Needed” Memory Paradigm
Analogous to ridesharing, CXL 2.0 allocates memory to hosts on an “as needed” basis, thereby delivering greater utilization and efficiency of memory. This architecture provides the option to provision server main memory for nominal workloads (rather than worst case), with the ability to access the pool when needed for high-capacity workloads and offering further benefits for TCO. Ultimately, the CXL memory pooling models can support the fundamental shift to server disaggregation and composability. In this paradigm, discrete units of compute, memory and storage can be composed on-demand to efficiently meet the needs of any workload.
Integrity and Data Encryption (IDE)
Disaggregation—or separating the components of server architectures—increases the attack surface. This is precisely why CXL takes a secure-by-design approach. Specifically, all three CXL protocols are secured via Integrity and Data Encryption (IDE), which provides confidentiality, integrity, and replay protection. IDE is implemented in hardware-level secure protocol engines instantiated in the CXL host and device chips to meet the high-speed data rate requirements of CXL without introducing additional latency. It should be noted that CXL chips and systems themselves require safeguards against tampering and cyberattacks. A hardware root of trust implemented in the CXL chips can provide this basis for security and support requirements for secure boot and secure firmware download.
7. What Is the CXL Consortium?
The CXL Consortium is an open industry standard group formed to develop technical specifications that facilitate breakthrough performance for emerging usage models while supporting an open ecosystem for data center accelerators and other high-speed enhancements.
8. Rambus CXL Solutions
Rambus CXL 2.0 Controller
The Rambus CXL 2.0 Controller leverages a silicon-proven PCIe 5.0 controller architecture for the CXL.io path, and adds CXL.cache and CXL.mem paths specific to the CXL standard. The controller exposes a native Tx/Rx user interface for CXL.io traffic as well as an Intel CXL-cache/mem Protocol Interface (CPI) for CXL.mem and CXL.cache traffic. There is also a CXL 2.0 Controller with AXI version for ASIC and FPGA implementations, which supports the AMBA AXI protocol specification for CXL.io, either CPI or AXI for CXL.mem, and either CPI or the AMBA CXS-B protocol specification for CXL.cache. With the Rambus CXL 2.0 SerDes PHY, it comprises a complete CXL 2.0 interconnect subsystem.
Rambus CXL 2.0 SerDes PHY
The Rambus PCIe 5.0 and CXL 2.0 PHY is a low-power, area-optimized silicon IP core designed with a system-oriented approach to maximize flexibility and ease of integration. It delivers up to 32 GT/s signaling rates for performance-intensive applications in AI, data center, edge, 5G infrastructure, and graphics. With the Rambus PCIe 5.0 Controller, it comprises a complete PCIe 5.0 SerDes subsystem. Alternatively, it can be integrated with the Rambus CXL 2.0 controller core for a complete CXL 2.0 interconnect subsystem.
The Rambus CXL 2.0 and PCIe 5.0 controllers are available with integrated Integrity and Data Encryption (IDE) modules. IDE monitors and protects against cyber and physical attacks on CXL and PCIe links. CXL requires extremely low latency to enable the load-store memory architectures and cache-coherent links of its targeted use cases. This breakthrough controller with a zero-latency IDE delivers state-of-the-art security and performance at full 32 GT/s speed.
The built-in IDE modules, now available in Rambus CXL 2.0 and PCIe 5.0 Controllers, employ a 256-bit AES-GCM (Advanced Encryption Standard, Galois/Counter Mode) symmetric-key cryptographic block cipher, helping chip designers and security architects to ensure confidentiality, integrity, and replay protection for traffic that travels over CXL and PCIe links. This secure functionality is especially imperative for data center computing applications including AI/ML and high-performance computing (HPC).
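The integrity and replay-protection ideas behind IDE can be illustrated with a small, runnable sketch. Note the substitution: real IDE uses 256-bit AES-GCM engines in hardware, while this stand-in uses Python's stdlib HMAC-SHA256 for the authentication tag (so no third-party crypto library is needed) plus a monotonic counter for replay detection.

```python
import hashlib
import hmac
import os

KEY = os.urandom(32)  # 256-bit symmetric key shared by both link partners

def protect(payload: bytes, counter: int) -> tuple[bytes, int, bytes]:
    """Tag a payload with its transmit counter (HMAC stands in for AES-GCM)."""
    msg = counter.to_bytes(8, "big") + payload
    tag = hmac.new(KEY, msg, hashlib.sha256).digest()
    return payload, counter, tag

def verify(payload: bytes, counter: int, tag: bytes, last_seen: int) -> bool:
    """Accept only fresh (counter advanced) and authentic (tag valid) traffic."""
    if counter <= last_seen:  # replayed or reordered transfer: reject
        return False
    msg = counter.to_bytes(8, "big") + payload
    return hmac.compare_digest(tag, hmac.new(KEY, msg, hashlib.sha256).digest())

payload, ctr, tag = protect(b"flit data", 1)
assert verify(payload, ctr, tag, last_seen=0)       # fresh and authentic: accepted
assert not verify(payload, ctr, tag, last_seen=1)   # same counter again: replay rejected
```

Tampering with either the payload or the counter invalidates the tag, which is the integrity property; the counter check is the replay protection.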
Key features include:
- IDE security with zero latency for CXL.mem and CXL.cache
- Robust protection from physical security attacks, minimizing the safety, financial, and brand reputation risks of a security breach
- IDE modules pre-integrated in Rambus CXL 2.0 and PCIe 5.0 controllers reduce implementation risks and speed time-to-market
- Complete CXL 2.0 and PCIe 5.0 interconnect subsystems when controllers are combined with Rambus CXL 2.0 and PCIe 5.0 PHYs
9. Final Thoughts
Server architecture—which has remained largely unaltered for decades—is taking a revolutionary step forward to address the enormous volume of data generated by AI/ML applications. Specifically, the data center is shifting from a model where each server has dedicated processing and memory—as well as networking devices and accelerators—to a disaggregated pooling paradigm that intelligently matches resources and workloads. This approach offers a wide range of benefits for data centers including higher performance, increased efficiency, and lower total cost of ownership (TCO).
Although the concept of disaggregation (or rack-scale architectures) and universal interfaces have circulated for some time, the industry is decisively converging on Compute Express Link (CXL) as a cache-coherent interconnect for processors, memory, and accelerators. CXL builds upon the physical and electrical interfaces of PCIe 5.0 with protocols that establish coherency, simplify the software stack, and maintain compatibility with existing standards. In addition to resource sharing (memory pooling) and switching, CXL makes it possible to add more memory to a CPU host processor via a CXL-attached device. When paired with persistent memory, the low-latency CXL link allows the CPU host to use this additional memory in conjunction with DRAM memory.
The advantages of CXL are clear, as high-capacity workloads such as AI depend on large memory capacities. This is why Rambus recently announced its CXL Memory Interconnect Initiative—to research and develop solutions that enable a new era of data center performance and efficiency. Current Rambus CXL solutions include the Rambus CXL 2.0 Controller and CXL 2.0 SerDes PHY. Integrated Integrity and Data Encryption (IDE) modules monitor and protect against cyber and physical attacks on CXL and PCIe links.
Explore more primers:
– Hardware root of trust: All you need to know
– PCI Express 5 vs. 4: What’s New?
– Side-channel attacks: explained
– DDR5 vs DDR4 – All the Design Challenges & Advantages
– MACsec Explained: From A to Z
– The Ultimate Guide to HBM2E Implementation & Selection