Last updated on: December 19, 2022 In this blog post, we take an in-depth look at Compute Express Link ™ (CXL™), an open standard cache-coherent interconnect between processors and accelerators, smart NICs, and memory devices.
- We explore how CXL is helping data centers more efficiently handle the yottabytes of data generated by artificial intelligence (AI) and machine learning (ML) applications.
- We discuss how CXL technology maintains memory coherency between the CPU memory space and memory on attached devices to enable resource sharing (or pooling).
- We also detail how CXL builds upon the physical and electrical interfaces of PCI Express® (PCIe®) with protocols that establish coherency, simplify the software stack, and maintain compatibility with existing standards.
- Lastly, we review Rambus CXL solutions, which include the Rambus CXL 3.0 PHY and the CXL 2.0 Interconnect Subsystem comprising a CXL 2.0 Controller and CXL 2.0 SerDes PHY. The latter is are now available with integrated Integrity and Data Encryption (IDE) modules which monitor and protect against cyber and physical attacks on CXL and PCIe links.
Let’s get started.
- Industry Landscape: Why Is a New Class of Interconnect Needed?
- An Introduction to CXL: What Is Compute Express Link?
- What Is the CXL Consortium?
- CXL Protocols & Standards
- Compute Express Link vs PCIe: How Are These Two Related?
- CXL Features and Benefits
- What’s New in CXL 2.0 and 3.0?
- Rambus CXL Solutions
- Final Thoughts
1. Industry Landscape: Why is a new class of Interconnect needed?
Exponential data growth is prompting the computing industry to embark on a groundbreaking architectural shift to fundamentally change the performance, efficiency, and cost of data centers.
To continue to advance performance, servers are moving increasingly to a heterogenous computing architecture with purpose-built accelerators offloading specialized workloads from CPUs. The memory cache coherency of CXL allows for sharing of memory resources between CPUs and accelerators.
Further, CXL enables the deployment of new memory tiers that can bridge the latency gap between main memory and SSD storage. These new memory tiers will add bandwidth, capacity, increased efficiency, and lower total cost of ownership (TCO). With these many benefits, the industry is decisively converging on CXL as a cache-coherent interconnect for processors, memory, and accelerators.
2. An Introduction to CXL: What is Compute Express Link?
CXL is an open standard industry-supported cache-coherent interconnect for processors, memory expansion, and accelerators. Essentially, CXL technology maintains memory coherency between the CPU memory space and memory on attached devices. This enables resource sharing (or pooling) for higher performance, reduces software stack complexity, and lowers overall system cost. The CXL Consortium has identified three primary classes of devices that will employ the new interconnect:
- Type 1 Devices: Accelerators such as smart NICs typically lack local memory. Via CXL, these devices can communicate with the host processor’s DDR memory.
- Type 2 Devices: GPUs, ASICs, and FPGAs are all equipped with DDR or HBM memory and can use CXL to make the host processor’s memory locally available to the accelerator—and the accelerator’s memory locally available to the CPU. They are also co-located in the same cache coherent domain and help boost heterogeneous workloads.
- Type 3 Devices: Memory devices can be attached via CXL to provide additional bandwidth and capacity to host processors. The type of memory is independent of the host’s main memory.
3. What Is the CXL Consortium?
The CXL Consortium is an open industry standard group formed to develop technical specifications that facilitate breakthrough performance for emerging usage models while supporting an open ecosystem for data center accelerators and other high-speed enhancements.
4. CXL Protocols & Standards
The CXL standard supports a variety of use cases via three protocols: CXL.io, CXL.cache, and CXL.memory.
- CXL.io: This protocol is functionally equivalent to the PCIe protocol—and utilizes the broad industry adoption and familiarity of PCIe. As the foundational communication protocol, CXL.io is versatile and addresses a wide range of use cases.
- CXL.cache: This protocol, which is designed for more specific applications, enables accelerators to efficiently access and cache host memory for optimized performance.
- CXL.memory: This protocol enables a host, such as a processor, to access device-attached memory using load/store commands.
Together, these three protocols facilitate the coherent sharing of memory resources between computing devices, e.g., a CPU host and an AI accelerator. Essentially, this simplifies programming by enabling communication through shared memory. The protocols used to interconnect devices and hosts are as follows:
- Type 1 Devices: CXL.io + CXL.cache
- Type 2 Devices: CXL.io + CXL.cache + CXL.memory
- Type 3 Devices: CXL.io + CXL.memory
5. Compute Express Link vs PCIe: How Are These Two Related?
CXL builds upon the physical and electrical interfaces of PCIe with protocols that establish coherency, simplify the software stack, and maintain compatibility with existing standards. Specifically, CXL leverages a PCIe 5 feature that allows alternate protocols to use the physical PCIe layer. When a CXL-enabled accelerator is plugged into a x16 slot, the device negotiates with the host processor’s port at default PCI Express 1.0 transfer rates of 2.5 gigatransfers per second (GT/s). CXL transaction protocols are activated only if both sides support CXL. Otherwise, they operate as PCIe devices.
CXL 1.1 and 2.0 use the PCIe 5.0 physical layer, allowing data transfers at 32 GT/s, or up to 64 gigabytes per second (GB/s) in each direction over a 16-lane link.
CXL 3.0 uses the PCIe 6.0 physical layer to scale data transfers to 64 GT/s supporting up to 128 GB/s bi-directional communication over a x16 link.
6. CXL Features and Benefits
Streamlining and improving low-latency connectivity and memory coherency significantly bolsters computing performance and efficiency while lowering TCO. Moreover, CXL memory expansion capabilities enable additional capacity and bandwidth above and beyond the direct-attach DIMM slots in today’s servers. CXL makes it possible to add more memory to a CPU host processor through a CXL-attached device. When paired with persistent memory, the low-latency CXL link allows the CPU host to use this additional memory in conjunction with DRAM memory. The performance of high-capacity workloads depends on large memory capacities such as AI. Considering that these are the types of workloads most businesses and data-center operators are investing in, the advantages of CXL are clear.
7. What’s New in CXL 2.0 and 3.0?
CXL 2.0 supports switching to enable memory pooling. With a CXL 2.0 switch, a host can access one or more devices from the pool. Although the hosts must be CXL 2.0-enabled to leverage this capability, the memory devices can be a mix of CXL 1.0, 1.1, and 2.0-enabled hardware. At 1.0/1.1, a device is limited to behaving as a single logical device accessible by only one host at a time. However, a 2.0 level device can be partitioned as multiple logical devices, allowing up to 16 hosts to simultaneously access different portions of the memory.
As an example, a host 1 (H1) can use half the memory in device 1 (D1) and a quarter of the memory in device 2 (D2) to finely match the memory requirements of its workload to the available capacity in the memory pool. The remaining capacity in devices D1 and D2 can be used by one or more of the other hosts up to a maximum of 16. Devices D3 and D4, CXL 1.0 and 1.1-enabled respectively, can be used by only one host at a time.
CXL 3.0 introduces peer-to-peer direct memory access and enhancements to memory pooling where multiple hosts can coherently share a memory space on a CXL 3.0 device. These features enable new use models and increased flexibility in data center architectures.
By moving to a CXL 2.0 direct-connect architecture, data centers can achieve the performance benefits of main memory expansion—and the efficiency and total cost of ownership (TCO) benefits of pooled memory. Assuming all hosts and devices are CXL 2.0-enabled, “switching” is incorporated into the memory devices via a crossbar in the CXL memory pooling chip. This keeps latency low but requires a more powerful chip since it is now responsible for the control plane functionality performed by the switch. With low-latency direct connections, attached memory devices can employ DDR DRAM to provide expansion of host main memory. This can be done on a very flexible basis, as a host is able to access all—or portions of—the capacity of as many devices as needed to tackle a specific workload.
CXL 3.0 introduces multi-tiered switching which enables the implementation of switch fabrics. CXL 2.0 enabled a single layer of switching. With CXL 3.0, switch fabrics are enabled, where switches can connect to other switches, vastly increasing the scaling possibilities.
The “As Needed” Memory Paradigm
Analogous to ridesharing, CXL 2.0 and 3.0 allocate memory to hosts on an “as needed” basis, thereby delivering greater utilization and efficiency of memory. This architecture provides the option to provision server main memory for nominal workloads (rather than worst case), with the ability to access the pool when needed for high-capacity workloads and offering further benefits for TCO. Ultimately, the CXL memory pooling models can support the fundamental shift to server disaggregation and composability. In this paradigm, discrete units of compute, memory and storage can be composed on-demand to efficiently meet the needs of any workload.
Integrity and Data Encryption (IDE)
Disaggregation—or separating the components of server architectures—increases the attack surface. This is precisely why CXL includes a secure by design approach. Specifically, all three CXL protocols are secured via Integrity and Data Encryption (IDE) which provides confidentiality, integrity, and replay protection. IDE is implemented in hardware-level secure protocol engines instantiated in the CXL host and device chips to meet the high-speed data rate requirements of CXL without introducing additional latency. It should be noted that CXL chips and systems themselves require safeguards against tampering and cyberattack. A hardware root of trust implemented in the CXL chips can provide this basis for security and support requirements for secure boot and secure firmware download.
Scaling Signaling to 64 GT/s
CXL 3.0 brings a step function increase in data rate of the standard. As mentioned earlier, CXL 1.1 and 2.0 use the PCIe 5.0 electricals for their physical layer: NRZ signaling at 32 GT/s. CXL 3.0 keeps that same philosophy of building on broadly adopted PCIe technology and extends it to the latest 6.0 version of the PCIe standard released in early 2022. That boosts CXL 3.0 data rates to 64 GT/s using PAM4 signaling. We cover the details of PAM4 signaling in PCIe 6 – All you need to know.
8. Rambus CXL Solutions
Rambus CXL 3.0 SerDes PHY
The Rambus PCIe 6.0 and CXL 3.0 PHY is a low-power, area-optimized, silicon IP core designed with a system-oriented approach to maximize flexibility and ease of integration. It delivers up to 64 GT/s signaling rates in performance-intensive applications for artificial intelligence (AI), data center, edge, networking, and HPC.
Rambus CXL 2.0 Controller
The Rambus CXL 2.0 Controller leverages a silicon-proven PCIe 5.0 controller architecture for the CXL.io path, and adds CXL.cache and CXL.mem paths specific to the CXL standard. The controller exposes a native Tx/Rx user interface for CXL.io traffic as well as an Intel CXL-cache/mem Protocol Interface (CPI) for CXL.mem and CXL.cache traffic. There is also an CXL 2.0 Controller with AXI version for ASIC and FPGA implementations with support for the AMBA AXI protocol specification for CXL.io and either CPI or AXI for CXL.mem, and CPI for CXL.cache or the AMBA CXS-B protocol specification. With the Rambus CXL 2.0 SerDes PHY, it comprises a complete CXL 2.0 interconnect subsystem.
Rambus CXL 2.0 SerDes PHY
The Rambus PCIe 5.0 and CXL 2.0 PHY is a low-power, area-optimized, silicon IP core designed with a system-oriented approach to maximize flexibility and ease of integration. It delivers up to 32 GT/s signaling rates in performance-intensive applications for AI, data center, edge, 5G infrastructure, and graphics. It integrates with the Rambus CXL 2.0 controller core for a complete CXL 2.0 interconnect subsystem.
The Rambus CXL 2.0 and PCIe 5.0 controllers are available with integrated Integrity and Data Encryption (IDE) modules. IDE monitors and protects against physical attacks on CXL and PCIe links. CXL requires extremely low latency to enable load-store memory architectures and cache-coherent links for its targeted use cases. This breakthrough controller with a zero-latency IDE delivers state-of-the-art security and performance at full 32 GT/s speed.
The built-in IDE modules, now available in Rambus CXL 2.0 and PCIe 5.0 Controllers, employ a 256-bit AES-GCM (Advanced Encryption Standard, Galois/Counter Mode) symmetric-key cryptographic block cipher, helping chip designers and security architects to ensure confidentiality, integrity, and replay protection for traffic that travels over CXL and PCIe links. This secure functionality is especially imperative for data center computing applications including AI/ML and high-performance computing (HPC).
Key features include:
- IDE security with zero latency for CXL.mem and CXL.cache
- Robust protection from physical security attacks, minimizing the safety, financial, and brand reputation risks of a security breach
- IDE modules pre-integrated in Rambus CXL 2.0 and PCIe 5.0 controllers reduce implementation risks and speed time-to-market
- Complete CXL 2.0 and PCIe 5.0 interconnect subsystems when controllers are combined with Rambus CXL 2.0 and PCIe 5.0 PHYs
CXL is a-once-in-a-decade technological force that will transform data center architectures. Supported by a who’s who of industry players including hyperscalers, system OEMs, platform and module makers, chip makers and IP providers, its rapid development is a reflection of the tremendous value it can deliver.
This is why Rambus launched the CXL Memory Interconnect Initiative—to research and develop solutions that enable a new era of data center performance and efficiency. Current Rambus CXL solutions include the Rambus CXL 3.0 and 2.0 SerDes PHYs and CXL 2.0 Controller with integrated IDE.
Explore more primers:
– Hardware root of trust: All you need to know
– PCI Express 5 vs. 4: What’s New?
– Side-channel attacks: explained
– DDR5 vs DDR4 – All the Design Challenges & Advantages
– MACsec Explained: From A to Z
– The Ultimate Guide to HBM2E Implementation & Selection