
(Credit: Ruslan Kudrin / Alamy Stock Photo)

DPUs (data processing units) are gaining favor as they accelerate AI and ML workloads, offloading tasks such as neural network inference and training from CPUs and GPUs.

Change is a constant in the technology industry. The newest arrival revamping data centers is the data processing unit (DPU).

Why? The DPU is at the core of a rearchitecting of processing power in which servers have expanded well beyond a central processing unit (CPU) to a series of specialty processors, each offloading a specific set of tasks so the CPU can fly.

By offloading lifeblood data-handling functions from central processing units (CPUs), DPUs are driving a data center makeover that can cut the amount of electricity used for cooling by 30%, reducing the number of expensive servers needed while boosting performance.

Unraveling the Magic of DPUs

DPUs are devices that give data center operators the ability to revamp operations, with large resulting benefits in reduced energy costs and server consolidation while boosting server performance. DPUs help data center servers handle new and emerging workloads.

Today's workloads and applications are far more distributed and are composed of unstructured data such as text, images, and large files. They also use microservices that increase east-west traffic across the data center, edge, and cloud, and they require near real-time performance. All of this requires more data handling by infrastructure services, without the expense of taking computing resources away from their crucial goal of supporting daily business applications.


What is a DPU?

The DPU is a relatively new device that offloads processing-intensive tasks from the CPU onto a separate card in the server. This mini onboard server is highly optimized for network, storage, and management tasks. Why the DPU? Because the general-purpose CPU was not designed for these types of intensive data center workloads; running more of them on the server can weigh it down, which reduces performance.

The use of DPUs can, for the above-mentioned reasons, make a data center far more efficient and less expensive to operate, all while boosting performance.

How does a DPU differ from CPUs and GPUs?

In the evolution of server computing power, the CPU came first, followed by the graphics processing unit (GPU), which handles graphics, images, and video while supporting gaming. DPUs can work with their predecessors to take on more modern data workloads, and they have risen in popularity by offloading data processing for workloads such as AI, machine learning, IoT, and 5G.


(Credit: Dzmitry Skazau / Alamy Stock Photo)

Critical Elements that Complement DPUs to Power Your Workloads

A series of elements can effectively and efficiently team up with your DPUs to handle your ever-changing and more demanding data center workloads. Working as one, these processors can help you supercharge your information processing efforts. They are:

GPU (Graphics Processing Unit)

GPUs complement the DPUs in a server by focusing on processing high-bandwidth images and video, thus offloading this demanding function from CPUs. This addition to the processor architecture frees the CPU to tackle more data while using fewer resources. GPUs are common in gaming systems.

CPU (Central Processing Unit)

A CPU consists of a few powerful processing cores that are optimized for serial, or sequential, processing. That means handling one task after another. By contrast, GPUs have numerous simpler cores for parallel processing to handle simultaneous tasks. DPUs combine processing cores, hardware accelerators, and a high-performance network interface with which to handle data-centric tasks in volume.
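To make the serial-versus-parallel distinction concrete, here is a minimal Python sketch. It is illustrative only: real GPU and DPU parallelism happens in silicon, not in a process pool, and the workload below is an arbitrary stand-in.

```python
# A minimal sketch contrasting serial (CPU-style) and parallel (GPU-style)
# execution of the same set of tasks.
from multiprocessing import Pool
import time

def transform(chunk):
    # Stand-in for a data-centric task, e.g., checksumming a batch of records.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    chunks = [list(range(200_000)) for _ in range(8)]

    # Serial: one core handles one task after another.
    t0 = time.perf_counter()
    serial = [transform(c) for c in chunks]
    print(f"serial:   {time.perf_counter() - t0:.3f}s")

    # Parallel: several workers handle chunks simultaneously.
    t0 = time.perf_counter()
    with Pool(processes=8) as pool:
        parallel = pool.map(transform, chunks)
    print(f"parallel: {time.perf_counter() - t0:.3f}s")

    assert serial == parallel  # same results, different execution model
```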

High-Performance Storage

Another element in your data center that complements the use of DPUs is high-performance storage. Since DPUs facilitate improved network traffic management, boost security measures, and enhance storage processing, the resulting heightened efficiency typically leads to an overall boost in systemwide performance.

"Storage, along with capable high-performance networking, completes the computing support infrastructure and is important during initial scoping to ensure maximum efficiency of all components," according to Sven Oehme. CTO at DDN Storage.

High-speed Network Connectivity

Generally, high-speed network connectivity complements DPUs by letting them take on your heaviest workloads, such as AI, which also demand high-speed I/O. Therefore, most DPUs today are configured with 100 Gbps ports and, in some cases, up to 400 Gbps. Faster supported speeds are expected soon.
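To put those line rates in perspective, a quick bit of Python arithmetic converts them to bytes per second:

```python
# Convert advertised line rates from gigabits to gigabytes per second.
def gbps_to_gb_per_sec(gbps: float) -> float:
    return gbps / 8  # 8 bits per byte

for rate in (100, 400):
    print(f"{rate} Gbps = {gbps_to_gb_per_sec(rate):.1f} GB/s of raw throughput")
# 100 Gbps = 12.5 GB/s; 400 Gbps = 50.0 GB/s (before protocol overhead)
```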

Compute Express Link (CXL)

Compute Express Link (CXL) provides an important assist in data center performance. It is an open interconnect standard for enabling efficient, coherent memory access between a host, such as a processor, and a device, such as a hardware accelerator or SmartNIC, as explained in "CXL: A New Memory High-Speed Interconnect Fabric".

The standard aims to tackle what is known as the von Neumann bottleneck, in which computer speed is limited by the rate at which the CPU can retrieve instructions and data from memory. CXL solves this problem in several ways, according to the article. It takes a new approach to memory access and sharing between multiple computing nodes, and it allows memory and accelerators to become disaggregated, enabling data centers to be fully software-defined.
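CXL itself is a hardware interconnect, not a software API, but the core idea of coherent, copy-free sharing can be loosely illustrated with Python's standard shared-memory facility. This is an analogy only, and the names are ours:

```python
# Analogy only: two attachments to one shared memory region, with no copy
# between them -- loosely how CXL lets a host and a device share memory.
from multiprocessing import shared_memory

# The "host" creates a shared region and writes into it.
region = shared_memory.SharedMemory(create=True, size=16)
region.buf[:5] = b"hello"

# The "device" attaches to the same region by name; nothing is copied.
# (Here both handles live in one process; they could be in separate ones.)
device_view = shared_memory.SharedMemory(name=region.name)
print(bytes(device_view.buf[:5]))  # b'hello'

device_view.close()
region.close()
region.unlink()
```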

Field Programmable Gate Array (FPGA)

FPGAs can complement DPUs to help power your workloads. There are several DPU architectures: some are based on ARM SoCs, while others are based on the FPGA architecture. Intel has been successful with its FPGA-based SmartNICs, or IPUs. “FPGAs offer some differences compared to ARM-based DPUs in terms of the software framework and development. But the drawback is that FPGA programming is generally more complex than that of ARM,” explained Baron Fung, Senior Research Director at Dell'Oro Group, a global research and analysis firm. That is why most FPGA-based SmartNICs are deployed by the hyperscalers and larger Tier 2 clouds, he added.

IPU (Infrastructure Processing Unit)

IPUs are hardware accelerators designed to offload compute-intensive infrastructure tasks like packet processing, traffic shaping, and virtual switching from CPUs, as we wrote in "What is an IPU (Infrastructure Processing Unit) and How Does it Work?" An IPU, like a DPU and CXL, makes a new type of acceleration technology available in the data center.

While GPUs, FPGAs, ASICs, and other hardware accelerators offload computing tasks from CPUs, DPUs and IPUs focus on speeding up data handling, movement, and networking chores.


(Credit: Aleksey Odintsov / Alamy Stock Photo)

Accelerating Performance in Data Centers with DPUs

The emerging DPU processor class has the potential to increase server performance for AI applications. It focuses on data processing through the network, delivering efficient data movement around the data center and offloading network, security, and storage activities from a system’s CPUs.

DPUs combined with other function accelerators are power cutters, which translates into savings for your organization. About 30% of a server's processing power is dedicated to performing network and storage functions and to accelerating other key activities, including encryption, storage virtualization, deduplication, and compression.


Optimizing data center efficiency with NVIDIA BlueField DPUs

Using a DPU to offload and accelerate networking, security, storage, or other infrastructure functions and control-plane applications reduces server power consumption by up to 30%, claimed NVIDIA in a paper. "The amount of power savings increases as server load increases and can easily save $5.0 million in electricity costs for a large data center with 10,000 servers over the 3-year lifespan of the servers."
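A bit of back-of-the-envelope Python shows how such a figure can pencil out. All of the parameters below are our own illustrative assumptions, not NVIDIA's published numbers:

```python
# Rough plausibility check of the savings claim, using assumed inputs.
servers = 10_000
watts_saved_per_server = 190  # assumed average offload savings per server
price_per_kwh = 0.10          # assumed electricity price in USD
years = 3

hours = years * 365 * 24
kwh_saved = servers * watts_saved_per_server / 1_000 * hours
print(f"estimated savings: ${kwh_saved * price_per_kwh:,.0f}")
# ~ $5.0 million over three years under these assumptions
```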

Achieving supercomputing performance in the cloud

You can achieve the goal of cloud-native supercomputing, which blends the power of high-performance computing with the security and ease of use of cloud computing services, according to NVIDIA. The vendor's NVIDIA Cloud-Native Supercomputing platform leverages the NVIDIA BlueField data processing unit (DPU) architecture with high-speed, low-latency NVIDIA Quantum InfiniBand networking "to deliver bare-metal performance, user management and isolation, data protection, and on-demand high-performance computing (HPC) and AI services."

Combined with NVIDIA Quantum InfiniBand switching, this architecture delivers optimal bare-metal performance while natively supporting multi-node tenant isolation. 

Creating power-efficient data centers with DPUs

DPUs, infrastructure processing units (IPUs), and Compute Express Link (CXL) technologies, which offload switching and networking tasks from server CPUs, have the potential to significantly improve data center power efficiency, as we noted in “How DPUs, IPUs, and CXL Can Improve Data Center Power Efficiency.” In fact, the National Renewable Energy Laboratory (NREL) believes that the use of such techniques and a focus on power reduction can result in a 33 percent improvement in power efficiency.

Integration hurdles in AI infrastructure

There are other challenges in rolling out DPUs in your data centers should you choose to include AI in the environment. First, DPUs are not a prerequisite for AI infrastructure per se. In most cases, the same DPU benefits apply to both AI and non-AI infrastructure: multi-tenant management, security, host CPU offload, load balancing, and so on. However, one use case unique to AI infrastructure is deploying DPUs in Ethernet-based back-end networks for GPU/AI server clusters. In the case of the NVIDIA platform, the DPU is part of its Spectrum-X solution set, which enables Ethernet-based back-end AI networks.

In contrast, other vendors, such as Broadcom, use RDMA with their NICs to enable Ethernet-based back-end AI networks. “I think anytime you’re incorporating multiple pieces of processors in addition to the CPU (such as GPUs and DPUs), there is additional cost and software optimization work that would be needed,” cautioned Fung.

Balancing GPU vs CPU utilization

It’s important for you to know that DPUs can also help improve the utilization of both CPUs and GPUs. DPUs can offload network and storage infrastructure-related services from the CPU, improving CPU utilization. “This may not directly affect GPU utilization. However, DPUs can improve the utilization of GPUs through multi-tenant support,” explained Fung. “For example, in a large AI compute cluster of thousands of GPUs, that cluster can be subdivided and shared for different users and applications in a secure and isolated manner.”
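As a toy model of that slicing idea (the names below are hypothetical, and in practice the DPU and network fabric enforce the isolation, not application code):

```python
# Toy model: carving a shared GPU cluster into isolated tenant slices.
from dataclasses import dataclass, field

@dataclass
class GpuCluster:
    total_gpus: int
    tenants: dict = field(default_factory=dict)  # tenant name -> GPUs reserved

    def carve_slice(self, tenant: str, gpus: int) -> bool:
        if sum(self.tenants.values()) + gpus > self.total_gpus:
            return False  # not enough free GPUs for this slice
        self.tenants[tenant] = gpus
        return True

cluster = GpuCluster(total_gpus=4096)
cluster.carve_slice("research-team", 1024)
cluster.carve_slice("inference-prod", 2048)
print(cluster.tenants)  # {'research-team': 1024, 'inference-prod': 2048}
```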


(Credit: Federico Caputo / Alamy Stock Photo)

A Sneak Peek into the Future of DPUs

It should come as little surprise that the DPU market is poised for healthy growth. The global DPU market is projected to reach $5.5 billion by 2031, growing at a CAGR of 26.9% from 2022 to 2031, according to Allied Analytics LLP.
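For context, simple compound-growth arithmetic shows what that projection implies. The 2022 base below is backed out from the projection and CAGR, not a reported figure:

```python
# Back out the implied 2022 base from the 2031 projection and CAGR.
cagr = 0.269
projection_2031 = 5.5e9          # USD
years = 2031 - 2022              # nine years of compounding

base_2022 = projection_2031 / (1 + cagr) ** years
print(f"implied 2022 market size: ${base_2022 / 1e9:.2f}B")  # about $0.64B

# The same arithmetic traces the growth curve:
for year in (2022, 2025, 2028, 2031):
    size = base_2022 * (1 + cagr) ** (year - 2022)
    print(year, f"${size / 1e9:.2f}B")
```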

DPUs are extensively used to accelerate AI and ML workloads by offloading tasks such as neural network inference and training from CPUs and GPUs. In AI applications, DPUs are crucial in processing large datasets and executing complex algorithms efficiently, enabling faster model training and inference, according to KBV Research. Industries such as healthcare, finance, retail, and autonomous vehicles utilize DPUs to power AI-driven solutions for tasks like image recognition, natural language processing, and predictive analytics.

Navigating the future trajectory of data processing units

Analysts project DPUs have a large growth opportunity, especially for these AI networks. Hyperscalers will continue to use DPUs extensively, as they do now. The question is whether non-hyperscalers can take advantage of them. For those markets, DPUs could be useful for advanced workloads such as AI, for the reasons above. DPU adoption among the hyperscalers has progressed because they have (1) the volume and scale, (2) the internal software development capabilities, and (3) the specialized server/rack infrastructure to use DPUs efficiently and economically. Adoption for non-hyperscalers' traditional server applications may take more time, and the vendor ecosystem will need to address those three items.

Tracking developments in DPU technology environments

You can expect to see a continued evolution and expansion of specialty processors for servers to help data centers operate more efficiently, less expensively, and with less power than their predecessors. Overloaded server CPUs are giving way to the GPU, the DPU, and, most recently, the IPU. Intel has championed the IPU to offload infrastructure services such as security, storage and virtual switching. This frees up CPU cores for better application performance and reduced power consumption.

Moving Forward with Emerging Data Center Technologies

Typically delivered in programmable and pluggable cards, or "units," a growing family of devices can be plugged into servers to offload CPU-intensive tasks, potentially cutting cooling costs, reducing server headcount, and freeing up existing horsepower for lifeblood workloads.

With today's modern and evolving workloads, combined with spending limits and the need to save energy in data centers, can you afford not to get smart on this trend?