Network Load Balancing with In-network Reordering Support for RDMA

Remote Direct Memory Access (RDMA) is widely used in high-performance computing (HPC) and data center networks. In this paper, we first show that RDMA does not work well with existing load balancing algorithms because of its traffic flow characteristics and assumption of in-order packet delivery. We then propose ConWeave, a load balancing framework designed for RDMA. The key idea of ConWeave is that with the right design, it is possible to perform fine granularity rerouting and mask the effect of out-of-order packet arrivals transparently in the network datapath using a programmable switch. We have implemented ConWeave on a Tofino2 switch. Evaluations show that ConWeave can achieve up to 42.3% and 66.8% improvement for average and 99-percentile FCT, respectively, compared to the state-of-the-art load balancing algorithms.

Confusing Term

Here, "mask the effect of out-of-order packet arrivals transparently in the network datapath" means that ConWeave's design hides, within the network datapath itself, the impact of packets that arrive out of order.

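Specifically, RDMA expects packets to arrive in order; when they do not, performance degrades and errors may even occur. Under load balancing, however, packets belonging to the same flow may reach the destination over different paths, and that is what causes out-of-order arrival.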
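The effect described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real RDMA NIC: the function names, the per-path delays, and the receiver that simply discards anything other than the next expected sequence number are all hypothetical choices made for illustration.

```python
def arrival_order(num_packets, path_delays, send_gap=1.0):
    """Return sequence numbers in arrival order when packets are
    sprayed round-robin over paths with the given one-way delays."""
    arrivals = []
    for seq in range(num_packets):
        path = seq % len(path_delays)            # round-robin path choice
        arrive_at = seq * send_gap + path_delays[path]
        arrivals.append((arrive_at, seq))
    arrivals.sort()                              # earliest arrival first
    return [seq for _, seq in arrivals]


def in_order_receiver(arrived):
    """Hypothetical receiver that accepts only the next expected
    sequence number and discards every other packet."""
    expected = 0
    accepted, discarded = [], []
    for seq in arrived:
        if seq == expected:
            accepted.append(seq)
            expected += 1
        else:
            discarded.append(seq)
    return accepted, discarded


if __name__ == "__main__":
    order = arrival_order(6, [1.0, 5.0])         # one fast path, one slow path
    print(order)                                 # [0, 2, 4, 1, 3, 5]
    print(in_order_receiver(order))              # ([0, 1], [2, 4, 3, 5])
```

With a single path the arrival order matches the send order and nothing is discarded. With two paths of unequal delay, most packets arrive ahead of a predecessor and would have to be sent again under this simplified receiver.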

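ConWeave's contribution is that it uses a programmable switch to make adjustments within the network datapath (the "in-network reordering" of the title), so that out-of-order arrivals do not hurt RDMA performance. These adjustments are "transparent" to the upper layers: the application is unaware that they take place.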

In this way, ConWeave achieves load balancing without letting out-of-order arrivals disrupt normal RDMA operation.
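To make the idea of masking reordering before packets reach the receiver concrete, here is a generic reorder buffer. It is a sketch of the general technique only and should not be read as ConWeave's actual mechanism, which these notes do not describe and which has to fit the constraints of switch hardware. The class name `ReorderBuffer` and the helper `deliver` are hypothetical.

```python
class ReorderBuffer:
    """Hold packets that arrive early and release them strictly
    in sequence-number order."""

    def __init__(self):
        self.expected = 0      # next sequence number to release
        self.held = set()      # sequence numbers waiting for a gap to fill

    def push(self, seq):
        """Accept one packet; return the sequence numbers that can
        now be forwarded in order (possibly an empty list)."""
        self.held.add(seq)
        released = []
        while self.expected in self.held:
            self.held.remove(self.expected)
            released.append(self.expected)
            self.expected += 1
        return released


def deliver(arrived):
    """Pass an arrival sequence through the buffer and return the
    order in which packets are forwarded to the receiver."""
    buf = ReorderBuffer()
    forwarded = []
    for seq in arrived:
        forwarded.extend(buf.push(seq))
    return forwarded


if __name__ == "__main__":
    print(deliver([0, 2, 4, 1, 3, 5]))   # [0, 1, 2, 3, 4, 5]
```

The receiver behind such a buffer only ever sees an in-order stream, which is the sense in which reordering is hidden from the end host. An unbounded buffer like this one is an idealization; a real switch has limited memory and queues.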