
DChannel: Accelerating Mobile Applications with Parallel High-bandwidth and Low-latency Channels

The core idea of this paper is to break the bandwidth-latency trade-off by using 5G's two channels (eMBB and URLLC) in parallel to accelerate mobile applications.

(1) Core background and problem

Pain point:

Interactive applications such as web browsing and cloud gaming depend heavily on low latency, but existing mobile network standards typically force a choice between "high bandwidth" and "low latency".

Where 5G stands:

  • eMBB (Enhanced Mobile Broadband): very high bandwidth (gigabit-class), but latency is relatively high and unstable (especially under mobility)
  • URLLC (Ultra-Reliable Low-Latency Communication): very low latency (2-10 ms), but bandwidth is tiny (only about 2 Mbps), too little to carry general-purpose traffic

Goal:

DChannel aims to use both channels at once, enjoying eMBB's high bandwidth while exploiting URLLC's low latency.

(2) DChannel's system design

DChannel is a fine-grained packet steering scheme operating at the network layer; it needs no information from the application layer.

It consists of two main parts:

  • Packet Steerer:

    • Policy: it treats URLLC as an expensive, scarce resource. For each packet, the system weighs the reward of sending it over URLLC (the reduction in completion time) against the cost (consuming the limited bandwidth)
    • Outcome: typically, DNS lookups, TCP SYNs, ACKs, and small request data are steered to URLLC, while bulk data transfers go over eMBB
  • Reordering Buffer:

    • Problem: because URLLC is much faster than eMBB, packets arrive at the receiver badly out of order, which can make TCP believe congestion has occurred
    • Solution: DChannel adds a buffer at the receiving side (phone/gateway) that restores packet order before handing data to the upper layers
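A minimal illustration of the reordering-buffer idea (sequence numbers are assumed here for illustration; the real buffer also needs a release timeout so a lost packet cannot stall delivery forever):

```python
import heapq

class ReorderingBuffer:
    """Hold out-of-order arrivals; release packets in sequence order (sketch)."""
    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self._heap = []   # min-heap of (seq, packet)

    def push(self, seq: int, packet) -> list:
        """Buffer one arrival; return the packets now deliverable in order."""
        heapq.heappush(self._heap, (seq, packet))
        out = []
        while self._heap and self._heap[0][0] == self.next_seq:
            out.append(heapq.heappop(self._heap)[1])
            self.next_seq += 1
        return out
```

For example, if the packet with sequence 1 arrives over URLLC before packet 0 arrives over eMBB, it is held until 0 shows up, and then both are released together.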

Introduction

Low latency is critical to interactive applications such as web browsing, virtual and augmented reality, and cloud gaming. For web applications, even an increase of 100 ms latency can result in as much as 1% revenue loss, as noted by Amazon [21]. Emerging VR, AR, and cloud gaming applications also rely on low latency to deliver a seamless user experience. For instance, VR requires 20 ms or lower latency to avoid any simulator sickness [19].

Current mobile broadband, serving general Internet applications such as web browsing and video streaming, has not yet delivered consistent low latency performance, in part due to the inherent trade-off between latency and bandwidth [22]. One approach is to provide two separate channels (or services) – one optimizing for bandwidth, the other optimizing for latency – with different types of user applications assigned to them. 5G NR follows this pattern with its enhanced mobile broadband (eMBB) and ultra-reliable and low-latency communication (URLLC) channels. eMBB, which serves general-purpose Internet use, is heavily focused on delivering gigabit bandwidth. This channel will be useful for streaming media but offers little to no improvement for latency-sensitive applications, such as web browsing [34,35,50]. Experimentally, web page load time in existing 5G deployments, even in close-to-ideal circumstances (a stationary device and a channel with little utilization), is similar to 4G for pages smaller than 3 MB in size and about 19% faster than 4G for pages larger than 3 MB [34]. This is due to 5G eMBB having 28 ms or larger latency, broadly similar to 4G [34]. Our measurements of 5G mmWave showed similar results, at around 22 ms in ideal conditions.

Meanwhile, 5G URLLC promises an exciting capability of very low latency, in the range of 2 to 10 ms [6], but compromises severely on bandwidth, making it unsuitable for common mobile applications. Our experiments emulating web browsing (the most widely used mobile application [44], and far from the most bandwidth-intensive application) over URLLC with 2 Mbps bandwidth show web page load times would be 5.87× worse than with eMBB. Hence, neither using URLLC alone nor using eMBB alone provides good performance. As the latency-bandwidth trade-off is fundamental, this separation between a high bandwidth channel (HBC) and a low latency channel (LLC) is likely to persist; 6G, for example, is also expected to include both [54].

We believe, however, that the availability of two channels offers an opportunity to deal with the fundamental latency-bandwidth tradeoff in a new way, beyond simple static assignment of an application to a single channel. Specifically, we argue that by using high bandwidth and low latency channels in parallel on mobile devices, significant performance and user experience improvements are possible for latency-sensitive applications. Here, we explore this hypothesis for the case of web browsing and web-based mobile applications.

Mapping an application's traffic to HBC and LLC is difficult since we have to use LLC's bandwidth very selectively. Indeed, the main deployed transport-layer mechanism to combine multiple channels, MPTCP [49], assumes two interfaces that are each of significant bandwidth, with the goal of aggregating that bandwidth or supporting failover. LLC's bandwidth, however, is a rounding error compared to HBC's. Other works – particularly Socket Intents [42] and TAPS [38] – exploit multi-access connectivity through application-level input, which we prefer to avoid to ease deployment and expand relevance to other applications in the future; therefore we expect new mechanisms are necessary.

To solve these problems, we design DChannel, a system that leverages parallel channels to improve the performance of mobile applications. DChannel comprises two modules running at either end of the channels – namely, in the mobile device OS and in a gateway device operated by the service provider. Central to the approach is a packet steering scheme that operates at the network layer (i.e., IP packets) without requiring any application input. Such fine-grained, per-packet decisions (as opposed to, for example, HTTP object-level steering) are key to making effective use of the limited LLC bandwidth. To decide which packets are worth accelerating, since LLC bandwidth is extremely limited, DChannel treats the channel as an expensive resource and calculates the benefit and cost of utilizing the LLC for each packet. Finally, since the parallel channels could occasionally confuse the transport layer with out-of-order delivery, DChannel employs a reordering buffer in the mobile device and gateway.

To evaluate our design with a concrete scenario, we leverage 5G's eMBB and URLLC as our HBC and LLC. We evaluate the benefit of DChannel in our experimental testbed (§4). Our testbed includes a prototype that can capture and steer application traffic, and a high-fidelity trace-driven network emulator that emulates cellular network latency variability and delay caused by radio resource control (RRC) state transitions [41]. We gather two types of real 5G eMBB traces – mmWave and low-band – in three different scenarios: stationary, low mobility, and high mobility. Our evaluations cover popular web applications such as web browsing and Android mobile applications. Using the testbed, we evaluate our packet steering scheme and compare it with prior approaches such as MPTCP [2] and ASAP [29]. We also evaluate DChannel in live 5G eMBB networks. Our key findings are as follows:

  • DChannel, which requires little per-connection state and no application knowledge, yields superior performance compared to the other evaluated schemes—object-level steering, static packet-size-based steering, as well as prior work, MPTCP and ASAP [29], which used multiple channels in other settings.

  • Compared with exclusively utilizing the eMBB, allocating a modest bandwidth of 2 Mbps to URLLC allows DChannel to improve web page load time (PLT). Under conditions that are ideal for eMBB (a stationary client with a line of sight to the base station and full signal strength), DChannel reduces PLT by 20% and 33% in 5G mmWave and low-band settings, respectively. Under more challenging mobile conditions, DChannel improves PLT by 37% and 42% in 5G mmWave and low-band, respectively.

  • In addition to web browsing, we evaluated three Android mobile apps in a live environment and find that DChannel improves app responsiveness by 16% on average.

  • Somewhat surprisingly, DChannel improves sustained throughput in our mobile 5G setting by roughly 10% – a useful side benefit of accelerating the TCP control loop in dynamic environments.

Finally, we discuss deployment strategies, challenges, and future opportunities. We believe our basic techniques can apply to a variety of latency-sensitive applications, and open new opportunities for app developers and cellular providers.


Background and Motivation

2.1 Channels in 5G

5G wireless networks are designed to support applications with very different service level requirements. The 5G standard known as New Radio (NR) specifies three service models: (1) enhanced mobile broadband (eMBB) for standard high-data-rate Internet and mobile connectivity, (2) ultra-reliable low-latency communication (URLLC) for mission-critical and latency-sensitive applications, and (3) massive machine-type communications (mMTC) for large-scale IoT deployments. We describe eMBB and URLLC in more depth.

(1) Enhanced Mobile Broadband: This service focuses on providing high-data-rate mobile access. It is considered an upgrade to 4G mobile broadband that will satisfy the ever-increasing demand for mobile and wireless data. 5G eMBB can operate either at the low-frequency bands below 6 GHz, which we refer to as low-band, or at the high-frequency bands around 28 GHz/39 GHz, which we refer to as millimeter wave (mmWave). The mmWave bands are a key new technology in 5G as they offer 10× the bandwidth that is currently available to 4G LTE networks [4], enabling user throughput of around 1 Gbps [15].

Providers like Verizon, AT&T, and T-Mobile have already deployed both the low-band and mmWave 5G in several major US cities, including Chicago, Atlanta, New York, and Los Angeles [9–11,34]. A recent measurement study on commercial mmWave 5G networks in the US shows TCP throughput of up to 2 Gbps for download and 60 Mbps for upload, with a mean RTT of 28 ms measured between the client and the first-hop edge server right outside the cellular network core [34]. The measurements were performed, however, in conditions favorable to mmWave such as line-of-sight, no mobility, and few clients.

eMBB latency is expected to be higher as the number of users increases and as users move. This is because radio access networks (RANs) operating in the mmWave bands use very directional beams to compensate for high signal attenuation, making them vulnerable to blockage and mobility. High data rate communication is possible only when the RAN access point aligns its beam towards the user [27]. This process, commonly referred to as beam alignment, can introduce significant delays, especially when users are moving, which requires the access point to keep realigning the beam of each user [23,27]. Furthermore, the user or other obstacles can easily block the beam, leading to unreliable and inconsistent performance both in terms of changes in throughput and highly variable RTT [3,32,34]. Our own experiments in Chicago also confirm this and show that the RTT can vary significantly even for stationary clients and is further exacerbated while walking or driving. This is because 5G eMBB mainly optimizes for high data rates, focusing less on reliability and low latency.

(2) Ultra-Reliable Low-Latency Communication: Unlike eMBB, this channel focuses on providing highly reliable, very low latency communication at the cost of limited throughput. It aims to support mission-critical and emerging applications with stringent latency and reliability requirements such as self-driving cars, factory automation, and remote surgery. While the URLLC channel is yet to be deployed in practice, the standard specifies a target 0.5 ms air latency between the client and the RAN (1 ms RTT) with 99.999% reliability for small packets (e.g. 32 to 250 bytes) [15]. It also specifies a target end-to-end latency (from a client to a destination typically right outside the cellular network core) of 2 to 10 ms with throughput ranging between 0.4 to 16 Mbps depending on the underlying application [6]. URLLC is expected to operate in the sub-6 GHz frequency bands (e.g. 700 MHz or 4 GHz) and operators are expected to use network slicing to provide dedicated resources to URLLC clients in order to guarantee consistent performance in terms of latency and reliability across both the radio access network (RAN) and the cellular core [6]. Finally, client access to the URLLC channel will be controlled by the network operators. The access control and network slicing mechanisms, however, are left to the operators' own implementations [8].

To remember – the three 5G NR service models:

  • eMBB: enhanced Mobile BroadBand
  • URLLC: Ultra-Reliable Low-Latency Communication
  • mMTC: massive Machine Type Communication


2.2 Web browsing traffic

While we evaluate several applications, web browsing is the major focus of this work and serves as a running example.

A single web page may contain tens to hundreds of relatively small-sized web objects distributed across multiple servers and domains. Consequently, web browsing traffic is characterized by its often short and bursty flows. A study across Alexa Top 200 pages found that the median number of objects in a page is 30, while the median object size is 17 KB [48]. Fetching these web objects translates to many HTTP request-and-response interactions across many short flows. The browser fires a page load event when it finishes rendering a page, which is used to determine Page Load Time (PLT), a performance metric for web browsing. Although PLT has some shortcomings, the alternatives are not free from issues, and PLT is the most widely used. PLT is typically dominated by DNS lookup, connection establishment, and TCP convergence time—which require little throughput but are highly dependent on RTT. Prior work also showed that increasing TCP throughput beyond ≈ 16 Mbps offers little improvement in PLT [45].

Of course, web page loading is affected by client CPU and server delay, in addition to network delay. Prior work found that 35% of the PLT is spent in client-side computations [47]. But the above characteristics, combined with the fact that mobile CPUs have been getting increasingly powerful [26], still suggest that network latency plays an important part in mobile web performance. Moreover, a significant portion of network latency lies in the "last mile" connection of the cellular network. Many other mobile apps also rely on HTTP-based interaction with cloud services, resulting in similar network performance requirements.
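A back-of-the-envelope model illustrates why RTT dominates small-object fetches (the three-round-trip count and the numbers below are illustrative assumptions, not measurements from the paper): fetching a median-sized 17 KB object costs a few round trips plus a tiny serialization term, so bandwidth beyond roughly 16 Mbps barely matters.

```python
def fetch_time_s(obj_bytes: int, rtt_s: float, bw_bps: float, rounds: int = 3) -> float:
    # Crude model: a few RTT-bound rounds (DNS, connection setup, request)
    # plus the serialization time of the object itself.
    return rounds * rtt_s + obj_bytes * 8 / bw_bps

# Median web object (~17 KB) at a 28 ms RTT:
t_16mbps = fetch_time_s(17_000, 0.028, 16e6)  # ~92.5 ms
t_1gbps  = fetch_time_s(17_000, 0.028, 1e9)   # ~84.1 ms: ~60x the bandwidth, <10% faster
```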


DChannel Design

3.1 High-Level Architecture

To steer application traffic in both uplink and downlink channels, there will be two main components, one in the mobile client device and one in the mobile core network (Figure 1).

On the client, applications interact with the network through a network interface as usual. In our prototype, this is a special virtual TUN interface designated for traffic that should utilize both the HBC and LLC. The client-side agent captures outgoing packets on this interface and implements an algorithm to steer traffic between the two channels. The agent also captures incoming traffic on both channels and merges it into the virtual interface, after buffering it as needed to reorder packets (§3.6).

The proxy-side agent performs symmetric functions using the same algorithms – steering traffic headed towards the client, and merging and reordering traffic outbound to the Internet. This agent runs in the service provider’s network, on a gateway at the point where the separate HBC and LLC channels begin. The exact location of the proxy-side agent may depend on the service provider’s internal architectural choices; note that it is not necessarily located at the RAN base station, because the LLC’s latency optimizations may extend into the packet core (e.g., for prioritized queuing and routing) [5].

The next subsections detail how we design the steering component, in several steps, as it is the more complex component. After that, we describe the reordering buffer.

High-Level Architecture (summary)

The system has two main components:

  1. an agent in the mobile client device
  2. a gateway agent in the service provider's network

Both transparently capture inbound and outbound traffic through a virtual network interface (TUN) and steer it in both directions between the high-bandwidth channel (HBC) and the low-latency channel (LLC).

3.2 Steering Granularity

To build the packet steering module, we begin with the question of the granularity, and corresponding layer, at which steering should occur. We considered splitting at two different layers: the application layer and the network layer. Application-layer splitting refers to steering application requests and responses to the appropriate channels. In the context of web browsing, this approach translates to requesting and delivering web objects (in the form of HTTP requests) on either LLC or HBC. Application-layer splitting is broadly similar to Socket Intents [42].

Object-level splitting may benefit from application-level knowledge about web objects, which vary in size and priority. Since LLC is bandwidth constrained, LLC can only deliver small objects faster than HBC. Web pages have complex dependency structures, and certain objects can be on the critical path for web page loading. These critical-path objects need not necessarily be small in size. Small objects might have low priorities such that accelerating them will not improve load time and thus would waste LLC bandwidth. In contrast, high-priority objects can be large such that sending those to LLC will be slower than HBC. Application-level input could help distinguish between these cases.

But object-level splitting has two drawbacks. First, we want to avoid requiring application input, which creates deployment hurdles and extra work for developers. Second, it misses opportunities for latency improvement. A web object that’s not small enough to be sent over LLC will still involve small and latency-sensitive DNS lookups, TCP connection establishment, TLS handshaking, and ACKs. Accelerating this traffic could significantly reduce object delivery time. We later demonstrate (§5.3) that object-level splitting is less effective than finer-grained packet-level steering.

Steering packets at the network layer (e.g., IP datagrams) comes with its own challenges, however. First, we do not have any application-level insight into the flow: we do not necessarily know how packet-level acceleration affects applicationlevel acceleration, so we will need a careful steering heuristic. Second, even if we identify the packets to accelerate, sending packets within a flow across two different channels might result in the packets arriving out-of-order, confusing TCP. To address this issue, we will introduce a small reordering buffer (ROB) at the endpoints. The following subsections discuss these components of the design.

Steering Granularity (summary):

DChannel steers traffic at the network layer (per IP packet) rather than at the application layer.

  • Advantage: no application-layer modification or input needed, easing deployment
  • Effect: it can accelerate small but latency-sensitive traffic such as DNS lookups, TCP handshakes, TLS handshakes, and ACKs, which is more effective than pure object-level steering

3.3 Packet Steering Intuition

Define a “message” as a sequence of one or more packets such that the receiving endpoint can take some useful action after receiving the full message. For example, an individual SYN or ACK is a message (because the transport layer can act on it), and an HTTP request or a full response spread across multiple packets is a message (because the application may be able to process the request, display an object to the user, etc.). In contrast, an individual data packet belonging to a large HTTP request/response is not a message on its own and would not be worth accelerating individually since we need to accelerate the whole sequence of packets to finish the message.

Ideally, we would like to accelerate the delivery of messages, especially those that are most valuable to accelerate, within the bandwidth constraints of the LLC. This suggests a cost-rewards calculation weighing the benefit of accelerating a message against the cost of utilizing the meager bandwidth of the LLC which might be better spent on other messages.

A direct, exact cost-rewards calculation is infeasible since DChannel running at the network layer lacks full knowledge of message boundaries (in the application’s data stream), as well as the relative value of messages to the receiver’s transport layer or application. This leads us to begin with a permissive assumption: any packet might be a message boundary and we will optimistically consider accelerating it. Nevertheless, even operating transparently at the network layer, DChannel does have certain information about rewards and costs that will help it distinguish among packets.

First, the benefit of steering a packet to the LLC depends on how much its arrival time would improve, if at all, compared to using the HBC. This depends on packet size, current output queue lengths for both channels (which are locally observable), and latency of both channels (which can be estimated). In addition, the vast majority of applications utilize TCP or other transport that delivers messages in order. This means that for a message inside packet \(P_i\), delivery of the message to the application (as opposed to the delivery of \(P_i\) to the receiving host) will depend not only on the arrival time of \(P_i\), but also on the arrival time of packets \(P_0, \ldots, P_{i-1}\) (which can also be estimated). For example, suppose \(P_{i-1}\) was sent over the HBC, and \(P_i\) is ready to send immediately after. If \(P_i\) is also sent over HBC, the pair will arrive at about the same time. If \(P_i\) is sent over LLC, it will very likely arrive much sooner, but will end up waiting for \(P_{i-1}\) before it can be delivered to the application, meaning sending over the LLC is likely not useful in this case.
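The P_{i-1}/P_i example above can be made concrete with illustrative numbers (the 20 ms and 5 ms one-way delays below are assumptions for illustration; serialization time is ignored): P_i over LLC arrives much earlier, but in-order delivery still waits for P_{i-1}, so the application sees no improvement.

```python
def completion(c_prev: float, t_send: float, one_way_delay: float) -> float:
    # In-order transport: the app sees a packet only after all earlier ones arrive.
    return max(c_prev, t_send + one_way_delay)

HBC_DELAY, LLC_DELAY = 0.020, 0.005             # illustrative one-way delays (s)

c_prev  = completion(0.0, 0.0, HBC_DELAY)       # P_{i-1} on HBC arrives at 20 ms
via_hbc = completion(c_prev, 0.0, HBC_DELAY)    # P_i on HBC: delivered at 20 ms
via_llc = completion(c_prev, 0.0, LLC_DELAY)    # P_i on LLC: arrives at 5 ms,
                                                # but still delivered at 20 ms
```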

Second, the cost of utilizing LLC resources will depend on the packet length and how much the LLC will be in demand for other messages in the near future. The latter is not perfectly known, but current or recent outgoing LLC queue depths provide some signal.

The net effect of the above considerations is that packets should tend to get steered to the LLC when they are smaller, and when they are more isolated in time as individual packets or members of short packet sequences. This corresponds well with the intuition of prioritizing acceleration of control messages or small application-level messages. We now proceed to describe how we realize this cost-rewards approach.

Cost-Rewards Steering (summary):

Since LLC bandwidth is extremely limited, DChannel treats it as an expensive resource and decides whether to use the LLC by weighing each packet's "reward" against its "cost".

  • Reward: the reduction in packet completion time from sending over LLC instead of HBC (accounting for the arrival times of preceding packets to preserve ordering)
  • Cost: the queueing delay that occupying LLC bandwidth would impose on subsequent packets
  • Decision: steer a packet to LLC when the reward exceeds the cost times a factor \(\alpha\) (experiments find \(\alpha = 0.75\) works best)

3.4 Rewards and Cost

Problem statement. The packet steering algorithm is presented with a sequence of packets and needs to decide if each packet \(P_n\) should be sent via LLC or HBC. We let \(P_1, \ldots, P_n\) denote the sequence of packets in a single end-to-end flow (by which we mean a unidirectional transport layer connection, which may contain multiple messages).

Rewards. At the packet level, the objective is to minimize the packet completion time \(C_n\), defined as the time by which all packets \(P_0, \ldots, P_n\) would arrive at the receiver. This captures the intuition (§3.3) that any \(P_n\) might be a useful message to accelerate on its own, but it wouldn’t be delivered to the application until prior packets are also delivered. The benefit of sending a packet \(P_n\) via LLC is thus the reduction of \(C_n\) if \(P_n\) is sent via LLC (denoted \(C_{n,\mathrm{LLC}}\)), compared to when it is sent via HBC (denoted \(C_{n,\mathrm{HBC}}\)). Thus, we calculate the rewards for sending \(P_n\) via LLC as:

\[ R(P_n) = C_{n,\mathrm{HBC}} - C_{n,\mathrm{LLC}}. \]

To calculate the above, we first need to estimate the delivery time \(D\) for a packet that depends on the channel/link propagation delay \(D_{\mathrm{proplink}}\) and bandwidth \(B_{\mathrm{link}}\), packet size, and the link’s queue size \(Q_{\mathrm{link}}\) at time \(t_n\). The \(Q_{\mathrm{link}}\) counts the number of bytes that have been enqueued for transmission through a link but have not yet been transmitted out the interface. Delivery time for \(P_n\) on a certain link is thus:

\[ D_{\mathrm{link}}(P_n) = D_{\mathrm{proplink}} + \frac{\mathrm{size}(P_n) + Q_{\mathrm{link}}(t_n)}{B_{\mathrm{link}}}. \tag{1} \]

The packet completion time for \(P_n\) (\(C_n\)) should also account for completion times of \(P_0\) through \(P_{n-1}\) (i.e., \(C_{n-1}\)) since \(P_n\) may arrive at the receiver before \(P_{n-1}\), especially if \(P_n\) is sent over LLC and \(P_{n-1}\) was sent over HBC. Thus, we can calculate:

\[ C_{n,\mathrm{link}} = \max\big(C_{n-1},\; t_n + D_{\mathrm{link}}(P_n)\big). \tag{2} \]

Note that \(D_{\mathrm{proplink}}\) is nondeterministic, comprising dynamic channel delay and any congestion along the channel’s path, and will thus have to be estimated. We return to this later.

Cost. The cost of sending a packet to the LLC comes from the increased utilization of LLC. Intuitively, the cost should increase with the added queueing delay that a packet arriving very soon after \(P_n\) would experience, i.e., \(\mathrm{size}(P_n)/B_{\mathrm{LLC}}\). The cost should also be higher if the LLC is currently more highly utilized so that its limited capacity is reserved for higher-reward packets. We use a heuristic that captures these two effects; specifically, we compute the cost (or fare \(F\)) of putting \(P_n\) on LLC as:

\[ F(P_n) = \frac{\mathrm{size}(P_n) + Q_{\mathrm{LLC}}(t_n)}{B_{\mathrm{LLC}}}. \tag{3} \]

Note that to be more precise, we should compute the difference in costs of putting the packet on LLC vs. HBC. But as the HBC bandwidth is dramatically higher, its cost is negligible and we omit it for simplicity.

Comparing rewards and cost. At a high level, we want to steer packets to LLC when the rewards outweigh the costs, but comparing them involves a tradeoff: the benefit is immediate to packet \(P_n\), whereas the cost affects possible subsequent packets which may not appear. We introduce a parameter \(\alpha\) to capture this, so that we will send a packet to LLC when:

\[ R(P_n) > \alpha F(P_n). \]
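Equations (1)-(3) and the decision rule can be combined into a compact sketch (a simplified single-flow model: queue drain over time is omitted, and the channel parameters in the usage note below are illustrative; only the default \(\alpha = 0.75\) comes from the paper's calibration):

```python
from dataclasses import dataclass

@dataclass
class Channel:
    prop_delay: float        # estimated one-way propagation delay (s)
    bandwidth: float         # channel bandwidth (bits/s)
    queue_bits: float = 0.0  # bits enqueued but not yet transmitted

    def delivery_time(self, pkt_bits: float) -> float:
        # Eq. (1): D_link = D_prop,link + (size(P_n) + Q_link) / B_link
        return self.prop_delay + (pkt_bits + self.queue_bits) / self.bandwidth

def steer(pkt_bits: float, t_n: float, c_prev: float,
          hbc: Channel, llc: Channel, alpha: float = 0.75):
    """Pick a channel for P_n; returns (channel name, new completion time C_n)."""
    c_hbc = max(c_prev, t_n + hbc.delivery_time(pkt_bits))  # Eq. (2) on HBC
    c_llc = max(c_prev, t_n + llc.delivery_time(pkt_bits))  # Eq. (2) on LLC
    reward = c_hbc - c_llc                                  # R(P_n)
    fare = (pkt_bits + llc.queue_bits) / llc.bandwidth      # Eq. (3)
    if reward > alpha * fare:
        llc.queue_bits += pkt_bits
        return 'LLC', c_llc
    hbc.queue_bits += pkt_bits
    return 'HBC', c_hbc
```

With, say, a 5 ms RTT / 2 Mbps LLC and a 20 ms RTT / 200 Mbps HBC, a 40-byte ACK is steered to the LLC while a 1500-byte data packet stays on the HBC, matching the intuition in §3.3.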

Calibrating \(\alpha\). If we set \(\alpha\) too low, a flow may aggressively send packets to LLC so that it will deny resources to another flow in a multi-flow application. If we set it too high, we can be too conservative in utilizing the fast LLC. To find a good \(\alpha\) and determine how sensitive performance is to its value, we conduct experiments with web browsing across different \(\alpha\) values. We load 40 web pages from our corpus over different \(\alpha\) values and pick \(\alpha\) with the best Page Load Time (PLT) result on average. We use our testbed (§5.1) and apply the packet steering over HBC and LLC. For LLC, we use 5G NR URLLC as a reference, where the RTT and bandwidth are 5 ms and 2 Mbps, respectively. For HBC, we vary its RTT while fixing bandwidth at 200 Mbps.

The detailed results are in §A.2. In summary, the results confirm that setting \(\alpha\) too low or high has suboptimal performance. The best value for HBC RTT of 20 ms to 60 ms is 0.75. This RTT range covers most cases of 5G eMBB. As the RTT increases to 80 ms and higher, \(\alpha = 1\) is slightly better. The difference, however, is less than 1%. We use \(\alpha = 0.75\) for all subsequent experiments.

Note on design. The steering approach described here is not an optimal choice derived from a model – it is a heuristic, particularly the calculation of cost and calibration of \(\alpha\), in part since some of the relevant information (like the application-level importance of a particular packet) is unavailable. However: (1) we find the heuristic does perform well in realistic environments, (2) even if poor decisions do occur, they lead only to suboptimal performance, rather than a correctness problem, and (3) performance is not very sensitive to the exact value of \(\alpha\). In particular, even with \(\alpha = 0\) – which corresponds to the greedy strategy, where each packet uses LLC whenever it expects a reward for itself – there is still a very good PLT improvement, within 5% or less of the best \(\alpha\). That said, this problem could be interesting to formalize in the future, perhaps as an online algorithm that could provide worst-case guarantees, or using queueing-theoretic tools.

3.5 The Packet Steering Algorithm

Putting together the above pieces, the complete steering algorithm is shown in Algorithm 1 in Appendix A.1. To make decisions, the algorithm requires (1) packet size, (2) current LLC queue size, (3) LLC bandwidth, and (4) the latency of both LLC and HBC. The LLC bandwidth is assigned by the operator and hence known, and (1) is directly observable. The LLC queue size (2) may be directly observable at the client, assuming its NIC is limited to the LLC bandwidth. The proxy, however, may have a higher local NIC rate; it therefore tracks outgoing traffic per user and computes what the queue depth would be if the NIC had been rate-limited. Depending on the service provider's admission-control policy, the rate could alternatively be limited explicitly at the proxy. The client can apply the same approach if (2) is not directly observable.
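
The cost–benefit check at the core of the steerer can be sketched as follows. This is not the paper's Algorithm 1 (Appendix A.1) – it is a simplified, single-packet illustration under assumed inputs: HBC is treated as unqueued, one-way latency is taken as half the RTT, and all names are illustrative.

```python
def steer_packet(pkt_bytes, llc_queue_bytes, llc_bw_bps,
                 llc_rtt_s, hbc_rtt_s, alpha=0.75):
    """Illustrative cost-benefit check, not the paper's exact algorithm.

    Send a packet over LLC only when the expected reduction in delivery
    time outweighs alpha times the LLC transmission time it consumes
    (its 'cost' in scarce LLC bandwidth).
    """
    # Time to drain the current LLC queue plus this packet, at LLC rate.
    serialize_s = (llc_queue_bytes + pkt_bytes) * 8 / llc_bw_bps
    eta_llc = llc_rtt_s / 2 + serialize_s        # estimated arrival via LLC
    eta_hbc = hbc_rtt_s / 2                      # HBC assumed unqueued here
    benefit = eta_hbc - eta_llc                  # time saved by using LLC
    cost = pkt_bytes * 8 / llc_bw_bps            # LLC capacity consumed
    return "LLC" if benefit > alpha * cost else "HBC"
```

With an empty LLC queue, a small packet (e.g., an ACK) easily clears the threshold and goes to LLC; once the 2 Mbps LLC queue is backlogged, the serialization term dominates and packets fall back to HBC. Setting `alpha=0` reproduces the greedy strategy discussed above.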

Latency (4) has to be estimated. To do this, we perform periodic handshakes (every 500 ms in our use case). A handshake consists of four steps, all with UDP packets: (1) the client agent sends a special packet we call a “D-SYN” to the proxy agent over both HBC and LLC. (2) Upon seeing a D-SYN, the proxy agent responds with “D-SYN/ACK” packets across both HBC and LLC. (3) The client agent receives the D-SYN/ACK packets, updates the base RTT value for both channels from the difference between D-SYN/ACK receive time and D-SYN release time, and replies with “D-ACK” packets across both channels. (4) The proxy agent receives the D-ACK packets and updates the base RTT value for both channels. We use the minimum RTT value for the measurement. As we will see in the evaluation (§5), very rough latency estimates are sufficient.
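
The bookkeeping side of this handshake can be sketched as a per-channel minimum-RTT tracker. This is an illustrative skeleton only – packet formats, sockets, and the 500 ms timer are omitted, and the class and method names are assumptions, not the prototype's API.

```python
import time

class RttEstimator:
    """Per-channel base-RTT tracker fed by D-SYN / D-SYN-ACK timestamps."""

    def __init__(self):
        self.base_rtt = {"LLC": float("inf"), "HBC": float("inf")}
        self.sent_at = {}

    def on_dsyn_sent(self, channel, now=None):
        # Record the D-SYN release time for this channel.
        self.sent_at[channel] = time.monotonic() if now is None else now

    def on_dsynack_received(self, channel, now=None):
        now = time.monotonic() if now is None else now
        sample = now - self.sent_at[channel]
        # Keep the minimum observed RTT as the base latency estimate.
        self.base_rtt[channel] = min(self.base_rtt[channel], sample)
```

The proxy side would run the mirror image of this on D-ACK receipt; as the paper notes, rough estimates suffice, so no smoothing beyond the minimum is attempted here.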

The algorithm requires maintaining per-flow state, specifically to store \(C_{n-1}\), the estimated completion time of the most recent previous packet. The proxy also stores per-user state for its queue-depth calculation.

Algorithm implementation notes:

  • Inputs: the algorithm relies on packet size, LLC queue depth, LLC bandwidth, and latency estimates for both channels
  • State: the proxy maintains a per-user queue-depth estimate and uses periodic UDP handshakes (D-SYN/ACK) to keep the per-link RTT estimates up to date

3.6 Reordering buffers at the endpoints

Splitting packets across asymmetric paths (particularly paths with a latency differential, as with LLC vs. HBC) can cause out-of-order packet delivery, which can harm application performance. In particular, TCP interprets out-of-order packets as a signal of congestion, potentially triggering retransmissions and a drop in sending rate. To solve this problem, we place a reordering buffer (ROB) in the receiving direction of each of our agents, buffering only packets arriving from LLC. We buffer only LLC arrivals because we want to handle the reordering caused by sending packets through the faster LLC, not reordering caused by external factors such as wireless losses.

To avoid unbounded buffering delay if the previous packet was lost, the ROB also releases packets after a timeout. Ideally, the timeout should equal the latency of HBC, but because the latency of HBC can be variable and hard to track, we use a conservative 100 ms timeout. We evaluate the effectiveness of this timeout value under random packet loss in §5.
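
The ROB logic described above can be sketched as below. This is a simplified single-flow model under assumed semantics (per-flow sequence numbers, HBC packets bypassing the buffer); the names and data structures are illustrative, not the prototype's implementation.

```python
class ReorderingBuffer:
    """Sketch of the LLC-side reordering buffer (ROB).

    LLC packets are held until all earlier sequence numbers have been
    delivered, or until a timeout (100 ms by default) expires; HBC
    packets are delivered immediately and advance the cursor.
    """

    def __init__(self, timeout=0.100):
        self.timeout = timeout
        self.next_seq = 0
        self.held = {}            # seq -> (packet, arrival_time)

    def on_llc_packet(self, seq, pkt, now):
        self.held[seq] = (pkt, now)
        return self._release(now)

    def on_hbc_packet(self, seq, pkt, now):
        # HBC packets bypass the buffer entirely.
        self.next_seq = max(self.next_seq, seq + 1)
        return [pkt] + self._release(now)

    def _release(self, now):
        out = []
        # Deliver in-order packets, plus any held past the timeout
        # (which also skips over the presumed-lost gap).
        while self.held:
            if self.next_seq in self.held:
                out.append(self.held.pop(self.next_seq)[0])
                self.next_seq += 1
            else:
                oldest = min(self.held)
                if now - self.held[oldest][1] >= self.timeout:
                    out.append(self.held.pop(oldest)[0])
                    self.next_seq = oldest + 1
                else:
                    break
        return out
```

For example, if packet 1 arrives via LLC before packet 0 arrives via HBC, packet 1 is held; when packet 0 shows up, both are released in order. If the gap never fills, the timeout releases the held packet anyway.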

Reordering buffer summary:

Because LLC latency is far lower than HBC latency, packets arrive out of order at the receiver, which can mislead TCP into inferring congestion.

DChannel therefore adds a reordering buffer (ROB) at the receiving endpoint that holds packets arriving over LLC for re-sequencing, with a conservative 100 ms timeout to avoid excessive waiting when a packet is lost.

Note

The ROB buffers only packets arriving via LLC; reordering from other causes, such as wireless-link interference, is deliberately left unhandled.

Prototype and Experimental Setup

Our experiments involve a client representing a mobile end-user application (e.g., a web browser) fetching content from a web or content server. Both the client and server endpoints have access to two interfaces, one representing the high-bandwidth channel (HBC) and the other the low-latency channel (LLC). In the case of 5G, HBC and LLC map to eMBB and URLLC, respectively. Depending on the experiment conditions, the interfaces may be real or emulated. We masked the two interfaces at the endpoints, however, behind a DChannel virtual interface implemented on top of a TUN device; the client and server use only this virtual interface to send and receive data. Our DChannel prototype then performs endpoint-transparent (and application-agnostic) steering of traffic.

We developed a DChannel prototype and packaged it as a UNIX shell, similar to the shells in Mahimahi [36]. The shell captures all outgoing traffic from any unmodified application running within it and tunnels it to our DChannel implementation; it processes incoming traffic in a similarly application-transparent manner, so both the steering and buffering modules of DChannel are used. Our DChannel prototype attaches additional metadata (sequence number and flow ID) prior to transmission to assist the receiver in reordering packets and strips it before delivering to the application. We used our own metadata header as a convenience, but in a real implementation, this could be avoided by looking inside the layer 4 header.
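
The attach/strip step can be illustrated with a tiny encapsulation helper. The 8-byte layout below (32-bit flow ID plus 32-bit sequence number, network byte order) is an assumption for illustration – the prototype's actual wire format is not specified here.

```python
import struct

# Hypothetical DChannel metadata header: flow_id, seq (both 32-bit,
# network byte order), prepended before tunneling and stripped before
# delivery to the application.
HDR = struct.Struct("!II")

def encapsulate(flow_id, seq, payload):
    """Prepend the metadata header to an outgoing packet."""
    return HDR.pack(flow_id, seq) + payload

def decapsulate(datagram):
    """Split an incoming packet into (flow_id, seq, payload)."""
    flow_id, seq = HDR.unpack_from(datagram)
    return flow_id, seq, datagram[HDR.size:]
```

The receiver feeds `(flow_id, seq)` to its reordering buffer and hands only `payload` up the stack.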

We evaluated the performance of DChannel using this prototype under two settings. The first is a live setting where we used the actual 5G NR eMBB channel as HBC. The second setting, in contrast, is one where we emulated the eMBB channel based on traces that we gathered from an actual 5G eMBB channel. In both settings, since URLLC is not yet commercially available, we emulated its “expected” behavior (based on the 5G specification [6]) using a low-latency, bandwidth-limited wired Ethernet connection.

Setup summary: the client and server each see two interfaces, HBC and LLC (eMBB and URLLC in the 5G case), which may be real or emulated depending on the experiment. Both are hidden behind a DChannel virtual interface built on a TUN device, making the steering endpoint-transparent and application-agnostic. The prototype is packaged as a UNIX shell in the style of Mahimahi: it tunnels all traffic from unmodified applications, tags packets with a sequence number and flow ID to support receiver-side reordering, and strips this metadata before delivery; a real implementation could instead inspect the layer-4 header.

We evaluated DChannel with this prototype in two settings:

  1. Live: the actual 5G NR eMBB channel serves as HBC
  2. Trace-driven: the eMBB channel is emulated from traces collected on a real 5G eMBB channel

In both settings, since URLLC is not yet commercially available, we emulate its "expected" behavior (per the 5G specification) with a low-latency, bandwidth-capped wired Ethernet connection.

4.1 Live-eMBB Setting

In this setting, DChannel steers traffic over two real interfaces (Fig. 2): One interface is tethered with a 5G phone for providing access to a live eMBB channel, while another is connected to a low-latency bandwidth-limited Ethernet connection for emulating the URLLC channel. Packets transmitted over the 5G eMBB channel traverse the core network of the mobile provider before exiting via the packet gateway (i.e., mobile path) and then one or more ASes in the public Internet (i.e., Internet path) to reach our server. Data sent over the Ethernet interface, in contrast, traverse a traditional ISP and then one or more ASes to reach the server. On the server side, DChannel receives all the packets from both the interfaces, reorders them (if required), and then delivers them to the server-side application via the TUN device.

We used Ethernet and not WiFi for emulating URLLC, since the channel is expected to provide high reliability (≥ 0.9999) [8]. We capped the bandwidth of this link using netem to emulate the low bandwidth of URLLC. Since the client must remain physically plugged into a wired network for emulating URLLC, this setting allows us to study performance only in stationary conditions.

Live-eMBB summary: DChannel steers traffic over two real interfaces (Fig. 2):

  1. One interface is tethered over USB to a 5G phone for access to the live eMBB channel
  2. The other connects to a low-latency, bandwidth-capped Ethernet link that emulates URLLC

Packets on the 5G eMBB channel traverse the operator's core network, exit via the packet gateway (the mobile path), and cross one or more public-Internet ASes (the Internet path) to reach the server; Ethernet traffic instead goes through a traditional ISP and then one or more ASes.

On the server side, DChannel receives packets from both interfaces, reorders them if needed, and delivers them to the server application through the TUN device.

Ethernet rather than WiFi is used because URLLC is expected to be highly reliable; netem caps the link's bandwidth, and since the client must stay plugged into the wired network, this setting covers only stationary conditions.

4.2 Emulated-eMBB Setting

To evaluate DChannel under a wide variety of scenarios, specifically those including client mobility, we used trace-driven emulations. Below, we describe how we captured the network (latency and bandwidth) traces of the 5G eMBB channel under stationary and low-to-moderate mobility scenarios and used them in our emulations.

4.2.1 Collecting network traces

To capture the temporal variability of mobile networks, we measured both the latency and throughput of the eMBB channel over time.

Latency traces. We measured the latency of the eMBB channel by periodically sending probes (UDP packets) from the client to the server. We set the probing period to 15 ms to force the UE radio to remain always in “active” mode while generating only a small amount of probe traffic to avoid queuing. Our measurements capture the latency imposed by the base station and core network, since our server was always in close proximity to the client (i.e., less than 150 miles), minimizing the Internet-path latency. Our traceroutes from the client to the server, although not shown in the paper, also confirmed that the latency between the client and the server was very close to the latency between the client and the packet gateway.

Bandwidth traces. We measured the throughput across time of both uplink and downlink channels by saturating them with MTU-sized UDP packets. Since TCP cannot reliably saturate the highly variable cellular uplink and downlink concurrently, we used an overestimated fixed sending rate to always fill the queue. First, we measured the maximum supported upload and download UDP throughput using existing tools such as iperf. Then, we sent traffic at this maximum rate from both endpoints. Finally, we used the actual packets received over time by the endpoints to estimate the uplink and downlink capacities.
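
The final step – turning the receiver-side packet log into a capacity timeline – can be sketched as a simple per-second bucketing. The function name and the one-second granularity here are illustrative assumptions, not the paper's tooling.

```python
from collections import defaultdict

def capacity_trace(arrivals):
    """Estimate per-second link capacity from a receiver-side packet log.

    `arrivals` is a list of (timestamp_seconds, packet_bytes) pairs
    recorded while the sender saturates the link; because the queue is
    kept full, delivered bytes per second approximate the capacity.
    Returns a dict mapping each whole second to Mbps.
    """
    buckets = defaultdict(int)
    for t, size in arrivals:
        buckets[int(t)] += size
    return {sec: total * 8 / 1e6 for sec, total in buckets.items()}
```

For example, 2.5 MB of MTU-sized packets delivered within one second yields an estimate of 20 Mbps for that second.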

Measuring both latency and bandwidth. A key challenge in measuring both latency and bandwidth simultaneously is avoiding interference: bandwidth-intensive operations can saturate the link and fill the queue, thereby inflating the latency. Since cellular networks use per-user queues, we addressed this challenge by measuring latency and bandwidth from separate devices. When using two separate devices, we did not see any perceivable interference for measurements on 5G low-band, although we observed it on 5G mmWave. Specifically, we observed latency inflation when a nearby device was uploading at more than 5 Mbps over mmWave. Hence, for 5G mmWave, we measured only the downlink throughput over time and set the uplink bandwidth to a single, fixed rate of 60 Mbps.

The accuracy of temporal variations in latency matters most for our trace-driven emulations, since the main applications that we use in our evaluations, web browsing and mobile apps, are latency-sensitive. The performance of such applications crucially depends on TCP-related configurations (e.g., initial congestion window) and network latency (or RTT) rather than on available bandwidth, particularly when the bandwidth is more than 16 Mbps [45]. Our approach to estimating bandwidths, therefore, is adequate for our evaluations.

Trace-collection summary: latency is probed with UDP packets every 15 ms, keeping the radio in active mode while adding negligible load; since the server is within 150 miles of the client, the measurements reflect mostly base-station and core-network latency, as traceroutes also confirmed. Bandwidth is measured by saturating the uplink and downlink with MTU-sized UDP packets at a deliberately overestimated fixed rate (bootstrapped from iperf) and counting delivered packets. To keep the two measurements from interfering – a full queue inflates latency – latency and bandwidth are captured from separate devices, which sufficed on 5G low-band; on mmWave, a nearby device uploading above 5 Mbps still inflated latency, so only downlink throughput was traced there and the uplink was fixed at 60 Mbps. Accurate latency variation matters most, since the evaluated applications (web browsing, mobile apps) depend on RTT and TCP configuration far more than on bandwidth above 16 Mbps, so this bandwidth-estimation approach is adequate.

4.2.2 Emulating the traces

In the emulated-eMBB setting, we run both the client and the server on the same machine. DChannel then steers traffic between them over two virtual interfaces, emulated using an extended version of Mahimahi [36]. Specifically, we extended Mahimahi’s delay shell to vary the eMBB channel latency over time, based on a trace generated from a real 5G deployment. The modified delay shell accepts a trace comprising a “timeline” of RTT values and halves each value to derive the individual uplink and downlink latency timelines. The shell then assigns per-packet latency by choosing an uplink or downlink latency by matching the time a packet arrives at the interface against the timelines. Since the trace-file granularity is one RTT sample per 15 ms, we use linear interpolation for assigning RTTs arriving between two samples. Similarly, we emulated URLLC with a propagation delay of 5 ms and bandwidth of 2 Mbps, unless noted otherwise.
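
The interpolation step can be sketched as below, assuming the trace is a list with one RTT sample (in ms) per 15 ms of trace time; the function name and clamping behavior at the trace edges are illustrative choices.

```python
def rtt_at(trace_ms, t_ms, sample_period_ms=15.0):
    """Linearly interpolate an RTT timeline sampled every 15 ms.

    A packet arriving between two samples receives a weighted mix of
    the surrounding values, mirroring the extended delay shell.
    """
    pos = t_ms / sample_period_ms
    i = min(int(pos), len(trace_ms) - 1)      # clamp to trace bounds
    j = min(i + 1, len(trace_ms) - 1)
    frac = pos - i
    return trace_ms[i] * (1 - frac) + trace_ms[j] * frac
```

The shell would then halve the interpolated RTT to obtain the per-direction (uplink or downlink) latency to impose on the packet.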

Mobile applications’ traffic (especially web browsing) is typically bursty in nature and contains periods of inactivity. To preserve energy during idle periods, UEs switch to a low-power (or “sleep”) state, which supports discontinuous reception (DRX). The transition to the low-power state depends on an inactivity timer that we observed (through probing [35]) to be around 30 ms for 5G mmWave; once the device enters this state, it will “wake up” periodically (every 40 ms). When emulating the latency traces, we therefore also estimate the radio power states of the device (based on its activity) and take into account any additional latency the state transitions may impose. A packet that arrives 20 ms after the UE enters the sleep state, for instance, will experience an additional 20 ms delay before it is processed. This delay, however, is not incurred on the uplink. For 5G low-band, we set the inactivity timer to 100 ms and wake-up interval to 20 ms.
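
The extra sleep-state delay can be modeled as waiting for the next periodic wake-up. A minimal sketch, using the mmWave parameters from the text (40 ms wake-up interval); the function name is illustrative.

```python
def drx_extra_delay(t_since_sleep_ms, wake_interval_ms=40.0):
    """Extra downlink delay for a packet arriving while the UE sleeps.

    The UE wakes every `wake_interval_ms`; a packet arriving between
    wake-ups waits for the next one. Matches the text's example: a
    packet arriving 20 ms into sleep waits another 20 ms. Uplink
    traffic does not incur this delay.
    """
    remainder = t_since_sleep_ms % wake_interval_ms
    return 0.0 if remainder == 0 else wake_interval_ms - remainder
```

For 5G low-band the same model would apply with a 20 ms wake-up interval (and a 100 ms inactivity timer deciding when sleep begins).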

For the bandwidth emulation, we extended Mahimahi’s link shell to emulate a time-varying bandwidth that changes every second. To emulate a link of capacity 60 Mbps at time n seconds, for instance, this extended link shell will release 7.5 KB per millisecond. In our emulation tests, we also used a FIFO (drop-tail) queue, and we set the buffer to 800 MTU-sized packets.

Emulation summary: the client and server run on one machine, with DChannel steering across two virtual interfaces built on an extended Mahimahi. The extended delay shell replays an RTT timeline from a real 5G deployment, halving each value into uplink/downlink latencies and linearly interpolating between 15 ms samples; URLLC defaults to 5 ms propagation delay and 2 Mbps. The emulation also models DRX: after an inactivity timer (about 30 ms on mmWave, observed via probing; 100 ms on low-band) the UE sleeps and wakes periodically (every 40 ms on mmWave; 20 ms on low-band), and a downlink packet arriving mid-sleep, e.g., 20 ms after sleep entry, waits an extra 20 ms; the uplink incurs no such delay. The extended link shell replays per-second bandwidth (e.g., 60 Mbps as 7.5 KB per millisecond) into a FIFO drop-tail queue of 800 MTU-sized packets.

Discussions and Future Work

Deployment for 5G networks: DChannel requires cellular operator support, to allow URLLC for non-critical traffic and to perform stateful packet steering. Operators may, however, omit parts of the DChannel implementation to ease deployment, such as eliminating the reordering buffer (ROB) at the core gateway, since DChannel shows only minor performance degradation without the proxy-side ROB (§5.5). Stateful packet steering may not be simple to implement, especially when there are multiple gateways; we leave this to future work.

URLLC scalability: The number of users that can send general traffic over URLLC is an important question that deserves quantitative evaluation in the future. At the time of writing, URLLC is not yet publicly deployed. However, based on the white paper [7], URLLC targets a relatively high connection density with modest per-user bandwidth. For instance, one URLLC use case (discrete automation) requires a user-experienced data rate of 10 Mbps, a traffic density of 1 Tbps/km², a connection density of 100,000/km², and a maximum end-to-end latency of 10 ms. Thus, the 2 Mbps maximum bandwidth per user for general application traffic used in our experiments remains reasonable under others' proposed URLLC use cases, even in a dense urban area.

Disrupting URLLC native traffic: URLLC is primarily built to serve latency-sensitive critical applications. To ensure we do not compromise the performance of these applications, the network operator can limit the per-user bandwidth and even choose to deprioritize non-critical packets as our approach does not require 99.999% reliability and is resilient to small increases in URLLC latency (§A.3.2).

Resource contention among applications: Multiple applications inside a user device may compete to use URLLC. We can regulate them using prioritization. One simple approach is to prioritize applications running in the foreground since mobile phone users are typically single-tasking.

Incentives for operators: While URLLC targets critical applications, it is up to the network providers to open URLLC for general mobile applications like web browsing. This is possible as 5G chipsets are typically designed to support multiple bands including the sub-6GHz bands for URLLC [6]. Expanding URLLC applications can encourage providers to foster a faster and broader deployment of URLLC as it brings a smoother experience to their major customers – mobile phone users; especially as the current market for URLLC applications like self-driving cars and remote surgery is still in its infancy.

Emulation uncertainty: The real URLLC performance might not match our emulated URLLC that follows the 5G NR white paper. However, we have performed several experiments to show the robustness of DChannel under variable URLLC conditions. Emulating the real behavior of a cellular network (eMBB) is also a known hard problem [51], and our approach of using two phones to capture both eMBB latency and bandwidth might not be perfect. We have compared DChannel performance with the emulated eMBB and live eMBB in stationary conditions and conclude that DChannel offers the same performance benefit (§5.4). However, we have not yet evaluated DChannel under non-stationary live eMBB due to the environment limitation (§4.1).

Other applications: The LLC and HBC combination can also serve applications in other domains that require both high bandwidth and low latency – something a single channel cannot satisfy. For instance, cloud gaming, which lets users play games hosted on remote servers, requires high bandwidth to stream content and low latency to remain responsive to user input. Since these applications can differ greatly from web browsing, a superior steering scheme may exist. We plan to analyze them further to determine an effective way of leveraging LLC and HBC.

Beyond mobile networks: Our insights may apply to other LLC and HBC combinations with analogous bandwidth and latency trade-offs. Examples include quality-of-service (QoS) differentiation providing separate latency- and bandwidth-optimized services [17, 39], and routing traffic among multiple ISPs where one is more expensive but provides better latency, as may happen with very-low-Earth-orbit satellite-based [24] or specialty [18] ISPs. To achieve the optimum cost-to-performance ratio, we can route only the latency-sensitive traffic to the low-latency ISP.

Future wireless design: 5G URLLC is equipped with only limited per-user bandwidth and hence is not suitable for serving general application traffic. Its bandwidth is severely compromised because it must provide both low latency and very high reliability (99.999%). General applications, however, do not need the near-perfect reliability that URLLC guarantees. Future wireless networks (such as 6G) may reconsider this trade-off and provide a low-latency channel with somewhat greater bandwidth and somewhat lower reliability.

Discussion and future-work summary:

  • Deployment: requires operator support for non-critical URLLC traffic and stateful steering; the proxy-side ROB can be dropped with only minor degradation, but stateful steering across multiple gateways remains future work
  • Scalability: URLLC targets high connection density with modest per-user bandwidth (e.g., the discrete-automation use case: 10 Mbps per user, 1 Tbps/km², 100,000 connections/km², 10 ms end-to-end), so 2 Mbps per user is plausible even in dense urban areas
  • Protecting native URLLC traffic: operators can cap per-user bandwidth or deprioritize non-critical packets, since DChannel needs neither 99.999% reliability nor immunity to small latency increases
  • Resource contention: among applications on one device, a simple policy is to prioritize the foreground app, since phone users typically single-task
  • Operator incentives: opening URLLC to general applications (feasible since 5G chipsets support the sub-6GHz URLLC bands) gives providers a reason to deploy it faster and more broadly while the market for critical URLLC applications is still in its infancy
  • Emulation uncertainty: real URLLC may differ from the whitepaper-based emulation, and emulating real eMBB is a known hard problem; DChannel matched its emulated-eMBB benefit on live eMBB in stationary conditions, but mobility over live eMBB is not yet evaluated
  • Beyond mobile networks: the same idea applies to QoS-differentiated services and multi-ISP routing (e.g., satellite or specialty ISPs), and future networks (6G) might offer a low-latency channel with more bandwidth and relaxed reliability

Related Work

There have been multiple works that try to exploit multi-access connectivity on the client.

Application layer multipath: Socket Intents [42] and Intentional networking [28] both expose custom APIs to applications and offer OS-level support for managing multiple interfaces. Both of them regulate application traffic based on application-specific information. Our work, in contrast, does not require application inputs or modifications, although in the future we might consider giving input to the steerer to support more specific applications.

Transport layer multipath: There are already numerous efforts to design multipath transport protocols such as RMTP [33], pTCP [30], mTCP [52], SCTP multihoming [31], and MPTCP [49]. These protocols deliver application traffic through multiple paths to achieve better throughput and reliability. Due to the bandwidth aggregation focus, multipath transport protocols give notable benefits to long-flow dominated applications but not to short-flow dominated applications such as web browsing [20]. Our approach works transparently with single-path transport protocols (e.g., TCP and UDP).

Network layer multipath: Tsao and Sivakumar [46] proposed a super-aggregation concept in which TCP achieves better WiFi throughput by selectively steering packets to 3G. ASAP [29] steers network packets over a satellite ISP and a lower-latency terrestrial network to improve HTTPS. We compared DChannel against ASAP in our evaluation and found that DChannel is better for eMBB and URLLC pairs, as it benefits from finer-grained decisions.

An early version of DChannel was presented in [43]. This work comes with a new and better-performing packet steering algorithm, a more robust evaluation with real-world traces and live 5G eMBB, and new use cases including mobile apps and bulk transfer.

Prior work on exploiting client-side multi-access connectivity spans the application, transport, and network layers.

  1. Application-layer multipath

    • Representative work: Socket Intents [42] and Intentional Networking [28]
    • Mechanism: expose custom APIs and provide OS-level support for managing multiple interfaces, regulating traffic with application-specific information
    • Versus DChannel: DChannel needs no application-layer input or modification (though it may accept hints in the future), making deployment simpler
  2. Transport-layer multipath

    • Representative work: RMTP [33], pTCP [30], mTCP [52], SCTP multihoming [31], and MPTCP [49]
    • Mechanism: multipath transport protocols that deliver traffic over multiple paths to aggregate bandwidth for better throughput and reliability
    • Limitation: the bandwidth-aggregation focus benefits long-flow-dominated applications but yields little for short-flow-dominated ones such as web browsing
    • Versus DChannel: DChannel works transparently with single-path transports (e.g., TCP and UDP)
  3. Network-layer multipath

    • Representative work: Tsao and Sivakumar [46] (super aggregation: selectively steering packets to 3G to improve WiFi TCP throughput); ASAP [29] (steering between a satellite ISP and a lower-latency terrestrial network to improve HTTPS)
    • Versus DChannel: in evaluation, DChannel outperforms ASAP on the eMBB/URLLC pair thanks to finer-grained decisions
  4. Evolution of DChannel

    • Prior version: an early version of DChannel appeared in [43]
    • This paper adds: a new, better-performing packet steering algorithm; a more robust evaluation with real-world traces and live 5G eMBB; and new use cases including mobile apps and bulk transfer
| Category | Representative works | Key mechanisms & features | Contrast with DChannel |
| --- | --- | --- | --- |
| Application-layer multipath | Socket Intents [42], Intentional Networking [28] | Custom APIs with OS-level support; traffic regulated by application-specific information | DChannel requires no application input or modification; more transparent and simpler to deploy |
| Transport-layer multipath | MPTCP [49], RMTP [33], pTCP [30], mTCP [52], SCTP [31] | Multipath transport for throughput and reliability; bandwidth-aggregation focus | Mainly helps long flows, little benefit for short flows (e.g., web browsing); DChannel is transparently compatible with single-path TCP/UDP |
| Network-layer multipath | Tsao & Sivakumar [46], ASAP [29] | Selective packet steering across networks (WiFi/3G, satellite/terrestrial) | DChannel outperforms ASAP on the eMBB/URLLC pair via finer-grained decisions |
| Version evolution | Early DChannel [43] | Earlier DChannel prototype and study | This paper adds a new steering algorithm, robust evaluation on real 5G traces and live eMBB, and new use cases such as mobile apps |