NemFi: Record-and-Replay to Emulate WiFi

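This paper introduces NemFi, a record-and-replay (trace-driven) emulator for WiFi networks. Its core goal is to address the inability of existing cellular network emulators to emulate WiFi accurately, and thereby to give networked applications a repeatable, high-fidelity test environment.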

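(1) Background and challenges

The authors note that simulation and testbed experiments each have strengths and weaknesses, while trace-driven emulation balances realism and repeatability well. However, existing cellular emulators (such as the tool developed by Winstein et al.) cannot be used directly for WiFi, mainly because of three challenges:

Capturing delivery opportunities:
- Cellular uplink and downlink are separated, whereas WiFi is a contention-based shared medium
- Saturating the uplink and downlink at the same time triggers contention, so link capacity cannot be captured accurately
Identifying the saturation point:
- Cellular queues are large and rates relatively stable, whereas WiFi queues are small and the PHY (physical-layer) rate changes dynamically with channel conditions
- A fixed sending window either overflows the buffer or fails to fill the available bandwidth
Emulating WiFi-specific behavior:
- WiFi-specific packet loss (not only loss caused by buffer overflow) must be emulated accurately
- Frame aggregation must be emulated as well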

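(2) NemFi design and implementation

NemFi extends the existing cellular emulator and consists of two modules: Record and Replay.

A. Record module

Single-direction saturation: to avoid contention, NemFi saturates the link in one direction only to capture delivery opportunities. The experiments show that the throughput measured in one direction equals the sum of the throughputs in both directions, and that WiFi shares capacity fairly between uplink and downlink.
Rate control algorithm: NemFi introduces a PID-controller-like algorithm that reads the PHY rate every 25 ms and dynamically adjusts the sending window, so that the link is saturated without inducing artificial packet loss.

B. Replay module

Based on Mahimahi: the replay module is built on top of the Mahimahi framework.
Sharing delivery opportunities: a weighted round-robin mechanism dynamically allocates delivery opportunities between uplink and downlink traffic during replay.
Emulating loss and frame aggregation: packets are dropped probabilistically according to the recorded loss rate, and frame aggregation is emulated with the model of da Hora et al.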

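(3) Evaluation

The authors validate NemFi in three scenarios (ideal, static at long range, and mobile) using three applications: Iperf, SCP, and DASH (video streaming).

Record accuracy: NemFi's throughput closely follows changes in the PHY rate, and link utilization is close to 100%. Under ideal conditions, the extra loss introduced by NemFi is very low (0.2% on average).
Replay fidelity: application performance replayed through NemFi is very close to performance over the real WiFi link, with a difference of less than 3%.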

Introduction

WiFi is increasingly popular due to the widespread use of mobile devices (e.g. smartphones, laptops, tablets, and smartwatches) [7]. The quality of WiFi connectivity varies drastically from place to place and over time due to several factors such as poor network configuration, old equipment, fluctuating demands of users, congestion, and coverage. As many of today's applications and services will be running over WiFi, it is crucial to evaluate the performance of these applications in different network conditions. The variability of WiFi makes it hard to predict how an application or service will work from just a few experiments. Testing in one or a few settings tells little about how a service will behave when deployed at a large scale over a long period.

There are different options for evaluating networked applications and services before deployment: simulation, testbed experiments, and emulation. Simulation is the easiest way to experiment with different wireless network conditions. Network simulation tools (e.g. NS-2 [3], NS-3 [1], OMNET++ [9], to name a few) are used to mimic the behavior of wireless networks in a software-based environment. The advantages of simulation are repeatability, control, configurability, and scalability. The main limitation of simulation tools, however, is that they require the user to tune the different parameters, e.g. level of interference, congestion, loss rate (among others), which may not reflect real wireless network conditions. Even with good parameter settings, a simulator cannot capture the complex inter-dependencies of real systems.

At the other end of the spectrum, there is testbed experimentation, where developers evaluate their applications over deployed wireless links either over testbeds or by relying on volunteer testers. The results of such experiments capture the impact of real wireless network conditions. The major disadvantage of experimentation is that it offers no repeatability, and is difficult to scale. The variability of wireless networks makes it hard to reproduce results. The results of experimentation are, therefore, hard to interpret and one cannot distinguish the issues with application versus wireless conditions.

Finally, trace-driven emulation [6, 10] involves recording traces in deployed wireless networks and later replaying these traces to reproduce the recorded network conditions. The clear benefits of trace-driven emulation are its ability to capture real network conditions and the repeatability of the experiments. One can run the same network conditions several times, which eases application or system debugging, and enables comparative analysis of different applications or protocols over the same network conditions.

While there exist trace-driven emulators for cellular [10] and HTTP traffic [6], to the best of our knowledge, there exists no such solution for WiFi. Adapting cellular emulation for WiFi is not trivial due to the many fundamental differences between WiFi and cellular, in particular, in how they manage access to the shared medium, how they react to packet loss, and other technology-specific protocols. These differences make the existing cellular network emulator unable to accurately emulate WiFi.

Motivated by the advantages of trace-driven emulation, and the lack of such a tool for WiFi, we design and implement NemFi, a trace-driven emulator for WiFi. NemFi extends the state-of-the-art cellular emulator [10] to support accurate WiFi emulation. We begin by identifying the challenges of trace-driven emulation for WiFi. These challenges help us identify the key design decisions of NemFi's record-and-replay modules. We demonstrate through extensive evaluations that NemFi accurately emulates WiFi in various network conditions.

The main contributions of this paper are therefore:

• We identify the challenges of trace-driven emulation for WiFi.

• We introduce a novel trace-driven emulator for WiFi, which makes it possible to evaluate network applications and services over emulated WiFi conditions.

The rest of this paper is organized as follows. In Section 2, we provide a brief overview of the existing trace-driven cellular emulator, followed by the list of challenges for designing a trace-driven emulator for WiFi. In Section 3, we explain the design of NemFi. In Section 4, we validate the accuracy of NemFi's design via a series of experiments for various types of applications and mobility scenarios. We conclude the manuscript in Section 5 and provide future research directions.

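1. Introduction

With the wide adoption of mobile devices (smartphones, laptops, tablets, smartwatches, and so on), WiFi use keeps growing. However, WiFi connection quality fluctuates sharply across places and over time because of poor network configuration, old equipment, fluctuating user demand, congestion, and limited coverage. Since many of today's applications and services run over WiFi, evaluating how they perform under different network conditions is crucial. WiFi's variability makes it hard to predict how an application or service will behave from only a few experiments. Testing in one or a few settings says little about how a service behaves once deployed at large scale over a long period.

There are three main options for evaluating networked applications and services before deployment: simulation, testbed experiments, and emulation.

Simulation is the easiest way to experiment with different wireless network conditions. Network simulators (such as NS-2 [3], NS-3 [1], and OMNET++ [9]) mimic the behavior of wireless networks in a software environment. The advantages of simulation are repeatability, control, configurability, and scalability.

The main limitation of simulation tools, however, is that the user must tune many parameters (for example interference level, congestion, and loss rate), and these settings may not reflect real wireless conditions. Even with good parameter settings, a simulator cannot capture the complex inter-dependencies of real systems.

At the other end of the spectrum is testbed experimentation. Developers evaluate their applications over deployed wireless links, using testbeds or relying on volunteer testers. The results of such experiments capture the impact of real wireless network conditions.

The main drawbacks of experimentation are that it offers no repeatability and is hard to scale. The variability of wireless networks makes results hard to reproduce. The results are therefore hard to interpret, and one cannot tell whether a problem comes from the application itself or from the wireless conditions.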

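Finally, trace-driven emulation [6, 10] records traces in deployed wireless networks and later replays them to reproduce the network conditions present during the recording.

Its clear advantages are the ability to capture real network conditions and the repeatability of experiments. The same network conditions can be run many times, which simplifies application or system debugging and makes it possible to compare different applications or protocols under identical conditions.

While trace-driven emulators exist for cellular networks [10] and HTTP traffic [6], to the best of our knowledge there is no such solution for WiFi. Adapting cellular emulation to WiFi is not trivial, because the two differ in many fundamental ways, in particular in how access to the shared medium is managed, how they react to packet loss, and in other technology-specific protocols. These differences prevent the existing cellular emulator from emulating WiFi accurately.

Motivated by the advantages of trace-driven emulation and the lack of such a tool for WiFi, we design and implement NemFi, a trace-driven emulator for WiFi. NemFi extends the state-of-the-art cellular emulator [10] to support accurate WiFi emulation. We first identify the challenges of trace-driven emulation for WiFi. These challenges guide the key design decisions of NemFi's record-and-replay modules. Through extensive evaluation, we show that NemFi accurately emulates WiFi under various network conditions.

The main contributions of this paper are therefore:

• We identify the challenges of trace-driven emulation for WiFi.
• We introduce a novel trace-driven emulator for WiFi, which makes it possible to evaluate network applications and services over emulated WiFi conditions.

The rest of this paper is organized as follows. Section 2 gives a brief overview of the existing trace-driven cellular emulator, followed by the challenges of designing a trace-driven emulator for WiFi. Section 3 explains the design of NemFi. Section 4 validates the accuracy of NemFi's design through a series of experiments covering various applications and mobility scenarios. Section 5 concludes the paper and outlines future research directions.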

Background and Motivation

This section first introduces the record-and-replay emulator for cellular networks developed by Winstein et al. [10] and then discusses the challenges to adapt this emulator to WiFi.

2.1 Trace-Driven Emulation for Wireless Networks

Winstein et al. [10] introduced a cellular network emulator to evaluate their new transport protocol for low-latency, high-throughput transmission over wireless cellular networks. Their emulator consists of two main modules. The record module, known as the Saturator, records a trace of cellular network variability. This trace is then passed to the replay module, CellSim, which replays the trace to reproduce the captured network conditions. Concretely, the Saturator is a software module running on two end-hosts connected to each other via a cellular interface. During record, the Saturator aims to saturate the uplink and downlink channels by pushing MTU-sized UDP packets and recording the time each packet is received on the other end. These timestamps, also referred to as "delivery opportunities", represent the times at which MTU-sized packets were able to effectively cross the cellular link (in each direction). During replay, CellSim runs on a PC connected to two communicating end-hosts via Ethernet. CellSim listens for each incoming packet (in either direction), consults the trace of delivery opportunities, and delays packets accordingly to match the times at which packets were effectively delivered during the record.
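The replay step described above can be sketched as follows. This is a minimal illustration, not CellSim's actual code: `replay_departures` and its arguments are hypothetical names, and the sketch assumes that each delivery opportunity releases at most one MTU-sized packet and that an opportunity arriving while the queue is empty is simply wasted.

```python
from collections import deque


def replay_departures(arrivals_ms, opportunities_ms):
    """Map each packet to the delivery opportunity that releases it.

    arrivals_ms: sorted times at which packets enter the emulator queue.
    opportunities_ms: sorted delivery-opportunity timestamps from a trace.
    Returns one departure time per packet, or None if the trace ends
    before the packet can be released.
    """
    departures = [None] * len(arrivals_ms)
    queue = deque()          # indices of packets waiting for an opportunity
    next_arrival = 0
    for opp in opportunities_ms:
        # Enqueue every packet that has arrived by this opportunity.
        while next_arrival < len(arrivals_ms) and arrivals_ms[next_arrival] <= opp:
            queue.append(next_arrival)
            next_arrival += 1
        # Assumption: one packet per opportunity, unused opportunities are lost.
        if queue:
            departures[queue.popleft()] = opp
    return departures
```

For example, three packets arriving back to back are spread over the next three opportunities, which is how a slow recorded link shows up as queueing delay during replay.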

The emulator proposed by Winstein et al. has been specifically designed to emulate cellular networks, and hence cannot accurately record and replay WiFi variability. In particular, in cellular networks, there are per device queues. If the bottleneck is the cellular network, then the congestion at the base-station is mostly self-induced and the effect of cross-traffic is muted. Moreover, in cellular networks, the uplink and downlink communications take place on different time slices and do not interfere with each other. In WiFi, on the other hand, the medium is shared and hence delivery opportunities are shared between the upstream and the downstream flows as well as with competing traffic. Further, LTE base-stations hold much larger queues than WiFi and allow for more re-transmissions. In WiFi, the queues are smaller and packet losses more common.

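Winstein et al. introduced a cellular network emulator to evaluate their new transport protocol for low-latency, high-throughput transmission over wireless cellular networks.

Their emulator contains two main modules:

A record module called the Saturator, which records a trace of cellular network variability. The trace is then passed to the replay module, CellSim, which replays it to reproduce the captured network conditions.

Concretely, the Saturator is a software module running on two end-hosts connected through a cellular interface. During record, it saturates the uplink and downlink channels at the same time by pushing MTU-sized UDP packets and recording when each packet is received at the other end.

These timestamps, also called delivery opportunities, represent the points in time at which MTU-sized packets were able to cross the cellular link (in each direction).

During replay, CellSim runs on a PC connected via Ethernet to the two communicating end-hosts. CellSim listens for every incoming packet (in either direction), consults the trace of delivery opportunities, and delays packets accordingly to match the times at which packets were delivered during the record.

The emulator proposed by Winstein et al. was designed specifically for cellular networks and therefore cannot accurately record and replay WiFi variability. In particular, cellular networks keep per-device queues. If the bottleneck is the cellular network, congestion at the base station is mostly self-induced and the effect of cross-traffic is muted. Moreover, in cellular networks the uplink and downlink transmissions use different time slices and do not interfere with each other. In WiFi, by contrast, the medium is shared, so delivery opportunities are shared between the upstream and downstream flows as well as with competing traffic. Further, LTE base stations hold much larger queues than WiFi and allow more retransmissions. In WiFi, queues are smaller and packet loss is more common.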

2.2 Challenges of WiFi Network Emulation

The design of a trace-driven emulator for WiFi brings a number of challenges.

Capturing WiFi delivery opportunities: As mentioned in the previous section, the Saturator captures cellular variability by saturating the uplink and the downlink simultaneously. While this approach works for cellular, we cannot adopt it for WiFi because the two technologies manage access to the shared medium differently. In cellular, different carrier frequency bands are dedicated to uplink and downlink transmission [5]. In WiFi, however, the transmitting nodes contend for access to the shared medium. This means that if we saturate the uplink and downlink simultaneously, we capture the delivery opportunities on the upstream under contention from the downstream transmission, and vice versa. This is problematic when the captured trace is used to evaluate the performance of applications that mostly push traffic in a single direction.

Identifying the saturation point: To capture the delivery opportunities, it is important to saturate the available channel capacity. The cellular emulator achieves this by employing a large window of packets-in-flight. The Saturator adjusts the window size to keep the observed RTT between 750 milliseconds and 3 seconds. Using such a large window size ensures that the Saturator has a persistent queue of packets at the bottleneck link, which in turn means that it is effectively saturating the link. Further, by setting a threshold on the window size, the Saturator ensures that it will not overflow the network queue and thus induce packet loss. Adopting such a large threshold is possible in cellular because cellular employs large queues to deal with rapidly changing network variability and multi-second outages [10]. However, we cannot employ a similar approach in WiFi for two main reasons: (1) WiFi employs a much smaller queue than cellular, and (2) the maximum bitrate available to a WiFi client (i.e. the PHY rate) is not fixed and is affected by channel conditions and WiFi losses. The latter means that to effectively capture the available bandwidth in WiFi we need to dynamically adjust the window size to adapt to the changing PHY rates. Failure to do so may cause the record to overflow the network queue and thereby induce packet losses, which could potentially cause the WiFi network to further reduce the PHY rate. Figure 1a demonstrates the effect of using a large static threshold on the observed packet loss rate in WiFi. These results were obtained by running the Saturator on an end-device connected to another end-host via WiFi in a controlled setup with ideal WiFi conditions (no competing traffic, end-host in close proximity to the WiFi access point). In Figure 1a, we observe that while the Saturator has already reached the saturation point (steady state) in under one second, the packet losses keep increasing and reach 30% of all packets sent within 90 seconds. Note that due to the overhead of control packets, the Saturator's throughput is only 85% of the PHY rate.

Capturing WiFi losses: WiFi losses are more common in practice than cellular losses, because WiFi employs a smaller packet buffer and fewer L2 retransmissions. To accurately reproduce WiFi variability, it is important to capture and replay WiFi losses. Failure to do so will make all recorded WiFi traces appear lossless. This will impact the performance of applications replayed over these traces, as the effect of WiFi losses is absent. The importance of capturing WiFi losses further highlights the need to avoid inducing packet loss during the record, as it raises the challenge of isolating losses due to buffer overflow from WiFi losses.

Emulating WiFi-specific features: It is essential to account for other WiFi-specific features such as frame aggregation, as the client can achieve a much higher link utilization by sending frames in batches. This feature is at the heart of improvements in recent WiFi standards like IEEE 802.11n, 802.11ac, and 802.11ax. Frame aggregation can be A-MSDU or A-MPDU, with parameters that vary across devices. Hence, we require a solution that accounts for these factors.

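Designing a trace-driven emulator for WiFi raises a number of challenges.

(1) Capturing WiFi delivery opportunities:

As described in the previous section, the Saturator captures cellular variability by saturating the uplink and downlink simultaneously. This works for cellular, but the same approach cannot be used for WiFi, because cellular and WiFi manage access to the shared medium differently.

In cellular networks, different carrier frequency bands are dedicated to uplink and downlink transmission.

In WiFi, however, transmitting nodes contend for access to the shared medium.

This means that if we saturate uplink and downlink simultaneously, the upstream delivery opportunities we capture are those available under contention from the downstream transmission, and vice versa.

This becomes a problem when the captured trace is used to evaluate applications that mostly push traffic in a single direction.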

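(2) Identifying the saturation point:

To capture delivery opportunities, it is essential to saturate the available channel capacity.

The cellular emulator does this with a large window of packets-in-flight. The Saturator adjusts the window size to keep the observed round-trip time (RTT) between 750 milliseconds and 3 seconds. Such a large window guarantees a persistent queue of packets at the bottleneck link, which in turn means the link is effectively saturated.

In addition, by setting a threshold on the window size, the Saturator ensures that it does not overflow the network queue and cause packet loss.

Such a large threshold is possible in cellular networks because they use large queues to cope with rapidly changing network conditions and multi-second outages.

However, a similar approach cannot be used in WiFi, for two main reasons:

(1) WiFi uses a much smaller queue than cellular
(2) The maximum bitrate available to a WiFi client (the PHY rate) is not fixed and is affected by channel conditions and WiFi losses

The latter means that, to capture the available bandwidth in WiFi, the window size must be adjusted dynamically to follow the changing PHY rate.

Failing to do so may cause the record to overflow the network queue and induce packet loss, which could in turn cause the WiFi network to lower the PHY rate further.

Figure 1a shows the effect of a large static threshold on the observed packet loss rate in WiFi:

[Figure 1a: packet loss observed when running the Saturator with a large static window over WiFi]

These results were obtained in a controlled setup by running the Saturator on an end-device connected to another end-host via WiFi, under ideal WiFi conditions (no competing traffic, end-host close to the WiFi access point).

In Figure 1a, although the Saturator reaches the saturation point (steady state) within one second, packet loss keeps increasing and reaches 30% of all packets sent within 90 seconds. Note that, because of control-packet overhead, the Saturator's throughput is only 85% of the PHY rate.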

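(3) Capturing WiFi losses:

In practice, WiFi losses are more common than cellular losses, because WiFi uses a smaller packet buffer and fewer L2 retransmissions.

To reproduce WiFi variability accurately, WiFi losses must be captured and replayed. Otherwise, every recorded WiFi trace would appear lossless.

This would distort the performance of applications replayed over these traces, because the effect of WiFi losses would be missing.

The importance of capturing WiFi losses further underlines the need to avoid inducing loss during the record, since induced loss makes it harder to separate buffer-overflow losses from genuine WiFi losses.

(4) Emulating WiFi-specific features:

Other WiFi-specific features such as frame aggregation must be taken into account, because a client can achieve much higher link utilization by sending frames in batches.

This feature is at the heart of the improvements in recent WiFi standards (such as IEEE 802.11n, 802.11ac, and 802.11ax). Frame aggregation can be A-MSDU or A-MPDU, and the parameters used vary across devices.

We therefore need a solution that accounts for these factors.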

NemFi: Design and Implementation

We begin by providing a brief overview of NemFi's system design. NemFi extends the state-of-the-art trace-driven emulator for cellular to support WiFi emulation. NemFi records WiFi packet delivery opportunities in a trace that can later be replayed to emulate the recorded WiFi conditions. To achieve this, NemFi is designed with two main components: a record module and a replay module. NemFi's record module records WiFi network variability by capturing the times at which MTU-sized packets were able to effectively cross the WiFi link, as well as the times at which packets were dropped due to WiFi losses. These traces of packet deliveries and packet losses are then passed to the replay module. NemFi's replay module is built on top of Mahimahi [6], a framework designed to record and replay HTTP traffic. Similar to Mahimahi's replay module, NemFi is built as a Unix shell. When an application is running inside the shell, all of its incoming and outgoing packets are intercepted and placed inside a queue. These packets are first delayed for a fixed amount of time to emulate one-way propagation delays, and then the packet-delivery and packet-loss traces are inspected to determine the fate of each packet. Either the packet is dropped because it coincides with a packet loss event, or it is released. In the latter case, the packet is released from the queue to match the recorded packet-delivery trace.

CellSim and Mahimahi's ReplayShell are identical in how they replay the packet-delivery trace. The key difference is that Mahimahi is built using Unix shells, whereas CellSim must run on a dedicated machine connected to both the client and the server during replay. For this reason we opted to use Mahimahi's ReplayShell, since it is easier to use in practice and requires less hardware for the replay.

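We first give a brief overview of NemFi's system design. NemFi extends the state-of-the-art trace-driven cellular emulator to support WiFi emulation. NemFi records WiFi packet delivery opportunities in a trace, which can later be replayed to emulate the recorded WiFi conditions. To this end, NemFi has two main components: a record module and a replay module.

The record module captures WiFi variability by recording the times at which MTU-sized packets crossed the WiFi link, as well as the times at which packets were dropped because of WiFi losses. These packet-delivery and packet-loss traces are then passed to the replay module.

The replay module is built on top of Mahimahi [6], a framework designed to record and replay HTTP traffic.

Like Mahimahi's replay module, NemFi is built as a Unix shell:

When an application runs inside the shell, all of its incoming and outgoing packets are intercepted and placed in a queue.

These packets are first delayed by a fixed amount of time to emulate one-way propagation delay, and then the packet-delivery and packet-loss traces are inspected to decide the fate of each packet.

A packet is either dropped because it coincides with a loss event, or released. In the latter case, it is released from the queue according to the recorded packet-delivery trace.

CellSim and Mahimahi's ReplayShell replay the packet-delivery trace in the same way. The key difference is that Mahimahi is built using Unix shells, whereas CellSim must run on a dedicated machine connected to both the client and the server during replay. For this reason we chose Mahimahi's ReplayShell, which is easier to use in practice and needs less hardware for replay.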

3.1 NemFi's record

The goal of the record is to capture the variability of a WiFi link over time. To achieve this, NemFi's record module runs on two machines: a sender and a receiver (as illustrated in Figure 2). The sender is connected to the receiver via two links: the WiFi link being measured and a reliable link for feedback. A separate reliable link provides a quick feedback loop that allows our record tool to quickly adapt its sending rate and avoid over-filling the network queue. As mentioned in Section 2.2, the key challenges of designing a record mechanism for WiFi are: (1) How do we accurately capture the delivery opportunities on the uplink and downlink simultaneously given the effect of contention? (2) How do we identify when we have saturated the channel? Recall that identifying the saturation point is key to avoiding induced packet losses, which may cause WiFi to reduce the sending rate. In the remainder of this section, we explain how we modify the Saturator to address these two concerns.

3.1.1 Capturing WiFi delivery opportunities: Given the limitations of saturating the WiFi uplink and downlink simultaneously, we modified the Saturator to run the saturation in a single direction only. The intuition behind this decision is that by saturating in a single direction, we remove the effect of contention and are able to capture all the delivery opportunities available on the WiFi link over time. This trace provides us with all the information we need to then emulate the WiFi link accurately for any type of application or service (regardless of their transmission mode). The next question is how to allocate the delivery opportunities between the uplink and downlink during replay. To answer this question, we conducted a set of experiments to understand how bandwidth is distributed between uplink and downlink in WiFi. Figure 1b illustrates our results. We ran the Saturator while varying the window size between 4000 and 8000 MTUs, and for each window size value, we ran the Saturator to saturate the channel in both directions (denoted as Bi-di in Figure 1b) or in a single direction (uplink only). Our results show that in all cases there is a fair share of delivery opportunities between the uplink and the downlink streams. Further, we observe that the throughput we achieve while saturating in a single direction matches the sum of the throughputs in each direction. This result indicates that by recording the delivery opportunities in a single direction, we are then able to emulate any combination of uplink-downlink transmissions by employing fair resource sharing during replay.

3.1.2 Identifying the saturation point: The next challenge that we need to address is that of identifying the saturation point, which is represented in WiFi by the PHY rate. The PHY rate indicates the maximal bitrate available between the wireless client and the access point. The issue is that this value is dynamic, since WiFi adapts the PHY rate based on the wireless network conditions. Hence, to address the problem of a dynamic saturation point, we need to measure the PHY rate over time and quickly adapt the Saturator's sending rate to match the PHY rate as it changes. We measure the PHY rate by extracting station dump information from the client WiFi driver using the Linux utility iw every 25 ms. With repeated experimentation, we found that this frequency strikes a good balance between getting a good estimate of the maximum bitrate over time and keeping the run-time overhead low, since continuous routine calls to the driver could interfere with the saturation.
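As a rough illustration of the measurement step, the sketch below polls `iw dev <interface> station dump` and extracts a bitrate. Several things here are assumptions rather than details from the paper: the helper names, the interface name `wlan0`, the choice of the `tx bitrate` field as the PHY rate, and the exact output layout, which can differ between drivers and iw versions.

```python
import re
import subprocess
import time

# Assumed layout of the relevant line, e.g. "tx bitrate:  866.7 MBit/s ...".
_TX_RATE = re.compile(r"tx bitrate:\s*([0-9.]+)\s*MBit/s")


def parse_phy_rate_mbps(station_dump: str):
    """Return the first tx bitrate found in the dump (Mbit/s), or None."""
    match = _TX_RATE.search(station_dump)
    return float(match.group(1)) if match else None


def read_phy_rate_mbps(interface: str = "wlan0"):
    """Run `iw dev <interface> station dump` once and parse the bitrate."""
    result = subprocess.run(
        ["iw", "dev", interface, "station", "dump"],
        capture_output=True, text=True, check=True,
    )
    return parse_phy_rate_mbps(result.stdout)


def poll_phy_rate(interface: str = "wlan0", period_s: float = 0.025):
    """Yield one reading per period (25 ms by default, as in the text)."""
    while True:
        yield read_phy_rate_mbps(interface)
        time.sleep(period_s)
```

Spawning a process every 25 ms has its own cost, which is the overhead trade-off the paragraph above refers to; a production tool would more likely query the driver through netlink directly.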

To adapt the window size to match the PHY rate over time, we equip NemFi with a rate control algorithm, presented in Algorithm 1. We start by setting the window size to 5. NemFi's rate control algorithm is then periodically invoked with the last PHY rate reading and the Saturator's current sending rate as input. The goal of the rate-control algorithm is to decide whether to increase or decrease the window size, and by how much, to match the latest PHY rate value. To achieve this, we follow an approach similar to the proportional–integral–derivative (PID) controller that is widely used in industrial control systems [4]. We set the last PHY rate value as the "theoretical bound". Next, we compute the difference between the Saturator's current throughput and the theoretical bound, and denote this value as the "error", i.e. how far we are from the theoretical bound. The error is then used to determine by how much we need to increase the window size. The window-size increment is \(\mathit{error} \times \alpha\), where \(\alpha\) is a constant that controls how aggressively we approach the PHY rate. We set \(\alpha = \mathit{theoreticalBound}/1500\). Our intuition behind setting \(\alpha\) proportional to the current PHY rate is that the higher the PHY rate, the larger the window size increments should be to quickly converge to the saturation point. Further, our empirical results have shown that by setting the denominator to 1500, we strike a good balance between quickly converging to the PHY rate and minimizing the risk of overshooting.

Next, we need to figure out if we reached the saturation point. To achieve this, we define the differential throughput gain as the difference between the Saturator's current and last throughput value. While the differential gain is positive, we keep increasing the window. Once the differential gain becomes zero, it means we reached the saturation point, and we set the saturation flag to true to avoid increasing the window size any further.

The final task of the rate control algorithm is to adapt the window size in the event of PHY rate change. When we observe a drop in PHY rate due to deteriorating channel conditions, we decrease the window size to 80% and turn the saturation flag to false. It causes the algorithm to get to a new saturation state by re-adapting the window size. An aggressive drop in the window size would result in more transient time in reaching the next saturation state. We observe by repeated experiments that with 80% drop, we are reasonably quick in getting to the next saturation state even in extremely mobile scenarios. Similarly, we assign the saturation flag as false on noticing an improvement in PHY rate to increase the window size further.
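Algorithm 1 itself is not reproduced in these notes, so the following is only a sketch reconstructed from the prose above. The class and field names are hypothetical, rates are assumed to be in Mbit/s, a non-positive throughput gain is treated as "gain became zero", and the window floor of 1 and the order of the checks are choices made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateController:
    window: float = 5.0                 # initial window size from the text
    saturated: bool = False
    last_phy: Optional[float] = None    # previous PHY rate reading
    last_throughput: float = 0.0

    def update(self, phy_rate: float, throughput: float) -> float:
        """Process one periodic reading and return the new window size."""
        if self.last_phy is not None and phy_rate != self.last_phy:
            if phy_rate < self.last_phy:
                # PHY rate dropped: shrink the window to 80% of its value.
                self.window = max(1.0, self.window * 0.8)
            # Any PHY change reopens the search for the saturation point.
            self.saturated = False
        elif not self.saturated:
            gain = throughput - self.last_throughput
            if gain > 0:
                error = phy_rate - throughput      # distance to the bound
                alpha = phy_rate / 1500.0          # theoreticalBound / 1500
                self.window += max(0.0, error) * alpha
            else:
                # No further throughput gain: treat the link as saturated.
                self.saturated = True
        self.last_phy = phy_rate
        self.last_throughput = throughput
        return self.window
```

With a PHY rate of 100 and a throughput of 10, the error is 90 and \(\alpha\) is 100/1500, so the first call grows the window from 5 to about 11; a later drop of the PHY rate to 80 shrinks it to 80% of its current value and clears the saturation flag.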

At the end of the record phase, we obtain a trace where each entry comprises the timestamp and the sequence number of successfully sent packets, the throughput and the percent loss rate at that instant, along with the PHY rate and the window size. For the replay, we process the recorded trace to produce a new trace as input. The replay-trace entries comprise the delivery opportunity, the throughput, the sequence number, and the percent loss rate. We convert the timestamps in the recorded trace to a time series in milliseconds from 0 to the duration of the record, which serves as the packet delivery opportunities during the replay.
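The conversion from recorded trace to replay trace can be sketched as below. The dictionary keys and the function name are hypothetical, and the sketch assumes that recorded timestamps are wall-clock seconds; the source only states which fields each trace contains and that delivery opportunities are expressed in milliseconds starting at 0.

```python
def to_replay_trace(record):
    """Convert recorded entries into replay entries.

    record: list of dicts with keys 'timestamp' (seconds, assumed), 'seq',
    'throughput', 'loss_pct', 'phy_rate', and 'window', ordered by time.
    Returns dicts holding only the fields the replay needs, with the
    timestamp rebased to milliseconds since the first entry.
    """
    if not record:
        return []
    start = record[0]["timestamp"]
    return [
        {
            "delivery_ms": round((entry["timestamp"] - start) * 1000),
            "throughput": entry["throughput"],
            "seq": entry["seq"],
            "loss_pct": entry["loss_pct"],
        }
        for entry in record
    ]
```

The PHY rate and window size are dropped here because the text lists them only among the recorded fields, not among the replay-trace fields.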

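The goal of the record is to capture how a WiFi link varies over time. To this end, NemFi's record module runs on two machines: a sender and a receiver (see Figure 2). The sender is connected to the receiver through two links: the WiFi link being measured and a reliable link used for feedback.

[Figure 2: NemFi record setup, with a sender and a receiver connected by the measured WiFi link and a reliable feedback link]

The separate reliable link provides a fast feedback loop that lets the record tool adapt its sending rate quickly and avoid over-filling the network queue. As noted in Section 2.2, the key challenges in designing a record mechanism for WiFi are: (1) Given the effect of contention, how do we accurately capture delivery opportunities on the uplink and downlink simultaneously? (2) How do we identify when the channel is saturated? Recall that identifying the saturation point is essential to avoid inducing packet loss, which may cause WiFi to reduce its sending rate. The rest of this section explains how the Saturator is modified to address these two concerns.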

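3.1.1 Capturing WiFi delivery opportunities:

Given the limitations of saturating the WiFi uplink and downlink at the same time, we modified the Saturator to saturate in a single direction only.

The intuition is that single-direction saturation removes the effect of contention, so we can capture all delivery opportunities available on the WiFi link over time. This trace gives us all the information needed to emulate the WiFi link accurately for any type of application or service, regardless of its transmission mode.

The next question is how to allocate delivery opportunities between the uplink and downlink during replay.

To answer it, we ran a set of experiments to understand how bandwidth is distributed between uplink and downlink in WiFi.

Figure 1b shows the results. We ran the Saturator with window sizes between 4000 and 8000 MTUs, and for each window size we saturated the channel either in both directions (denoted Bi-di in Figure 1b) or in a single direction (uplink only).

The results show that in all cases delivery opportunities are shared fairly between the uplink and downlink streams. Moreover, the throughput obtained when saturating a single direction matches the sum of the throughputs in both directions. This indicates that, by recording delivery opportunities in a single direction, any combination of uplink and downlink transmissions can be emulated by applying fair resource sharing during replay.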

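3.1.2 Identifying the saturation point:

The next challenge is identifying the saturation point, which in WiFi is represented by the PHY rate. The PHY rate indicates the maximum bitrate available between the wireless client and the access point.

The problem is that this value is dynamic, because WiFi adapts the PHY rate to wireless conditions. To handle a dynamic saturation point, we therefore need to measure the PHY rate over time and quickly adapt the Saturator's sending rate as the PHY rate changes.

We measure the PHY rate by extracting station dump information from the client WiFi driver with the Linux utility iw every 25 ms. Through repeated experiments, we found that this frequency gives a good estimate of the maximum bitrate over time while keeping run-time overhead low, since continuous calls to the driver could interfere with the saturation.

To adapt the window size to the PHY rate over time, NemFi is equipped with a rate control algorithm, shown in Algorithm 1. The window size starts at 5. The algorithm is then invoked periodically with the latest PHY rate reading and the Saturator's current sending rate as input. Its goal is to decide whether to increase or decrease the window size, and by how much, to match the latest PHY rate. To do so, we follow an approach similar to the proportional–integral–derivative (PID) controller widely used in industrial control systems. We set the last PHY rate value as the "theoretical bound". We then compute the difference between the Saturator's current throughput and the theoretical bound, and call this value the "error", i.e. how far we are from the bound. The error determines how much to increase the window size. The window-size increment is \(\mathit{error} \times \alpha\), where \(\alpha\) is a constant that controls how aggressively we approach the PHY rate. We set \(\alpha = \mathit{theoreticalBound}/1500\). The intuition for making \(\alpha\) proportional to the current PHY rate is that the higher the PHY rate, the larger the window increments should be to converge quickly to the saturation point. In addition, our empirical results show that a denominator of 1500 strikes a good balance between converging quickly to the PHY rate and minimizing the risk of overshooting.

Next, we need to determine whether the saturation point has been reached. To this end, we define the differential throughput gain as the difference between the Saturator's current and previous throughput values:

• While the differential gain is positive, we keep increasing the window
• Once the differential gain becomes zero, the saturation point has been reached, and we set the saturation flag to true to stop increasing the window size

The final task of the rate control algorithm is to adapt the window size when the PHY rate changes. When the PHY rate drops because channel conditions deteriorate, we decrease the window size to 80% and set the saturation flag to false. This makes the algorithm reach a new saturation state by re-adapting the window size. A more aggressive drop in window size would lengthen the transient before the next saturation state is reached. Repeated experiments show that with the 80% drop we reach the next saturation state reasonably quickly, even in highly mobile scenarios. Similarly, when we notice an improvement in the PHY rate, we set the saturation flag to false so that the window size can increase further.

At the end of the record phase, we obtain a trace in which each entry contains the timestamp and sequence number of a successfully sent packet, the throughput and percent loss rate at that instant, and the PHY rate and window size.

For replay, we process the recorded trace to produce a new input trace. Replay-trace entries contain the delivery opportunity, the throughput, the sequence number, and the percent loss rate. We convert the timestamps in the recorded trace into a millisecond time series running from 0 to the duration of the record, which serves as the packet delivery opportunities during replay.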

3.2 NemFi's replay

The goal of NemFi's replay tool is to retrace the same time-based conditions recorded in the trace with packet deliveries and packet losses. There are three main challenges that we need to address for replaying WiFi: 1) Sharing opportunities between the uplink and the downlink, 2) Replaying WiFi losses, and 3) Emulating frame aggregation.

Firstly, we need to share delivery opportunities between the uplink and the downlink flows as they communicate on a shared spectrum in WiFi. This is not present in Mahimahi's linkshell as it is designed for cellular, where uplink and downlink traffic are allocated to different time slices. Hence, we introduce a weighted round-robin-like approach to share the delivery opportunities between the uplink and the downlink. Concretely, we define percentShare to represent the uplink share of the medium, so that 1 - percentShare represents the downlink share. At the time of the next delivery opportunity, NemFi generates a random number between 0 and 1. If the number is smaller than 1 - percentShare and the downlink queue is not empty, the downlink queue seizes the delivery opportunity. If the downlink queue is empty and the uplink queue is not, the uplink queue seizes the delivery opportunity. The process repeats similarly for all subsequent delivery opportunities. To emulate propagation delays, we introduce a constant delay of half the round-trip time on both the uplink and downlink packet queues.
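The sharing rule can be sketched as a small decision function. The name `pick_direction` is hypothetical, and one case is filled in by assumption: the text does not say what happens when the draw favors the uplink but only the downlink has packets, so this sketch is work-conserving and hands the opportunity to whichever queue is non-empty.

```python
import random


def pick_direction(uplink_waiting, downlink_waiting, percent_share, rng=random):
    """Decide which queue uses the next delivery opportunity.

    percent_share is the uplink share of the medium in [0, 1];
    the downlink share is 1 - percent_share.
    Returns "uplink", "downlink", or None if both queues are empty.
    """
    if not uplink_waiting and not downlink_waiting:
        return None                      # opportunity goes unused
    if not downlink_waiting:
        return "uplink"
    if not uplink_waiting:
        return "downlink"                # assumption: work-conserving
    # Both queues have packets: split opportunities by the configured share.
    return "downlink" if rng.random() < 1.0 - percent_share else "uplink"
```

Passing a seeded `random.Random` instance as `rng` makes a replay run reproducible, which matters when the same trace is used to compare two applications.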

The next challenge is to emulate WiFi losses. We use the instantaneous loss rate at the current index of the packet delivery trace when a packet is being read from the socket into the corresponding packet queue. If the loss rate turns out to be positive, we generate a random number between 0 and 1. If this number is less than the current loss rate, then we drop the packet read from the socket and that packet is not appended to either of the packet queues.
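A minimal sketch of this drop decision follows. The function name is hypothetical, and it assumes the trace stores the loss rate as a percentage (as the trace description in Section 3.1 suggests), which is converted to a probability before the comparison.

```python
import random


def should_drop(loss_pct, rng=random):
    """Return True if a packet read from the socket should be dropped.

    loss_pct: instantaneous loss rate (percent) at the current trace index.
    """
    if loss_pct <= 0:
        return False                     # no draw needed on lossless entries
    return rng.random() < loss_pct / 100.0
```

A dropped packet is never appended to the uplink or downlink queue, so it does not consume a delivery opportunity.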

Finally, to emulate frame aggregation, we adopt the model introduced by da Hora et al. [2]. Concretely, we estimate the total number of aggregated frames at the different PHY rates that we observe in each entry of the packet-delivery trace. Using the model [2], we estimate the frame aggregation per PHY rate for the specific AP and stations used in the experimental setup (Section 4.1). This gives us an estimate of the number of packets to group together when sending them to the output queue. We additionally observe that all packets that are part of the same aggregate have practically the same delivery timestamps. We do not group the whole set of delivery opportunities when sequence numbers are missing due to losses in the aggregated frame during the record phase. On detecting a missing sequence number in the input trace, we do not group more packets in the same emulation opportunity.

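The goal of NemFi's replay tool is to retrace the time-based conditions recorded in the trace, in terms of packet deliveries and packet losses. Replaying WiFi raises three main challenges: 1) sharing delivery opportunities between the uplink and downlink, 2) replaying WiFi losses, and 3) emulating frame aggregation.

First, delivery opportunities must be shared between the uplink and downlink flows, because in WiFi they communicate over a shared spectrum:

This is not present in Mahimahi's linkshell, which is designed for cellular, where uplink and downlink traffic are allocated to different time slices. We therefore introduce a weighted round-robin-like approach to share delivery opportunities between the uplink and downlink.

Concretely, we define percentShare as the uplink share of the medium, so that 1 - percentShare is the downlink share. At the next delivery opportunity, NemFi generates a random number between 0 and 1. If the number is smaller than 1 - percentShare and the downlink queue is not empty, the downlink queue takes the delivery opportunity. If the downlink queue is empty and the uplink queue is not, the uplink queue takes it. The process repeats in the same way for all subsequent delivery opportunities. To emulate propagation delay, we add a constant delay of half the round-trip time to both the uplink and downlink packet queues.

The next challenge is emulating WiFi losses.

When a packet is read from the socket into the corresponding packet queue, we use the instantaneous loss rate at the current index of the packet-delivery trace. If the loss rate is positive, we generate a random number between 0 and 1. If this number is smaller than the current loss rate, we drop the packet read from the socket, and it is not appended to either packet queue.

Finally, to emulate frame aggregation, we adopt the model introduced by da Hora et al. [2].

Concretely, we estimate the total number of aggregated frames at the different PHY rates observed in each entry of the packet-delivery trace. Using the model [2], we estimate the frame aggregation per PHY rate for the specific AP and stations used in the experimental setup (Section 4.1). This gives an estimate of the number of packets to group together when sending them to the output queue. We also observe that all packets belonging to the same aggregate have practically the same delivery timestamp. When sequence numbers are missing because of losses in the aggregated frame during the record phase, we do not group the whole set of delivery opportunities. On detecting a missing sequence number in the input trace, we do not group any more packets into the same emulation opportunity.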

Conclusion

In this paper, we introduced NemFi, a novel trace-driven emulator for WiFi. We identified a number of challenges that need to be addressed to develop an accurate record-and-replay tool for WiFi, and we demonstrated how NemFi addresses these challenges. Our evaluations show that NemFi accurately captures WiFi conditions by capturing the variability in delivery opportunities and WiFi losses. We hope that by releasing NemFi, others will be able to build on our work both by improving and further evaluating NemFi and by using it to evaluate networked systems over emulated WiFi. In particular, our evaluation focused on indoor scenarios with limited mobility, hence further evaluation is required before using NemFi in other scenarios.

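In this paper, we introduced NemFi, a novel trace-driven emulator for WiFi networks.

We identified a series of challenges that must be solved to build an accurate record-and-replay tool for WiFi, and showed how NemFi addresses them.

The evaluation shows that NemFi accurately reproduces WiFi conditions by capturing the variability of delivery opportunities and WiFi losses. We hope that releasing NemFi lets other researchers build on this work, both by improving and further evaluating NemFi and by using it to evaluate networked systems over emulated WiFi. Note in particular that the evaluation focused on indoor scenarios with limited mobility, so further evaluation is needed before applying NemFi in other scenarios.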