Background
In this section, we first provide background on discrete event network simulation (§2.1). We then discuss two acceleration methods: recently emerged data-driven approaches (§2.2) and existing PDES algorithms (§2.3).
DES for Networks
In network simulation, a network topology is modeled as a graph with nodes and links. Each node represents a host or a switch. The sending, forwarding, and receiving of a packet between nodes are modeled as a series of discrete events stored in the future event list [34], as illustrated in Figure 2.
Discrete event. In network simulation, a discrete event consists of a timestamp, a node ID, and a callback function, indicating when, where, and what happens, respectively. In the callback function, state variables of the corresponding node (e.g., queue length) may be modified, and new events may be scheduled into the future event list [10]. The processing time of a discrete event is determined by the complexity of its callback function.
Future event list. To maintain a correct order of events, the simulator uses a future event list (FEL) to store discrete events. The FEL is a priority queue, with the event having the smallest timestamp at the head of the queue [10].
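To make these two definitions concrete, a minimal C++ sketch is shown below; the type and field names are our own illustration, not the API of ns-3 or OMNeT++:

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// A discrete event: when (timestamp), where (node ID), and what
// happens (callback). Illustrative names, not a real simulator API.
struct Event {
    uint64_t timestamp;             // simulated time, e.g., in nanoseconds
    uint32_t nodeId;                // the node whose state this event touches
    std::function<void()> callback; // may modify node state and schedule
                                    // new events into the FEL
};

// The FEL is a priority queue with the smallest timestamp at the head,
// i.e., a min-heap keyed on Event::timestamp.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.timestamp > b.timestamp; // ">" yields a min-heap
    }
};
using FutureEventList =
    std::priority_queue<Event, std::vector<Event>, LaterFirst>;
```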
Sequential simulation. As the simulator runs, it pops the event at the head of the FEL and processes it. When an event is processed, the simulator advances its clock to the event's timestamp [10]. The simulator terminates when it encounters a stop event or when the FEL is empty (i.e., all events have been processed). Since events in the FEL are ordered by timestamp, causality and correctness are guaranteed.
However, this sequential approach is depressingly slow when simulating large-scale networks. For example, a \(k = 8\) fat-tree with 100 Gbps links generates over 550 million events per simulated second; in our observation, simulating just 1 second of such a network takes nearly 2 days.
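The main loop can then be sketched as follows, reusing the Event and FutureEventList types from the sketch above (again illustrative rather than an actual simulator implementation):

```cpp
// Sequential DES main loop: repeatedly pop the earliest event,
// advance the clock to its timestamp, and run its callback.
void RunSequential(FutureEventList& fel) {
    uint64_t clock = 0;
    while (!fel.empty()) {      // terminate when all events are processed
        Event ev = fel.top();   // event with the smallest timestamp
        fel.pop();
        clock = ev.timestamp;   // advance the simulation clock
        ev.callback();          // may push new events into the FEL
        // A full simulator would also check for a stop event here.
    }
}
```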
Data-Driven Approaches
Different from full-fidelity DES, data-driven simulators such as MimicNet [42] and DeepQueueNet [40] use ML to approximate the behavior of large networks. They replace a set of nodes in the simulated network with a simplified black box that must be trained. Inside the black box, deep neural networks (DNNs) approximate the protocol stack. The training data is generated via small-scale simulation or by collecting traces from physical devices. However, the applicability and performance of these approaches are still limited.
Limited usability. Due to the simplification of the black box, these data-driven approaches support fewer use cases than DES. MimicNet is only applicable to fat-tree topologies [42]. DeepQueueNet ignores the state variables of the protocol stack, so it cannot simulate stateful behaviors such as TCP congestion control [40]. Moreover, both are only applicable to networks at the stable point: MimicNet cannot model skewed traffic between fat-tree clusters [42], while DeepQueueNet cannot reflect the temporal dynamics of networks [40].
Long training time. In addition to their limited usability, training the DNN in the black box also takes a long time. For MimicNet, training the model for a single fat-tree cluster takes about 7 hours, and the model must be retrained whenever the workload or the structure of the cluster changes [42]. For DeepQueueNet, training a single type of switch can take 12 hours, yet every type of switch in the model must be trained before simulation [40].
Still relying on DES. For machine learning, sufficient training data is required to achieve satisfactory accuracy, but obtaining such data from real physical devices is costly. Most new scenarios, such as implementing a new switch or designing a new protocol, have no pre-existing data. As a result, these data-driven methods still rely on DES to obtain labeled training and testing data: MimicNet uses DES to simulate one cluster with full fidelity in order to train the other mimics [42], while DeepQueueNet uses ns.py to obtain training data for its switches [40]. DES therefore remains an irreplaceable tool in network research, and improving its performance can further facilitate these data-driven approaches.
PDES Algorithms
The sequential DES can be parallelized by dividing the network topology spatially into logical processes (LPs). Figure 3 shows an example of dividing a \(k = 4\) fat-tree into 4 LPs. Each LP runs the sequential DES with its own FEL. A synchronization mechanism is required to ensure that the causality (i.e., the order of timestamps) of events is not violated [10, 23].
In network simulation, due to the inevitable propagation delay of links, there is always a time window during which packets are on the wire. For the receiving LP, processing events in its FEL within this time window (i.e., before those packets arrive) does not violate causality, as illustrated in Figure 4. Based on this observation, a lookahead value can be calculated for each LP: the shortest delay among all of its links connected to other LPs [10, 23].
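For concreteness, the lookahead computation can be sketched as follows; the CutLink representation is our own assumption rather than a structure from any particular simulator:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// A link crossing the partition boundary to another LP (assumed shape).
struct CutLink {
    uint64_t propagationDelayNs; // propagation delay in nanoseconds
};

// Lookahead of an LP: the shortest propagation delay among all links
// connecting it to other LPs. Nothing this LP sends can arrive at a
// neighbor earlier than (current time + lookahead).
uint64_t ComputeLookahead(const std::vector<CutLink>& cutLinks) {
    uint64_t lookahead = std::numeric_limits<uint64_t>::max();
    for (const CutLink& link : cutLinks) {
        lookahead = std::min(lookahead, link.propagationDelayNs);
    }
    return lookahead;
}
```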
The default PDES algorithm implemented in ns-3 is the barrier synchronization algorithm [10, 34]. In this algorithm, all LPs are executed in rounds separated by barriers. At the start of each round, each LP calculates its time window, called the lower bound on the timestamp (LBTS), as
\(LBTS = \min_i \{N_i\} + \mathit{lookahead}\)
where \(N_i\) is the timestamp of the next event of the \(i\)-th LP. LPs can safely execute events whose timestamps do not exceed the LBTS. All LPs then perform a barrier synchronization, waiting for the others to complete before entering the next round [10], as shown in Figure 4.
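One round of this algorithm can be sketched as follows, assuming an MPI backend (ns-3's distributed simulator is MPI-based); NextEventTimestamp and ProcessEventsUpTo are hypothetical helpers of the local LP, not actual ns-3 APIs:

```cpp
#include <mpi.h>
#include <cstdint>

uint64_t NextEventTimestamp();      // hypothetical: N_i, the local FEL's next timestamp
void ProcessEventsUpTo(uint64_t t); // hypothetical: run local events with timestamp <= t

// One round of barrier synchronization (sketch).
void RunOneRound(uint64_t lookahead) {
    // Compute min over all LPs of N_i with a global reduction.
    uint64_t localNext = NextEventTimestamp();
    uint64_t globalMin = 0;
    MPI_Allreduce(&localNext, &globalMin, 1, MPI_UINT64_T,
                  MPI_MIN, MPI_COMM_WORLD);
    uint64_t lbts = globalMin + lookahead; // LBTS = min{N_i} + lookahead

    // Safely execute all events whose timestamps do not exceed the LBTS.
    ProcessEventsUpTo(lbts);

    // Wait for every LP to finish before the next round begins.
    MPI_Barrier(MPI_COMM_WORLD);
}
```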
In addition to this algorithm, OMNeT++ and ns-3 also implement PDES with the null message algorithm [7]. In this algorithm, LPs synchronize locally via null messages instead of global barriers. Like barrier synchronization, the null message algorithm also relies on the lookahead value to ensure causality.
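The sender side of the algorithm can be sketched as follows; Neighbor and SendNullMessage are hypothetical placeholders:

```cpp
#include <cstdint>
#include <vector>

struct Neighbor { int lpId; };                      // an adjacent LP (assumed shape)
void SendNullMessage(int lpId, uint64_t timestamp); // hypothetical transport call

// After processing its safe events, an LP promises each neighbor that it
// will send no event earlier than (now + lookahead). A neighbor receiving
// this promise can advance its own safe window without a global barrier.
void BroadcastNullMessages(const std::vector<Neighbor>& neighbors,
                           uint64_t now, uint64_t lookahead) {
    for (const Neighbor& n : neighbors) {
        SendNullMessage(n.lpId, now + lookahead);
    }
}
```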
Another approach to PDES is to allow violations of causality and recover from them with a rollback mechanism [10, 20]. However, this requires a significant re-architecture of existing simulators, either to implement state saving for rollbacks or to support user-provided rollback methods [20]; the latter is also not transparent to users. Most network simulators, including ns-3 and OMNeT++, do not implement this approach. Therefore, we focus on the previous two algorithms for optimization and comparison.