Discussion and Conclusion

Discussion

Memory Overhead. Beyond parallel efficiency and user transparency, Unison also reduces the memory overhead of PDES, because the network topology and flow information are shared among LPs via multithreading. As a result, the memory usage of Unison is comparable to that of the default sequential DES.
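As a minimal illustration of this shared-memory design (hypothetical LP workers and topology, not Unison's actual code), each LP thread below reads the same topology object instead of holding its own replica, which is why memory stays close to the sequential case:

```python
import threading

# Hypothetical sketch: LP worker threads share one read-only topology,
# so memory usage stays close to that of a sequential simulation.
topology = {"links": [(0, 1), (1, 2), (2, 3)]}  # shared among all LPs

results = {}
lock = threading.Lock()

def lp_worker(lp_id: int) -> None:
    # Every LP reads the same shared object; no per-LP copy is made.
    degree = sum(lp_id in link for link in topology["links"])
    with lock:
        results[lp_id] = degree

threads = [threading.Thread(target=lp_worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # every LP computed its degree from the single shared topology
```

In a process-based (message-passing) PDES, each of the four workers would instead hold its own copy of the topology, multiplying the memory footprint by the number of LPs.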

Applicability and Generality. One limitation on the applicability of Unison is that it cannot handle models containing only stateful links, such as wireless channels, since such links cannot be cut for fine-grained partitioning. In addition, for a large model with a low traffic load, the speedup of Unison is less significant and is about the same as that of other PDES approaches. However, such a model is fast to simulate even with sequential DES, due to its small number of events and low degree of parallelism.

Heterogeneous parallel simulation. The scheduling algorithm of Unison assumes that every processor core runs at the same clock frequency. Parallel simulation on cores with different clock frequencies requires a more general scheduling strategy. Furthermore, Unison only utilizes CPU cores; GPUs and FPGA-based devices require other approaches to exploit their potential computation power and parallelism.
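One direction such a general strategy could take is to weight each core's share of pending events by its clock frequency. The sketch below is purely illustrative (the function name and parameters are our own assumptions, not part of Unison):

```python
# Hypothetical frequency-aware load split: give each core a share of the
# pending events proportional to its clock frequency.
def split_work(total_events: int, freqs_ghz: list[float]) -> list[int]:
    total = sum(freqs_ghz)
    shares = [int(total_events * f / total) for f in freqs_ghz]
    shares[0] += total_events - sum(shares)  # put rounding remainder on core 0
    return shares

# Two 3.0 GHz cores and two 2.0 GHz cores splitting 1000 events.
print(split_work(1000, [3.0, 3.0, 2.0, 2.0]))  # → [300, 300, 200, 200]
```

With identical frequencies this degenerates to the uniform split that Unison's current scheduler assumes; real heterogeneous scheduling would also have to account for memory bandwidth and cache topology, which this sketch ignores.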

Future work. We will apply Unison to other network simulators, including OMNeT++ and ns.py. We also plan to explore heterogeneous parallel simulation and emulation by investigating other computation components, such as FPGAs and programmable switches.

Network performance estimation. In addition to DES, flow-level mathematical modeling and end-to-end performance estimators [9, 24, 26, 36] can also be used for network performance estimation. However, they treat the whole network as a black box and cannot provide detailed visibility [40]. Data-driven approaches are now replacing them. However, as discussed in §2.2, existing data-driven approaches still suffer from limited usability, long training times, and approximated results, and they rely on DES to collect training data for new scenarios [40, 42].

Another recent work eliminates the need for training by transforming the original topology into many link-level topologies [43]. It then simulates these topologies in parallel using DES and aggregates the results. However, it still relies on DES and can only estimate the tail latency.

Zero-configuration fast network PDES. We identified that both the ns-3 and OMNeT++ communities have attempted to achieve zero-configuration network PDES using a shared-memory approach. The ns-3 community has attempted a multithreaded approach [37]. However, it primarily focuses on thread safety issues and their costs, ignoring cache effects, scheduling strategy, determinism and scalability. In contrast, our work builds on this attempt with extensive enhancements and a much greater speedup.

The proposal of OMNeT++ [5] is to identify concurrently processable events in the FEL via a colorization algorithm running on a worker thread, allowing other threads to grab these processable events. However, their proposal relies on a distance matrix over all LPs, which would occupy a significant amount of memory for large models (O(n²) if the number of LPs is n), and it has not been implemented yet.
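To put the O(n²) cost in perspective, a back-of-the-envelope estimate (assuming 4-byte matrix entries, which is our assumption rather than a figure from the proposal) shows how quickly the matrix grows:

```python
# Back-of-the-envelope memory cost of an n-by-n LP distance matrix.
# The 4-byte entry size is an assumption for illustration only.
def distance_matrix_bytes(n_lps: int, entry_bytes: int = 4) -> int:
    """Bytes needed to store pairwise distances for n_lps LPs."""
    return n_lps * n_lps * entry_bytes

# A large model with 10,000 LPs already needs about 381 MiB for the
# matrix alone, and doubling the LP count quadruples that cost.
print(distance_matrix_bytes(10_000) / 2**20)
```

This quadratic growth is what makes the approach impractical for the large models targeted by Unison.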

Another recent work uses a data-oriented design to reduce cache misses in network simulation [11]. It adopts existing PDES algorithms to cope with large-scale simulations, but profiling and optimizing PDES are outside its design scope. Moreover, it requires a full re-architecture of existing network simulators to fit the data-oriented paradigm: the entire network protocol stack and all applications must be redesigned to use their simulator. In contrast, our work retains compatibility with existing ns-3 frameworks, requires zero configuration, and can be easily adapted to other DES-based network simulators.

Conclusion

Existing PDES algorithms for network simulation are not widely used in practice due to their complex configuration and limited performance gains. This paper introduces Unison, a new network simulation kernel that is parallel-efficient and user-transparent. Unison addresses these limitations by adopting fine-grained partitioning and load-adaptive scheduling. Our evaluation demonstrates that Unison is fast, transparent, accurate and deterministic across various scenarios.

Acknowledgments

We would like to thank our shepherd, Qizhen Zhang, and the anonymous EuroSys '24 reviewers for their constructive feedback and comments. This research is supported by the National Key R&D Program of China under Grant Number 2022YFB2901502, and the National Natural Science Foundation of China under Grant Numbers 62072228, 62172204 and 62325205.