MimicNet: Fast Performance Estimates for Data Center Networks with Machine Learning¶
TLDR¶
I had Gemini read through it first. Turns out this one is less directly relevant to me than the previous two papers, SimBricks and Phantom.
(1) Core problem and background
- Limitations of traditional simulation:
  - In large-scale DCNs, packet-level simulation is accurate but extremely slow: simulating 5 minutes of traffic can take days or even months
- Trade-offs of existing approaches:
  - Hardware testbeds: prohibitively expensive
  - Flow-level simulation: fast, but cannot capture packet-level effects such as drops and delay jitter, so it loses substantial accuracy
(2) MimicNet's core design
The core idea of MimicNet: "exact simulation locally + model-based approximation globally"
- Observable cluster:
  - Of the entire data center, only one selected cluster is simulated in full at the packet level
- Mimics:
  - All remaining clusters are replaced by machine-learned "Mimic" models
  - These models predict the network effects a packet experiences while traversing a non-observed cluster (drops, delay, ECN marks, ...) without simulating the cluster's internal switch queuing
- Two-part model structure:
  - Internal models: LSTM-based; learn the cluster's internal mechanics (queuing, routing, etc.) and the effects of background traffic
  - Feeder models: flow-level approximations that estimate the arrival rate of traffic between non-observed clusters, used to update the internal models' hidden state
(3) Key technical contributions
- Scalable feature selection: only "scalable" features independent of network size (e.g., local rack index, packet size) are used, so a model trained in a 2-cluster setting applies directly to a 128-cluster one
- Domain-specific loss functions:
  - Weighted binary cross-entropy (WBCE): handles the class imbalance of sparse events such as packet drops
  - Huber loss: balances average error against tail outliers in latency prediction
- Automated hyperparameter tuning: Bayesian optimization with the Wasserstein distance as the metric, optimizing end-to-end prediction accuracy
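The two domain-specific losses above can be sketched in plain Python. This is a minimal illustrative form over scalar lists rather than tensors, not MimicNet's actual training code:

```python
import math

def weighted_bce(y_true, y_pred, pos_weight, eps=1e-7):
    """Weighted binary cross-entropy: up-weights the rare positive
    class (e.g., packet drops) by `pos_weight` to counter imbalance."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small residuals (average error),
    linear in the tails (robust to latency outliers)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)
```

Raising `pos_weight` makes a missed drop cost more than a false alarm, which is the point of WBCE for sparse events.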
(4) Experimental results and conclusions
The paper evaluates TCP, DCTCP, Homa, and other protocols on CloudLab and concludes:
- Very large speedups:
  - At 128 clusters (1024 hosts), up to 675x faster than full simulation
  - A job that would take 12 days to simulate finishes in under 30 minutes with MimicNet
- Accuracy preserved:
  - Tail estimates of FCT (flow completion time), throughput, and RTT are all within 5% error
- Better than flow-level approximation:
  - Far more accurate than flow-level simulation (SimGrid), and at large scale even faster, since it avoids the overhead of maintaining per-connection state
(5) Limitations and assumptions
- Topology: currently targets the classic Fat-Tree topology
- Traffic: assumes traffic patterns scale proportionally, and that congestion occurs mainly at fan-in
What is fan-in?

Fan-in refers to the receiver-side uplink ports in a many-to-one ("incast") pattern: the point where traffic from many senders converges toward a single destination.
Introduction¶
Over the years, many novel protocols and systems have been proposed to improve the performance of data center networks [57, 12, 19, 33, 39]. Though innovative in their approaches and promising in their results, these proposals suffer from a consistent challenge: the difficulty of evaluating systems at scale. Networks, highly interconnected and filled with dependencies, are particularly challenging in that regard—small changes in one part of the network can result in large performance effects in others.
Unfortunately, full-sized testbeds that could capture these effects are prohibitively expensive to build and maintain. Instead, most preproduction performance evaluation comprises orders of magnitude fewer devices and fundamentally different network structures. This is true for (1) hardware testbeds [47], which provide total control of the system, but at a very high cost; (2) emulated testbeds [43, 54, 56], which model the network but at the cost of scale or network effects; and (3) small regions of the production network, which provide ‘in vivo’ accuracy but force operators to make a trade-off between scale and safety [48, 59]. The end result is that, often, the only way to ascertain the true performance of the system at scale is to deploy it to the production network.
We note that simulation was originally intended to fill this gap. In principle, simulations provide an approximation of network behavior for arbitrary architectures at an arbitrary scale. In practice, however, modern simulators struggle to provide both simultaneously. As we show in this paper, even for relatively small networks, packet-level simulation is 3–4 orders of magnitude slower than realtime (5 min of simulated time every ∼3.2 days); larger networks can easily take months or longer to simulate. Instead, researchers often either settle for modestly sized simulations and assume that performance translates to larger deployments, or they fall back to approaches that ignore packet-level effects like flow approximation techniques. Both sacrifice substantial accuracy.
In this paper, we describe MimicNet, a tool for fast performance estimation of at-scale data center networks. MimicNet presents to users the abstraction of a packet-level simulator; however, unlike existing simulators, MimicNet only simulates, at a packet level, the traffic to and from a single ‘observable’ cluster, regardless of the actual size of the data center. Users can then instrument the host and network of the designated cluster to collect arbitrary statistics. For the remaining clusters and traffic that are not directly observable, MimicNet approximates their effects with the help of deep learning models and flow approximation techniques.
As a preview of MimicNet’s evaluation results, Figure 1 shows the accuracy of its Flow-Completion Time (FCT) predictions for various data center sizes and compares it against two common alternatives: (1) flow-level simulation and (2) running a smaller simulation and assuming that the results are identical for larger deployments. For each approach, we collected the simulated FCTs of all flows with at least one endpoint in the observable cluster. We compared the distribution of each approach’s FCTs to that of a full-fidelity packet-level simulation using a \(W_1\) metric. The topology and traffic pattern were kept consistent, except in the case of small-scale simulation where that was not possible (instead, we fixed the average load and packet/flow size). While MimicNet is not and will never be a perfect portrayal of the original simulation, it is 4.1× more accurate than the other methods across network sizes, all while improving the time to results by up to two orders of magnitude.
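For intuition, the \(W_1\) (Wasserstein-1) comparison of two empirical FCT distributions can be sketched as below. For equal-sized samples it reduces to the mean absolute difference of the order statistics; this is an illustrative simplification, not the paper's evaluation code:

```python
def wasserstein_1(samples_a, samples_b):
    """Empirical W1 distance between two equal-sized samples:
    the mean absolute difference of their sorted values."""
    assert len(samples_a) == len(samples_b)
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical FCT samples: a constant +0.1s shift yields W1 = 0.1.
fcts_full  = [0.2, 0.5, 0.9, 1.4]  # full-fidelity simulation
fcts_mimic = [0.3, 0.6, 1.0, 1.5]  # approximate simulation
```

Lower \(W_1\) means the approximate FCT distribution tracks the ground-truth one more closely.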
To achieve these results, MimicNet imposes a few carefully chosen restrictions on the system being modeled: that the data center is built on a classic FatTree topology, that per-host network demand is predictable a priori, that congestion occurs primarily on fan-in, and that a given host’s connections are independently managed. These assumptions provide outsized benefits to simulator performance and the scalability of its estimation accuracy, while still permitting application to a broad class of data center networking proposals, both at the end host and in the network.
Concretely, MimicNet operates as follows. First, it runs a simulation of a small subset of the larger data center network. Using the generated data, it trains a Mimic—an approximation of clusters’ ‘non-observable’ internal and cross-cluster behavior. Then, to predict the performance of an \(N\)-cluster simulation, it carefully composes a single observable cluster with \(N-1\) Mimic’ed clusters to form a packet-level generative model of a full-scale data center. Assisting with the automation of this training process is a hyperparameter tuning stage that utilizes arbitrary user-defined metrics (e.g., FCT, RTT, or average throughput) and MimicNet-defined metrics (e.g., scale generalizability) rather than traditional metrics like L1/L2 loss, which are a poor fit for a purely generative model.
This entire process—small-scale simulation, model training/tuning, and full-scale approximation—can be orders of magnitude faster than running the full-scale simulation directly, with only a modest loss of accuracy. For example, in a network of a thousand hosts, MimicNet’s steps take 1h3m, 7h10m, and 25m, respectively, while full simulation takes over a week for the same network/workload. These results hold across a wide range of network configurations and conditions extracted from the literature. This paper contributes:
- Techniques for the modeling of cluster behavior using deep-learning techniques and flow-level approximation. Critical to the design of the Mimic models are techniques to ensure the scalability of their accuracy, i.e., their ability to generalize to larger networks in a zero-shot fashion.
- An architecture for composing Mimics into a generative model of a full-scale data center network. For a set of complex protocols and real-world traffic patterns, MimicNet can match ground-truth results orders of magnitude more quickly than otherwise possible. For large networks, MimicNet even outperforms flow-level simulation in terms of speed (in addition to producing much more accurate results).
- A customizable hyperparameter tuning procedure and loss function design that ensure optimality in both generalization and a set of arbitrary user-defined objectives.
- Implementations and case studies of a wide variety of network protocols that stress MimicNet in different ways.
The framework is available at: https://github.com/eniac/MimicNet.
(1) The difficulty of evaluating large-scale networks today
- Evaluation challenge: new data center network protocols keep being proposed, but evaluating them at scale is very hard, because networks are highly interconnected and small changes can cause large performance effects elsewhere
- Limits of existing approaches:
  - Full-sized testbeds are prohibitively expensive
  - Hardware testbeds, emulated testbeds, and small slices of the production network all suffer from high cost, limited scale, or safety risks
- The simulation bottleneck:
  - Simulation was meant to fill this gap, but modern packet-level simulation of large networks is extremely slow, 3-4 orders of magnitude slower than real time (5 simulated minutes can take ~3.2 days)
  - So researchers end up sacrificing accuracy with small-scale simulation or flow-level approximation
(2) MimicNet's core design and advantages
- System overview: MimicNet is a tool for fast performance estimation of large-scale data center networks
- How it works: it presents the abstraction of a packet-level simulator, but actually performs full packet-level simulation only for a single "observable" cluster
- Approximation: for the remaining clusters and traffic that are not directly observable, MimicNet approximates their network effects with deep learning models and flow approximation techniques
- Gains: compared to alternatives (flow-level simulation, or extrapolating from small-scale runs), MimicNet is 4.1x more accurate while improving time to results by up to two orders of magnitude
(3) Key assumptions and restrictions
- To achieve this efficiency, MimicNet imposes a few specific restrictions:
  - The data center is built on a classic FatTree topology
  - Per-host network demand is predictable
  - Congestion occurs primarily at fan-in
  - Each host's connections are independently managed
- These assumptions buy simulator performance and scalability while still covering a broad class of data center networking proposals
(4) Automated workflow
- Data generation: first run a simulation of a small subset of the network to generate data
- Model training: use the data to train "Mimic" models that approximate intra-cluster and cross-cluster behavior
- Full-scale composition: compose one observable cluster with N-1 Mimic'ed clusters into a generative model of the full-scale data center
- Hyperparameter tuning: tune automatically against user-defined metrics (e.g., FCT, RTT) and a scale-generalizability metric, rather than traditional loss functions
(5) Main contributions
- Techniques for modeling cluster behavior with deep learning and flow-level approximation, with accuracy that scales to larger networks (zero-shot generalization)
- An architecture for composing Mimic models into a generative model of a full-scale data center, which at large scale is even faster than flow-level simulation
- A customizable hyperparameter tuning procedure and loss function design
- Implementations and case studies of a variety of network protocols
Motivation¶
Modern data center networks connect up to hundreds of thousands of machines that, in aggregate, are capable of processing hundreds of billions of packets per second. They achieve this via scale-out network architectures, and in particular, FatTree networks like the one in Figure 3 [4, 18, 50]. In the canonical version, the network consists of Top-of-Rack (ToR), Cluster, and Core switches. We refer to the components under a single ToR as a rack and the components under and including a group of Cluster switches as a cluster. A large data center might have over 100 such clusters.
The size and complexity of these networks make testing and evaluating new ideas and architectures challenging. Researchers have explored many potential directions including verification [15, 26, 27, 35, 57], emulation [52, 54, 56], phased rollouts [48, 59], and runtime monitoring [20, 58]. In reality, all of these approaches have their place in a deployment workflow; however, in this paper, we focus on a critical early step: pre-deployment performance estimation using simulation.
Modern data center networks connect up to hundreds of thousands of machines which, in aggregate, can process hundreds of billions of packets per second. They achieve this through scale-out network architectures, in particular FatTree networks like the one in Figure 3:

In the canonical version, the network consists of:
- ToR switches
- Cluster switches
- Core switches
We call the components under a single ToR a rack, and a group of Cluster switches together with everything beneath them a cluster. A large data center may have over 100 such clusters.
The size and complexity of these networks make testing and evaluating new ideas and architectures challenging. Researchers have explored many directions, including verification, emulation, phased rollouts, and runtime monitoring.
In reality, all of these approaches have a place in a deployment workflow.
In this paper, however, the focus is a critical early step: pre-deployment performance estimation using simulation.
2.1 Background on Network Simulation¶
The most popular simulation frameworks include OMNeT++ [34], ns-3 [42], and OPNET [1]. Each of these operates at a packet level and is built around an event-driven model [53] in which the operations of every component of the network are distilled into a sequence of events that each fire at a designated ‘simulated time.’
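The event-driven model described above boils down to a priority queue keyed on simulated time. A minimal sketch (illustrative structure and names, not the OMNeT++ or ns-3 API):

```python
import heapq

class Simulator:
    """Minimal discrete-event loop: events fire in simulated-time
    order, and handlers may schedule further events."""
    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = 0  # tie-breaker for events at the same time

    def schedule(self, delay, handler):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, handler = heapq.heappop(self._queue)
            handler(self)

# Toy example: a send at t=1.0 and an arrival at t=2.0.
log = []
sim = Simulator()
sim.schedule(2.0, lambda s: log.append(("arrive", s.now)))
sim.schedule(1.0, lambda s: log.append(("send", s.now)))
sim.run()
```

The serialization into one queue is exactly what Section 2.2 identifies as the scalability bottleneck: every packet of every device funnels through this loop.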
Compared to evaluation techniques such as testbeds and emulation, these simulators provide a number of important advantages:
- Arbitrary scale: Decoupling the system model from both hardware and timing constraints means that, in principle, simulations can encompass any number of devices.
- Arbitrary extensions: Similarly, with full control over the simulated behavior, users can model any protocol, topology, design, or configuration.
- Arbitrary instrumentation: Finally, simulation allows the collection of arbitrary information at arbitrary granularity without impacting system behavior.
In return for the above benefits, simulators trade-off varying levels of accuracy compared to a bare-metal deployment. Even so, prior work has demonstrated their value in approximating real behavior [5, 6, 33, 46, 55].
The most popular simulation frameworks include OMNeT++, ns-3, and OPNET.
Each operates at the packet level and is built on an event-driven model, in which every network component's operations are reduced to a sequence of events, each firing at a designated "simulated time".
Compared to evaluation techniques such as testbeds and emulation, these simulators offer several important advantages:
- Arbitrary scale: decoupling the system model from hardware and timing constraints means simulations can, in principle, cover any number of devices
- Arbitrary extensions: with full control over simulated behavior, users can model any protocol, topology, design, or configuration
- Arbitrary instrumentation: simulation allows collecting arbitrary information at arbitrary granularity without perturbing system behavior
In exchange for these benefits, simulators trade off varying degrees of accuracy relative to a bare-metal deployment. Even so, prior work has demonstrated their value in approximating real behavior.
2.2 Scalability of Today’s Simulators¶
While packet-level simulation is easy to reason about and extend, simulating large and complex networks is often prohibitively slow. One reason for this is that discrete-event simulators, in essence, take a massive distributed system and serialize it into a single event queue. Thus, the larger the network, the worse the simulation performs in comparison.
Parallelization. A natural approach to improving simulation speed is parallelization, for instance, with the parallel DES (PDES) technique [17]. In PDES, the simulated network is partitioned into multiple logical processes (LPs), where each process has its own event queue that is executed in parallel. Eventually, of course, the LPs must communicate. In particular, consistency demands that a process cannot finish executing events at simulated time \(t\) unless it can be sure that no other process will send it additional events with \(t_e < t\). In these cases, synchronization may be necessary.
Parallel execution is therefore only efficient when the LPs can run many events before synchronization is required, which is typically not the case for highly interconnected data center networks. In fact, simulation performance often decreases in response to parallelization (see Figure 2). Many frameworks instead recommend running several instances with different configurations [14]. This trivially provides a proportional speedup to aggregate simulation throughput but does not improve the time to results.
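The conservative-synchronization constraint can be pictured as a safe-horizon computation: an LP may only process events up to the minimum of its neighbors' clocks plus the per-link lookahead, since no neighbor can send it an earlier event. The sketch below is a hypothetical illustration, not a real PDES framework:

```python
def safe_horizon(neighbor_clocks, lookahead):
    """Conservative PDES bound: an LP may safely execute events with
    timestamp below min over neighbors of (clock + link lookahead)."""
    return min(clock + lookahead[lp] for lp, clock in neighbor_clocks.items())

# Densely connected LPs with tiny lookahead (one short link latency)
# barely advance between synchronizations -- the data center case.
clocks = {"lp1": 10.0, "lp2": 12.0, "lp3": 11.0}
tight = safe_horizon(clocks, {"lp1": 0.1, "lp2": 0.1, "lp3": 0.1})
loose = safe_horizon(clocks, {"lp1": 5.0, "lp2": 5.0, "lp3": 5.0})
```

With sub-microsecond link latencies and all-to-all connectivity, the horizon stays barely ahead of the slowest neighbor, which is why parallelization often hurts rather than helps here.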
Approximation. The other common approach is to leverage various forms of approximation. For example, flow-level approaches [38] average the behavior of many packets to reduce computation. Closed-form solutions [37] and a vast array of optimized custom simulators [33, 45, 46] also fall into this category. While these approaches often produce good performance, they require deep expertise to craft and limit the metrics that one can draw from the analysis.
While packet-level simulation is easy to reason about and extend, simulating large, complex networks is often prohibitively slow. One reason: a discrete-event simulator essentially takes a massive distributed system and serializes it into a single event queue.
So the larger the network, the worse the simulation performs in comparison.
There are two very common mitigation strategies:
(1) Parallelization
A natural way to speed up simulation is parallelization, e.g., parallel discrete-event simulation (PDES).
In PDES, the simulated network is partitioned into logical processes (LPs), each with its own event queue executed in parallel.
Of course, the LPs must eventually communicate. In particular, consistency requires that a process cannot finish executing events at simulated time \(t\) unless it is certain that no other process will send it additional events with \(t_e < t\). In those cases, synchronization may be necessary.
Parallel execution is therefore efficient only when LPs can run many events between synchronizations, which is typically not the case for highly interconnected data center networks.

In fact, simulation performance often drops under parallelization (see Figure 2). Many frameworks instead recommend running several instances with different configurations. That trivially scales aggregate simulation throughput proportionally, but does not shorten the time to results.
(2) Approximation
The other common approach is to use various forms of approximation.
For example:
- Flow-level methods reduce computation by averaging the behavior of many packets
- Closed-form solutions and a large array of optimized custom simulators also fall into this category
While these approaches often perform well, they require deep expertise to craft and they limit which metrics can be drawn from the analysis.
Design Goals¶
MimicNet is based around the following design goals:
- Arbitrary scale, extensions, and instrumentation: Acknowledging the utility of packet-level simulation in enabling flexible and rich evaluations of arbitrary network designs, we seek to provide users with similar flexibility with MimicNet.
- Orders of magnitude faster results: Equally important, MimicNet must be able to provide meaningful performance estimates several orders of magnitude faster than existing approaches. Parallelism, on its own, is not enough—we seek to decrease the total amount of work.
- Tunable and high accuracy: Despite the focus on speed, MimicNet should produce observations that resemble those of a full packet-level simulation. Further, users should be able to define their own accuracy metrics and to trade this accuracy off with improved time to results.
Explicitly not a goal of our framework is full generality to arbitrary data center topologies, routing strategies, and traffic patterns. Instead, MimicNet makes several carefully chosen and domain-specific assumptions (described in Section 4.2) that enable it to scale to larger network sizes than feasible in traditional packet-level simulation. We argue that, in spite of these restrictions, MimicNet can provide useful insights into the performance of large data centers.
MimicNet's design is based on the following goals:
- Arbitrary scale, extensions, and instrumentation:
  - Given how useful packet-level simulation is for flexible, rich evaluation of arbitrary network designs, MimicNet aims to give users the same flexibility
- Orders of magnitude faster results:
  - Equally important, MimicNet must deliver meaningful performance estimates several orders of magnitude faster than existing approaches
  - Parallelism alone is not enough: the goal is to reduce the total amount of work
- Tunable, high accuracy:
  - Despite the focus on speed, MimicNet's observations should still resemble those of a full packet-level simulation
  - Users should also be able to define their own accuracy metrics and trade accuracy off against time to results
The framework explicitly does not aim for full generality across arbitrary data center topologies, routing strategies, and traffic patterns.
Instead, MimicNet makes several carefully chosen domain-specific assumptions (see Section 4.2) that let it scale to network sizes infeasible for traditional packet-level simulation. The authors argue that, despite these restrictions, MimicNet still provides useful insight into the performance of large data centers.
Overview¶
MimicNet’s approach is as follows. Every MimicNet simulation contains a single ‘observable’ cluster, regardless of the total number of clusters in the data center. All of the hosts, switches, links, and applications in this cluster as well as all of the remote applications with which it communicates are simulated in full fidelity. All other behavior—the traffic between un-observed clusters, their internals, and anything else not directly observed by the user—is approximated by trained models.
While prior work has also attempted to model systems and networks (e.g., [54, 56]), these prior systems tend to follow a more traditional script by (1) observing the entire system/network and (2) fitting a model to the observations. MimicNet is differentiated by the insight that, by carefully composing models of small pieces of a data center, we can accurately approximate the full data center network using only observations of small subsets of the network.
MimicNet's approach: every MimicNet simulation contains exactly one "observable" cluster, regardless of the total number of clusters in the data center. All hosts, switches, links, and applications in that cluster, plus all remote applications it communicates with, are simulated in full fidelity. Everything else (the traffic between un-observed clusters, their internals, and anything not directly observed by the user) is approximated by trained models.
While prior work has also modeled systems and networks, it tends to follow a more traditional script:
- observe the entire system/network
- fit a model to the observations
MimicNet's differentiating insight: by carefully composing models of small pieces of a data center, the full data center network can be approximated accurately from observations of only small subsets of it.
4.1 MimicNet Design¶
MimicNet constructs and composes models at the granularity of individual data center clusters: Mimics. From the outside, Mimics resemble regular clusters. Their hosts initiate connections and exchange data with the outside world, and their networks drop, delay, and modify that traffic according to the internal queues and logic of the cluster’s switches. However, Mimics differ in that they are able to predict the effects of that queuing and protocol manipulation without simulating or interacting with other Mimics—only with the observable cluster.
MimicNet builds and composes models at the granularity of individual data center clusters, called "Mimics". From the outside, a Mimic looks like a regular cluster: its hosts initiate connections and exchange data with the outside world, and its network drops, delays, and modifies that traffic according to the internal queues and logic of the cluster's switches. The difference is that a Mimic can predict the effects of that queuing and protocol manipulation without simulating or interacting with other Mimics; it interacts only with the observable cluster.
We note that the goal of MimicNet is not to replicate the effects of any particular large-scale simulation, just to generate results that exhibit their characteristics. It accomplishes the above with the help of two types of models contained within each Mimic: (1) deep-learning-based internal models that learn the behavior of switches, links, queues, and intra-cluster cross-traffic; and (2) flow-based feeder models that approximate the behavior of inter-cluster cross-traffic. The latter is parameterized by the size of the data center. Together, these models take a sequence of observable packets and their arrival times and output the cluster’s predicted effects:
(1) Whether the packets are dropped as a result of the queue management policy.
(2) When the packets egress the Mimic, given no drop.
(3) Where the packets egress, based on the routing table.
(4) The contents of the packets after traversing the Mimic, including modifications such as TTL and ECN.
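The four predicted effects can be pictured as a per-packet record. This is a hypothetical interface for illustration, not MimicNet's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MimicPrediction:
    """Per-packet output of a Mimic: the four predicted effects."""
    dropped: bool                 # (1) dropped by queue management?
    egress_time: Optional[float]  # (2) when it leaves (None if dropped)
    egress_port: Optional[int]    # (3) where it leaves, per routing
    ecn_marked: bool = False      # (4) header modification: ECN mark
    ttl: int = 63                 # (4) header modification: decremented TTL

def forward(pred: MimicPrediction):
    """Consume a prediction: drop the packet, or hand it to the next
    hop with the predicted egress time and header modifications."""
    if pred.dropped:
        return None
    return (pred.egress_port, pred.egress_time, pred.ecn_marked, pred.ttl)

p = MimicPrediction(dropped=False, egress_time=1.5, egress_port=2,
                    ecn_marked=True)
```

The key point is that the record carries everything the observable cluster needs, so the Mimic's internal switch queues never have to be simulated.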
Note that MimicNet's goal is not to replicate the effects of any particular large-scale simulation, only to generate results that exhibit its characteristics.
It achieves this with two kinds of models inside each Mimic:
- deep-learning-based internal models, which learn the behavior of switches, links, queues, and intra-cluster cross-traffic
- flow-based feeder models, which approximate inter-cluster cross-traffic
The latter is parameterized by the size of the data center. Together, the models take a sequence of observable packets and their arrival times and output the cluster's predicted effects:
(1) whether a packet is dropped by the queue management policy
(2) when the packet egresses the Mimic, if not dropped
(3) where the packet egresses, based on the routing table
(4) the packet's contents after traversing the Mimic, including modifications such as TTL and ECN
Workflow. The usage of MimicNet (depicted in Figure 3) begins with a small subset of the full simulation: just two user-defined clusters communicating with one another. This full-fidelity, small-scale simulation is used to generate training and testing sets for supervised learning of the models described above. Augmenting this training phase is a configurable hyper-parameter tuning stage in which MimicNet explores various options for modeling with the goal of maximizing both (a) user-defined, end-to-end accuracy metrics like throughput and FCT, and (b) generalizability to larger configurations and different traffic matrices.
Using the trained models, MimicNet assembles a full-scale simulation in which all of the clusters in the network (save one) are replaced with Mimics. For both data generation and large-scale simulation, MimicNet uses OMNeT++ as a simulation substrate.

Workflow:
MimicNet's usage (depicted in Figure 3):
- Starts from a small subset of the full simulation: just two user-defined clusters communicating with each other
- This full-fidelity, small-scale simulation generates the training and test sets for supervised learning of the models above
- The training phase is augmented by a configurable hyperparameter tuning stage, in which MimicNet explores modeling options to maximize
  - user-defined end-to-end accuracy metrics (e.g., throughput and FCT)
  - generalizability to larger configurations and different traffic matrices
- Using the trained models, MimicNet assembles a full-scale simulation in which every cluster in the network but one is replaced with a Mimic
For both data generation and large-scale simulation, MimicNet uses OMNeT++ as its simulation substrate.
Performance analysis. To understand MimicNet’s performance gains, consider the Mimic in Figure 4 and the types of packets that flow through it. At a high level, there are two such types: (1) traffic that interacts with the observable cluster (Mimic-Real), and (2) traffic that does not (Mimic-Mimic).
Performance analysis:
To understand MimicNet's performance gains, consider the Mimic in Figure 4 and the kinds of packets flowing through it. At a high level, there are two:
- traffic that interacts with the observable cluster (Mimic-Real)
- traffic that does not (Mimic-Mimic)

As a back-of-the-envelope computation, assume that we simulate \(N\) clusters, \(N \gg 2\). Also assume that \(T\) is the total number of packets sent in the full simulation of the data center and that \(p\) is the ratio of traffic that leaves a cluster vs. that stays within it (inter-cluster-to-intra-cluster), \(0 \le p \le 1\). The number of packets that leave a single cluster in the full simulation is then approximately \(\frac{Tp}{N}\).
Because Mimics only communicate with the single observable cluster and not each other, the number of packets that leave a Mimic in an approximate simulation is instead: \(\frac{Tp}{N} \cdot \frac{1}{N-1} = \frac{Tp}{N(N-1)}\)
Thus, the total number of packets generated in a MimicNet simulation (the combination of all traffic generated at the observable cluster and \(N - 1\) Mimics) is: \(\frac{T}{N} + (N-1) \cdot \frac{Tp}{N(N-1)} = \frac{T(1+p)}{N}\)
The total decrease in packets generated is, therefore, a factor between \(\frac{N}{2}\) and \(N\) with a bias toward \(N\) when traffic exhibits cluster-level locality. Fewer packets and connections generated mean less processing time and a smaller memory footprint. It also means a decrease in inter-cluster communication, which makes the composed simulation more amenable to parallelism than the full version.
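The back-of-the-envelope argument can be sanity-checked numerically. The intermediate totals below (the observable cluster's \(T/N\) packets plus each Mimic's share of inter-cluster traffic) are my own reconstruction of the counting, consistent with the stated factor between \(N/2\) and \(N\):

```python
def packets_full(T, N):
    """Total packets generated in a full simulation of N clusters."""
    return T

def packets_mimicnet(T, N, p):
    """Packets in a MimicNet run: the observable cluster generates T/N,
    and each of the N-1 Mimics generates only the inter-cluster traffic
    aimed at the observable cluster, Tp/(N*(N-1))."""
    observable = T / N
    per_mimic = (T * p / N) / (N - 1)
    return observable + (N - 1) * per_mimic  # simplifies to T*(1+p)/N

T, N = 1_000_000, 128
for p in (0.0, 0.5, 1.0):
    factor = packets_full(T, N) / packets_mimicnet(T, N, p)
    # Reduction factor lies between N/2 (all inter-cluster) and N
    # (full cluster-level locality), as the text states.
    assert N / 2 <= factor <= N
```

At \(p = 0\) (perfect locality) the reduction is exactly \(N\); at \(p = 1\) it bottoms out at \(N/2\).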
When traffic exhibits cluster-level locality, that factor is biased toward N.
Generating fewer packets and connections means less processing time and a smaller memory footprint.
It also means less inter-cluster communication, which makes the composed simulation more amenable to parallelism than the full version.
4.2 Restrictions¶
MimicNet makes several domain-specific assumptions that aid in the scalability and accuracy of the MimicNet approach.
- Failure-free FatTrees: MimicNet assumes a FatTree topology, where the structure of the network is recursively defined and packets follow a strict up-down routing. This allows it to assume symmetric bisection bandwidth and to break cluster-level modeling into simpler subtasks.
- Traffic patterns that scale proportionally: To ensure that models trained from two clusters scale up, MimicNet requires a per-host synthetic model of flow arrival, flow size, packet size, and cluster-level locality that is independent of the size of the network. In other words (at least at the host level), users should ensure that the size and frequency of packets in the first step resemble those of the last step. We note that popular datasets used in recent literature already adhere to this [6, 8, 33, 40].
- Fan-in bottlenecks: Following prior work, MimicNet assumes that the majority of congestion occurs on fan-in toward the destination [24, 50]. This allows us to focus accuracy efforts on only the most likely bottlenecks.
- Intra-host isolation: To enable the complete removal of Mimic-Mimic connections at end hosts, MimicNet requires that connections be logically isolated from one another inside the host. MimicNet models network effects but does not model CPU interactions or out-of-band cooperation between connections.
MimicNet, as a first step toward large-scale network prediction is, thus, not suited for evaluating every data center architecture or configuration. Still, we argue that MimicNet can provide useful performance estimates of a broad class of proposals. We also discuss potential relaxations to the above restrictions in Appendix A, but leave those for future work.
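The proportional-scaling restriction can be made concrete with a per-host synthetic workload whose parameters (flow arrival rate, flow size, locality) do not depend on the number of clusters \(N\). The generator below is a hypothetical sketch of such a scale-independent model, not MimicNet's traffic code:

```python
import random

def host_workload(duration, flow_rate, mean_flow_size, locality, rng):
    """Per-host synthetic workload: Poisson flow arrivals at
    `flow_rate` flows/sec, exponential flow sizes, and a fixed
    probability `locality` that a flow stays inside the cluster.
    None of these parameters depends on the network size N."""
    t, flows = 0.0, []
    while True:
        t += rng.expovariate(flow_rate)          # Poisson arrivals
        if t >= duration:
            break
        size = max(1, int(rng.expovariate(1.0 / mean_flow_size)))
        intra = rng.random() < locality          # stays in the cluster?
        flows.append((t, size, intra))
    return flows

rng = random.Random(0)
flows = host_workload(duration=10.0, flow_rate=5.0,
                      mean_flow_size=10_000, locality=0.3, rng=rng)
```

Because each host's workload looks the same in the 2-cluster training run and in the 128-cluster composition, the Mimic models see statistically similar inputs at both scales.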
MimicNet makes several domain-specific assumptions that aid the scalability and accuracy of the approach:
- Failure-free FatTrees:
  - MimicNet assumes a FatTree topology, where the network structure is recursively defined and packets follow strict up-down routing
  - This lets it assume symmetric bisection bandwidth and break cluster-level modeling into simpler subtasks
- Traffic patterns that scale proportionally:
  - To ensure that models trained on two clusters scale up, MimicNet requires a per-host synthetic model of flow arrival, flow size, packet size, and cluster-level locality that is independent of network size
  - In other words (at least at the host level), users should make sure the packet sizes and frequencies of the small-scale step resemble those of the full-scale step
- Fan-in bottlenecks:
  - Following prior work, MimicNet assumes most congestion occurs on fan-in toward the destination
  - This focuses the accuracy effort on the most likely bottlenecks
- Intra-host isolation:
  - To allow Mimic-Mimic connections to be removed entirely at end hosts, MimicNet requires connections to be logically isolated from one another inside a host: MimicNet models network effects but not CPU interactions or out-of-band cooperation between connections
As a first step toward large-scale network prediction, MimicNet is therefore not suited to evaluating every data center architecture or configuration. Still, the authors argue it can provide useful performance estimates for a broad class of proposals. Potential relaxations of these restrictions are discussed in Appendix A and left for future work.
Related Work¶
| Category | Representative work | Key features | Limitations / improvements in this paper |
|---|---|---|---|
| Packet-level simulation | ns-3 [21, 42], OMNeT++ [34], Mininet [29], BigHouse [37] | Around for decades, key tools for networking research; BigHouse uses empirical distribution models | Must sacrifice scalability or granularity for large networks; MimicNet simulates more faithfully and does not rely solely on traffic distribution models |
| Emulators | Flexplane [43], Pantheon [56], DIABLO [52] | Built around real components to preserve realism; DIABLO uses FPGAs to cut cost | Reliance on real components limits scale; even with FPGAs, replication at scale remains extremely expensive (~$1M) |
| Phased rollouts | A/B testing [49, 59] | Tested on a slice of the production network; shows true-scale performance | Infeasible for most researchers |
| Earlier version of this work | [25] | Explored the feasibility of approximating packet-level simulation with deep learning | This paper adds: scale-independent features, feeder models, end-to-end hyperparameter tuning, improved loss functions, and a deeper evaluation |
Conclusion and Future Work¶
This paper presents a system, MimicNet, that enables fast performance estimates of large data center networks. Through judicious use of machine learning and other modeling techniques, MimicNet exhibits super-linear scaling compared to full simulation while retaining high accuracy in replicating observable traffic. While we acknowledge that there is still work to be done in making the process simpler and even more accurate, the design presented here provides a proof of concept for the use of machine learning and problem decomposition for the approximation of large networks.
As part of the future work, we would like to further improve MimicNet’s speed with the support of incremental model updates when models need retraining; and its accuracy with models that involve more network events at higher levels such as flow dependencies (details are in Appendix H). More generally, extending its accuracy and speed for the evaluation of more data center protocols and architectures is how MimicNet evolves in the future.
This work does not raise any ethical issues.