Framework Evaluation

In this section, we evaluate STARRY NET by exploring two important aspects of the framework. Q(i): Can STARRY NET flexibly scale to various experimental requirements with acceptable system and configuration overhead? Q(ii): How faithful are the results obtained by STARRY NET compared with other state-of-the-art simulators and with live network performance? Our framework evaluations are conducted on a typical enterprise cluster of eight DELL R740 servers connected to a LAN. Each server is equipped with two Intel Xeon 5222 processors (4 cores at 3.8 GHz per processor) and 8×32 GB DDR4 RAM, and runs Ubuntu 20.04-LTS.

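In this section, we evaluate STARRY NET by examining two important aspects of the framework.

Q(i): Can STARRY NET flexibly adapt to various experimental requirements with acceptable system and configuration overhead?

Q(ii): How faithful are the results obtained by STARRY NET compared with other state-of-the-art simulators and with live network performance?

Our framework evaluation runs on a typical enterprise cluster of eight DELL R740 servers connected over a LAN. Each server has two Intel Xeon 5222 processors (4 cores per processor, 3.8 GHz) and eight 32 GB DDR4 memory modules, and runs Ubuntu 20.04-LTS.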

6.1 Ability to Satisfy Various Experimental Requirements for ISTNs


Elastic scaling to various constellation configurations. In reality, satellite operators deploy their mega-constellations incrementally, and each constellation consists of multiple shells. As shown in Table 2, STARRY NET can flexibly create a user-defined experiment environment for individual shells or for multi-shell combinations of representative mega-constellations, to satisfy various research requirements. Depending on the user's configuration, the emulated constellation size can scale from about 300 satellites (e.g., the T1 shell of Telesat) to 4408 satellites (e.g., the full-scale Starlink Phase I with five shells).

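Elastic scaling across constellation configurations. In practice, satellite operators deploy multi-shell mega-constellations in stages. As shown in Table 2, STARRY NET can flexibly create user-defined experiment environments for a single shell or a combination of shells, and supports emulated constellations ranging from about 300 satellites (e.g., the Telesat T1 shell) to 4408 satellites (e.g., the full five-shell Starlink Phase I).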
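To make the single-shell versus multi-shell sizes concrete, the sketch below sums satellites over a list of shells. The per-shell orbit and satellite counts are an assumption taken from commonly cited public filings for Starlink Phase I, not from the text above, and the `Shell` class and `constellation_size` helper are hypothetical names, not part of STARRY NET's API. Only the five-shell total of 4408 is stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shell:
    """One constellation shell: `orbits` planes with `sats_per_orbit` satellites each."""
    name: str
    orbits: int
    sats_per_orbit: int

    @property
    def size(self) -> int:
        return self.orbits * self.sats_per_orbit


# ASSUMPTION: commonly cited Starlink Phase I layout; verify against Table 2.
STARLINK_PHASE_I = [
    Shell("S1", 72, 22),
    Shell("S2", 72, 22),
    Shell("S3", 36, 20),
    Shell("S4", 6, 58),
    Shell("S5", 4, 43),
]


def constellation_size(shells) -> int:
    """Total number of satellites in a single-shell or multi-shell selection."""
    return sum(shell.size for shell in shells)


print(constellation_size(STARLINK_PHASE_I[:1]))  # first shell only: 1584
print(constellation_size(STARLINK_PHASE_I))      # all five shells: 4408
```

Selecting a slice of the shell list mirrors the idea of emulating one shell or any multi-shell combination from the same constellation description.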

Environment setup overhead. STARRY NET's APIs hide the complex underlying processing for trajectory calculation and resource orchestration. A researcher can therefore establish each ENE listed in Table 2 by writing about a dozen lines of code based on the constellation prefabs (e.g., Figure 4b) predefined in STARRY NET's database. The creation time of an ENE depends heavily on the experiment scale and on the hardware capability of the machines used for the experiment. Concretely, as shown in Table 2, the total creation time, including both node and link creation, increases with constellation size and ranges from several minutes (for a small ENE) to tens of minutes (for a large ENE) in our current STARRY NET implementation.

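Environment setup overhead. Because STARRY NET encapsulates the complex trajectory calculation and resource orchestration, a researcher needs only about a dozen lines of code, based on predefined constellation templates (e.g., Figure 4b), to build any of the experiment environments listed in Table 2.

The creation time depends on the experiment scale and the hardware capability: as shown in Table 2, the total time from node creation through link setup grows with constellation size. In the current implementation, small environments take several minutes and large environments take tens of minutes.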

System overhead. Table 2 also reports the average CPU and memory overhead consumed on each machine when running various ENEs. We make two observations. First, as expected, when the constellation size increases, STARRY NET requires more worker machines and consumes more CPU and memory to emulate ISTN nodes, links, and their constellation-wide dynamics. Second, if STARRY NET updates satellite dynamics more frequently (i.e., with shorter update intervals), it consumes more resources to carry out these fine-grained updates. Note that in this experiment we keep the CPU usage of each machine below Δ = 50%. The runtime overhead of the underlying STARRY NET should not use up all CPU and memory resources in an ENE, because sufficient resources must remain for the workloads and functionalities under test that run on top of it.

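System resource overhead. Table 2 also reports the average per-machine CPU and memory overhead for different experiment environments, with the following main characteristics:

(1) A larger constellation requires more worker machines, so resource consumption naturally increases;

(2) More frequent satellite state updates (i.e., shorter update intervals) increase the system load.

This experiment caps per-machine CPU usage at Δ = 50% so that STARRY NET's underlying overhead does not exhaust resources and sufficient compute remains for the functions under test. The cap is a way to balance emulation granularity against system load across different experimental requirements.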
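The relation between constellation size, the CPU cap Δ, and the number of worker machines can be sketched as a simple capacity estimate. This is a back-of-the-envelope illustration, not STARRY NET's scheduler: the function name `workers_needed` and the per-node CPU cost of 0.005 cores are hypothetical placeholders, and real per-node cost depends on the update interval and must be measured. Only the 8 physical cores per server and the 50% cap come from the text.

```python
import math


def workers_needed(num_nodes: int, cpu_per_node: float,
                   cores_per_machine: int, cpu_cap: float = 0.5) -> int:
    """Minimum machine count so emulation overhead stays under `cpu_cap` on each.

    cpu_per_node is the average number of cores one emulated node consumes
    (ASSUMPTION: a measured, experiment-specific value).
    """
    if not 0 < cpu_cap <= 1:
        raise ValueError("cpu_cap must be in (0, 1]")
    budget_per_machine = cores_per_machine * cpu_cap  # cores usable by the emulator
    return max(1, math.ceil(num_nodes * cpu_per_node / budget_per_machine))


# Two 4-core processors per server -> 8 cores; cap of 50% -> 4 cores of budget.
# The 0.005 cores-per-node figure is a made-up placeholder for illustration.
print(workers_needed(4408, cpu_per_node=0.005, cores_per_machine=8))
```

Halving the update interval would, under this model, show up as a larger `cpu_per_node` and therefore a larger machine count, which matches the two trends observed above.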

6.2 Fidelity Analysis

Next we analyze the fidelity of STARRY NET by comparing the experiment results obtained by STARRY NET with live satellite networks and other state-of-the-art simulators.

Next, we analyze the fidelity of STARRY NET by comparing its experiment results with those of live satellite networks and other state-of-the-art simulators.

Network performance under a live Starlink topology. We use STARRY NET to establish an ENE that follows the network topology of a live Starlink test conducted in Europe in 2021 [33]. Specifically, this real-world Starlink topology involves several key components, as illustrated in Figure 5: (1) a user terminal together with a Starlink satellite dish located at the campus Klagenfurt Primoschgasse; (2) a SpaceX ground station located in Frankfurt, Germany; (3) a Point of Presence (PoP) connecting the ground station to the terrestrial Internet; and (4) a Web server deployed in Vienna. The experiment publicly reports the ping and iPerf results measured between the user terminal and the Web server, over the ISTN that integrates Starlink satellites and the terrestrial Internet. We use STARRY NET, Hypatia [60], and StarPerf [61] to generate network performance under the same topology configuration. The latter two are state-of-the-art ISTN simulators. Figure 6 plots the comparison of the latency results. First, we find that the existing simulators underestimate the latency, since their latency estimates are based on a high-level abstraction that does not consider system effects such as packet processing overhead. Second, STARRY NET achieves acceptable fidelity: in each case (i.e., average and 50th/70th/90th percentile) its latency is similar to the data measured on live Starlink. This is because STARRY NET combines model calculation, data-driven calibration, and a real networking stack to create the ENE.


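Network performance under a live Starlink topology. Following the topology of a live Starlink test conducted in Europe in 2021 [33], we build an experiment environment with the following elements (see Figure 5): (1) a user terminal and Starlink dish deployed at the Klagenfurt Primoschgasse campus in Austria; (2) a ground station in Frankfurt, Germany; (3) a Point of Presence (PoP) connecting the ground station to the terrestrial Internet; and (4) a Web server in Vienna. Comparing latency results from the live Starlink network with those from STARRY NET, Hypatia, and StarPerf (Figure 6) shows that the existing simulators underestimate latency because they rely on high-level abstractions, whereas STARRY NET, which combines model calculation, data-driven calibration, and a real networking stack, yields end-to-end latency (average and 50th/70th/90th percentiles) close to the measured values.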
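The latency comparison above is made on four summary statistics (average and 50th/70th/90th percentiles). A minimal sketch of how such a summary and a per-metric gap could be computed from raw ping samples follows. The function names are hypothetical, the choice of the "inclusive" interpolation method is an assumption, and the sample values in the usage lines are synthetic placeholders, not data from [33] or Figure 6.

```python
import statistics


def latency_summary(samples_ms):
    """Average and 50th/70th/90th percentile of a list of RTT samples (ms)."""
    # quantiles(n=100) returns 99 cut points; cut point p-1 is the p-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p70": cuts[69],
        "p90": cuts[89],
    }


def relative_gap(estimated, measured):
    """Per-metric relative difference; negative values mean underestimation."""
    return {k: (estimated[k] - measured[k]) / measured[k] for k in measured}


# Synthetic placeholder samples, for illustration only.
measured = latency_summary([40.0, 42.0, 45.0, 47.0, 52.0, 60.0])
estimated = latency_summary([30.0, 31.0, 33.0, 35.0, 38.0, 44.0])
print(relative_gap(estimated, measured))
```

A simulator that omits packet processing overhead would show consistently negative gaps in this kind of comparison, while an emulator with a real networking stack would be expected to sit closer to zero.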

Bandwidth is a metric that can be affected by many operational factors. Therefore, in a research experiment STARRY NET allows the researcher to configure the link capacity manually on demand. For example, we follow the realistic Starlink trace in [33] to set the uplink and downlink capacity, and run iPerf to measure the TCP throughput in each direction. Since Hypatia and StarPerf cannot carry real network traffic generated by iPerf, we compare only the throughput results of live Starlink and STARRY NET. The evaluation results in Figure 7 demonstrate that STARRY NET can be tuned to accurately emulate the bandwidth of a live ISTN.


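Bandwidth configurability. Because bandwidth is affected by many operational factors, STARRY NET lets researchers configure link capacity manually on demand. With the uplink and downlink capacities set from the real Starlink trace and throughput measured with iPerf, Figure 7 shows that STARRY NET can be tuned to closely reproduce the throughput of the live network. Hypatia and StarPerf are excluded from this comparison because they cannot carry real iPerf traffic.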

Network performance under an ISL-enabled topology. As of the date of this paper's submission, most real mega-constellations such as Starlink and Kuiper are still in their early stage and under heavy construction. Although Starlink has started to deploy laser ISLs on its LEO satellites, those ISLs are still under internal test, so it is difficult to directly compare the network performance estimated by STARRY NET with a real ISL-enabled satellite network. To analyze the fidelity of STARRY NET when ISLs are activated, we instead compare the performance results obtained by STARRY NET with those of other ISTN simulators. Figure 8 plots the CDF of latency between a collection of real ground-station pairs [26] under the same constellation configuration, based on the ISL-enabled Starlink network. The latency results of STARRY NET are measured by ping tests in the emulated ENE, while the results of the other simulators are generated by numeric or event-driven calculation. As shown in Figure 8, the latency obtained by STARRY NET is slightly higher than that of the other simulators, because STARRY NET incorporates realistic system-level overhead (e.g., packet processing) that simulators may neglect.


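Network performance under an ISL-enabled topology. Because constellations such as Starlink and Kuiper are still at an early construction stage and laser inter-satellite links (ISLs) remain under internal test, we assess fidelity in the ISL-enabled scenario by comparing against simulator results. As shown in Figure 8, the latency CDF over a set of real ground-station pairs [26] indicates that STARRY NET's latency is slightly higher than that of numeric or event-driven simulators, because it includes system-level overhead such as packet processing.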
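A purely numeric latency estimate of the kind the simulators produce can be approximated by summing straight-line hop distances along a path and dividing by the speed of light. The sketch below does this under simplifying assumptions: a spherical Earth, propagation at the vacuum speed of light on every hop, and no queueing or processing delay. The function names and the example coordinates are hypothetical and are not taken from Hypatia, StarPerf, or STARRY NET.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius (spherical-Earth assumption)
LIGHT_SPEED_KM_S = 299_792.458  # speed of light in vacuum


def to_cartesian(lat_deg: float, lon_deg: float, alt_km: float = 0.0):
    """Convert latitude, longitude, altitude to Earth-centred x, y, z in km."""
    r = EARTH_RADIUS_KM + alt_km
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def propagation_delay_ms(path) -> float:
    """One-way propagation delay along a list of (lat, lon, alt_km) hops."""
    points = [to_cartesian(*hop) for hop in path]
    distance_km = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return distance_km / LIGHT_SPEED_KM_S * 1000.0


# Hypothetical path: ground station -> satellite -> satellite (ISL) -> ground station.
path = [(50.1, 8.7, 0.0), (51.0, 10.0, 550.0), (49.0, 14.0, 550.0), (48.2, 16.4, 0.0)]
print(f"{propagation_delay_ms(path):.2f} ms one-way, propagation only")
```

A ping measured inside an emulated environment includes this propagation component plus per-hop packet processing, which is consistent with the slightly higher latency reported for STARRY NET in Figure 8.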

On-demand computation capability. Researchers may need to conduct their experiments on different satellite hardware with various computation capabilities. For example, the authors of [53] studied the application performance achieved by two space-grade processors, RAD-5545 [21] and HPSC [13]. Recent works such as [23,44] explored new satellite functionalities running on commercial low-power processors such as Raspberry Pi [24] and Jetson TX2 [18]. STARRY NET can flexibly adjust the computation capability of each emulated satellite to satisfy various experimental requirements. To validate this computational flexibility, we use CoreMark [5], a well-known processor benchmark, to measure the performance of the real hardware and of its facsimile created by STARRY NET. Figure 9 plots the CoreMark score, a metric that quantifies computation capability: higher scores indicate stronger computing capability. The results show that STARRY NET can mimic a processor capability similar to the one a given experiment requires.


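On-demand computation capability. To meet the experimental needs of different satellite hardware platforms, STARRY NET supports flexible configuration of per-satellite computation capability. The CoreMark benchmark (Figure 9) is used to compare real hardware with its emulated counterpart, where a higher CoreMark score indicates stronger computing capability. This makes it possible to approximate processors of different classes, from space-grade parts such as RAD-5545 and HPSC to commercial low-power platforms such as Raspberry Pi and Jetson TX2, which supports research on new on-board satellite functionalities [23,44].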
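One plausible way to make a container-based emulated node behave like a weaker processor is to restrict its CPU time so that its benchmark score approaches the target hardware's score. The sketch below assumes that the CoreMark score scales roughly linearly with the granted CPU fraction and that nodes are Docker containers limited with `docker update --cpus`. The text does not describe STARRY NET's actual mechanism, so both points are assumptions. The helper names, container name, and scores are hypothetical placeholders, not measured values.

```python
def cpu_share_for_target(target_score: float, host_core_score: float,
                         max_cpus: float) -> float:
    """CPU fraction that should yield `target_score`.

    ASSUMPTION: benchmark score scales linearly with the CPU time fraction.
    """
    if target_score <= 0 or host_core_score <= 0:
        raise ValueError("scores must be positive")
    share = target_score / host_core_score
    if share > max_cpus:
        raise ValueError("target exceeds what the host can provide")
    return round(share, 2)


def docker_update_command(container: str, cpus: float):
    """Build (not run) the Docker CLI call that applies the CPU limit."""
    return ["docker", "update", "--cpus", f"{cpus:.2f}", container]


# Placeholder scores: the target reaches a quarter of one host core's score.
share = cpu_share_for_target(target_score=5000, host_core_score=20000, max_cpus=8)
print(" ".join(docker_update_command("sat-1", share)))
```

In practice the linear assumption would need calibration: run the benchmark inside the limited container, compare with the real hardware's score, and adjust the share until the two agree.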