
Design of LEOCraft

Overview of other simulation platforms: The well-known LEO network simulation and emulation platforms in the community are Hypatia [64], StarryNet [67], and xeoverse [63]. xeoverse is a relatively new LEO network emulator built upon Mininet [20]. xeoverse claims to outperform Hypatia and StarryNet by large margins [63]; however, it is not publicly available for community use. StarryNet, on the other hand, is a data-driven emulation platform only partially available [70] to the community. StarryNet uses Docker containers [15] to emulate each node (satellites, GSes, user terminals) and Python’s thread-based parallelism [32] to update the state of the virtual links (ISLs and GSLs) connecting the containers [63]. StarryNet is therefore resource-intensive, and its scalability is by default constrained by the Docker engine’s upper limit on its bridge interface (up to 1,023 containers [13] on a single machine [63]). Due to CPython’s Global Interpreter Lock (GIL) [17], StarryNet’s link-state update procedure, which runs thousands of threads, becomes a bottleneck as well. Hypatia is built upon the ns-3 packet-level simulation platform [22]; hence, its scalability is constrained by ns-3’s limitations. Since Hypatia fails to utilise multiple CPUs, simulation execution slows down drastically with larger constellations [63,64].

System design: After exploring the implementations of the above platforms, we found that a significant portion of the performance bottleneck arises from the software layer, because the LEO satellite constellation is treated as a single, monolithic block. However, within this block, most computations are independent of one another. For example, determining the coverage area of one satellite does not depend on calculations for any other satellite. Similarly, computing a route from one GS to another is independent of the routes between other pairs of GSes, and so on. We exploit this in the implementation of LEOCraft [35], where each component (such as satellites and GSes) operates as an independent block with format-restricted data exchange APIs. This allows us to distribute workloads uniformly across all available CPUs, effectively shifting the performance bottleneck from the software layer to the underlying hardware.
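The independence argument above can be made concrete with a small sketch. A satellite's coverage footprint follows from standard spherical geometry using only that satellite's altitude and the minimum elevation angle, so the per-satellite results can be computed in isolation (and hence in parallel). The function name below is illustrative, not LEOCraft's actual API.

```python
# Each satellite's coverage radius depends only on its own parameters,
# so the per-satellite computation is trivially parallelisable.
import math

EARTH_RADIUS_KM = 6371.0

def coverage_radius_km(altitude_km: float, min_elevation_deg: float) -> float:
    """Great-circle radius of one satellite's ground-coverage footprint."""
    eps = math.radians(min_elevation_deg)
    # Earth-central half-angle of the footprint (standard spherical geometry).
    lam = math.acos(EARTH_RADIUS_KM / (EARTH_RADIUS_KM + altitude_km)
                    * math.cos(eps)) - eps
    return EARTH_RADIUS_KM * lam

# No call below needs any other satellite's state: a pure, independent map.
altitudes = [540.0, 550.0, 560.0]
footprints = [coverage_radius_km(h, 25.0) for h in altitudes]
```

Because every element of `footprints` is computed from its own inputs alone, the map can be split across worker processes with no synchronisation, which is exactly the property the design exploits.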

Fig. 14 depicts LEOCraft’s simulation workflow. In LEOCraft, the LEO Constellation Builder creates a top-level LEO constellation instance from the specified design parameters and GS locations. This instance contains all the LEO network component instances (GSes, satellites, and ISLs). The LEO Constellation Simulator acts as LEOCraft’s execution framework: it accumulates LEO constellation instances in the Task queue and evaluates the given batch of instances in parallel. This parallelism operates at the functional level across all the LEO constellation instances in the Task queue. To achieve it, the LEO Constellation Simulator creates a pool of worker processes, one for each available CPU [14]. These worker processes remain active for the entire duration of the simulation, minimising process-creation overhead as well. Each worker receives data (instances of LEO network components) and function references, executes those functions on the data, and sends the results back to the respective LEO Constellation instance through the dispatcher. The LEO Constellation instance accumulates the results of each asynchronously called function; once all the function calls return, it writes back the evaluation results, and the LEO Constellation Simulator removes the instance from the Task queue.

This process-based concurrency [14] effectively bypasses the GIL [17], the primary performance bottleneck for CPU-bound tasks, allowing uniform use of all CPUs even for a single LEO Constellation instance in the Task queue. As a result, LEOCraft evaluates a large constellation within a few minutes. For the interested readers, §A.5 provides an overview of LEOCraft APIs for simulating any LEO constellation.
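A minimal sketch of this process-based concurrency, assuming Python's standard `concurrent.futures` module: a pool of worker processes (one per CPU) is created once, stays alive across submissions, and runs CPU-bound functions in true parallel, side-stepping the GIL. The task function and its arithmetic are placeholders, not LEOCraft's real interfaces.

```python
# Persistent process pool: workers are forked once and reused,
# amortising process-creation overhead across all submitted tasks.
from concurrent.futures import ProcessPoolExecutor
import os

def simulate_link(params: tuple) -> float:
    """Stand-in for a CPU-bound per-link computation."""
    sat_id, gs_id = params
    return float(sat_id * 31 + gs_id)  # placeholder arithmetic

if __name__ == "__main__":
    tasks = [(s, g) for s in range(100) for g in range(10)]
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        # Each worker process has its own interpreter and its own GIL,
        # so CPU-bound work runs on all cores simultaneously.
        results = list(pool.map(simulate_link, tasks, chunksize=50))
    assert len(results) == 1000
```

Thread-based designs such as StarryNet's serialise on a single GIL for CPU-bound work; separate processes avoid that at the cost of inter-process data transfer, which is why the format-restricted exchange APIs between components matter.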

This part elaborates on LEOCraft's core innovation in software architecture. To evaluate mega-constellations of thousands of satellites at "blazing" speed, the authors abandon the architectural constraints of existing platforms in favour of a highly decoupled, process-based concurrent design.

1. The fatal bottlenecks of existing simulation platforms

  • Well-known existing LEO network simulation tools (such as Hypatia and StarryNet) hit severe software-layer scalability bottlenecks when facing large constellations.
  • Specifically:
    • StarryNet relies on Python's thread-based parallelism, so its link-state updates are throttled hard by Python's Global Interpreter Lock (GIL).
    • Hypatia, built on ns-3, fails to exploit multi-core CPUs, so simulation speed falls off a cliff as the constellation grows.

2. Core insight: the natural decoupling of network computations

  • Traditional simulators typically treat the LEO constellation as one huge "monolithic block".
  • But the authors keenly observe that, inside this huge network, the vast majority of core computations are mutually independent:
    • For example, computing one satellite's ground coverage area requires no knowledge of any other satellite's state.
    • Likewise, computing the shortest route between one pair of ground stations does not interfere with routing for any other GS pair.

3. LEOCraft's breakthrough: a process-based parallel architecture

  • Seizing on this computational independence, LEOCraft designs each component (satellites, ground stations, etc.) as an independent module, with strictly format-restricted data-exchange APIs between them.
  • Most importantly, LEOCraft adopts process-based concurrency:
    • This design bypasses Python's GIL entirely and spreads the workload evenly across all available CPU cores.
    • It successfully shifts the performance bottleneck from the software layer down to the underlying hardware.

4. Walkthrough of the system workflow


As shown in Fig. 14, LEOCraft's architecture works like an efficient pipeline:

  • Data ingestion and construction:
    • Shown at the top of Fig. 14.
    • The LEO Constellation Builder takes base data (GS locations, traffic matrices) and design parameters (e.g., altitude, inclination) and packages them into independent LEO Constellation instances.
  • Task queuing:
    • Shown in the Task queue in the middle of Fig. 14.
    • The LEO Constellation Simulator, acting as the execution framework, places these constellation instances into the Task queue to await processing.
  • Process pool and dispatch engine:
    • Shown at the Dispatcher at the bottom of Fig. 14.
    • The dispatcher creates a long-lived pool of Worker processes (Worker 1 to N), one per physical CPU.
  • Asynchronous parallel computation:
    • Each Worker takes its share of independent data and computation functions, executes them at full speed on its own CPU, and returns the results asynchronously.
    • Once all function calls have completed, the system writes the results back to disk.
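The four pipeline stages above can be sketched end-to-end with Python's `multiprocessing` module. This is an assumed structure for illustration, not the real LEOCraft code: instances wait in a task queue, a dispatcher hands independent evaluations to a persistent worker pool asynchronously, and results are accumulated only once every call has returned.

```python
# Task queue -> dispatcher -> worker pool -> accumulate results.
import multiprocessing as mp

def evaluate_component(name_and_value):
    """Stand-in for one independent evaluation function."""
    name, value = name_and_value
    return name, value * 2  # placeholder computation

if __name__ == "__main__":
    # Stage 1-2: instances queued for evaluation.
    task_queue = [("coverage", 1), ("routes", 2), ("throughput", 3)]
    results = {}

    # Stage 3: long-lived pool, one worker per CPU.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # Stage 4a: dispatch every call asynchronously ...
        pending = [pool.apply_async(evaluate_component, (t,)) for t in task_queue]
        # Stage 4b: ... then accumulate once all calls return.
        for p in pending:
            name, out = p.get()
            results[name] = out

    assert results == {"coverage": 2, "routes": 4, "throughput": 6}
```

`apply_async` returns immediately with a handle, so all workers compute in parallel while the main process simply collects results; only after the last handle resolves would the evaluation be written back and the instance removed from the queue.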

TLDR: This section is the architectural moat that lets LEOCraft outclass existing tools. Through "split the tasks + bypass the GIL with multiple processes + squeeze every CPU core", LEOCraft can finish simulating a constellation of thousands of satellites in just a few minutes on an ordinary PC.