System Design

Fig. 7 shows Umbra’s control plane architecture. Umbra’s scheduler runs in the cloud and communicates the latest schedule to ground stations, which then relay it to satellites upon next contact. The two key components in Umbra are: (a) Simulator, and (b) Scheduler. The Simulator models the evolution of the satellite-ground station links: it uses Two Line Element (TLE) orbit descriptors to perform orbit calculations [24] and a link quality model to compute link capacities [26–28]. Profilers running on ground stations continuously relay queue sizes and cloud bandwidth data as input to the Umbra Simulator. Umbra’s second component, the Scheduler, interacts with the Simulator to construct the time expanded network (TEN) and compute the optimal data transfer plan.
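To make the time expanded network more concrete, here is a minimal sketch that builds a toy TEN and computes a maximum flow over it. It is an illustration under stated assumptions, not Umbra's actual formulation: the helper names `build_ten` and `max_flow`, the node naming, the example capacities, and the choice of plain Edmonds-Karp max flow as the objective are all hypothetical.

```python
from collections import defaultdict, deque

def build_ten(contacts, cloud_bw, steps, sat_data):
    """Build residual capacities for a toy time expanded network.

    contacts: {(sat, gs, step): capacity} link capacity during a time step
    cloud_bw: {gs: capacity} ground station to cloud capacity per step
    sat_data: {sat: amount} data waiting on each satellite at step 0
    All inputs are assumed to come from a simulator; names are hypothetical.
    """
    cap = defaultdict(lambda: defaultdict(float))
    for sat, amount in sat_data.items():
        cap["source"][(sat, 0)] += amount
    for step in range(steps):
        for node in list(sat_data) + list(cloud_bw):
            if step + 1 < steps:
                # Storage edge: data may wait on a node until the next step.
                cap[(node, step)][(node, step + 1)] = float("inf")
        for gs, bw in cloud_bw.items():
            cap[(gs, step)]["cloud"] += bw
    for (sat, gs, step), capacity in contacts.items():
        cap[(sat, step)][(gs, step)] += capacity
    return cap

def max_flow(cap, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    flow = 0.0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, residual in list(cap[u].items()):
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# One satellite with 10 units of data, two ground stations, three time steps.
ten = build_ten(
    contacts={("sat1", "gsA", 0): 4, ("sat1", "gsB", 2): 5},
    cloud_bw={"gsA": 2, "gsB": 10},
    steps=3,
    sat_data={"sat1": 10},
)
delivered = max_flow(ten, "source", "cloud")
print(delivered)  # 9.0: 4 units via gsA (uploaded over two steps), 5 via gsB
```

The storage edges are what make the network "time expanded": data that cannot be uploaded in one step can wait at a node and use capacity in a later step, which is why `gsA` can drain 4 units through a 2-unit-per-step cloud link.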

Updating the data transfer plan: Typically, the TLE orbit descriptors are updated periodically (on the order of a day to a few days) to maintain accuracy. Therefore, Umbra pulls new orbital data and calculates a new plan every five days, and relays it to the ground stations and satellites. We later evaluate the effect of an outdated plan. Umbra can also be forced to generate a new schedule upon events such as crashes, new satellite deployments, or ground station upgrades.
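The update policy amounts to a periodic timer plus event-driven overrides. A minimal sketch, assuming a hypothetical `needs_replan` helper and hypothetical event names (only the five-day period comes from the text):

```python
from datetime import datetime, timedelta

REPLAN_PERIOD = timedelta(days=5)  # period stated in the text
# Hypothetical event labels for the forcing events the text lists.
FORCING_EVENTS = {"crash", "satellite_deployed", "ground_station_upgraded"}

def needs_replan(last_plan_time, now, events=()):
    """Return True if a new data transfer plan should be computed."""
    if any(event in FORCING_EVENTS for event in events):
        return True  # forced regeneration, regardless of plan age
    return now - last_plan_time >= REPLAN_PERIOD

last = datetime(2024, 1, 1)
print(needs_replan(last, datetime(2024, 1, 3)))             # plan is 2 days old
print(needs_replan(last, datetime(2024, 1, 3), ["crash"]))  # forced by an event
print(needs_replan(last, datetime(2024, 1, 6)))             # plan is 5 days old
```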

Handling component failure: Satellite failure is common in large constellations. For example, solar flares caused 40 of 49 SpaceX satellites to fail after a recent launch [4]. Ground stations may also fail due to power outages, machine reboots, extreme weather, or local events. We assume the cloud has redundancy and is always available.

Whenever a satellite fails, Umbra computes a new schedule. While the new schedule is being calculated, all remaining satellites and ground stations continue using the most recent existing schedule. Newly joining satellites store their data locally and wait to receive a new plan before transmitting anything.

The failure of a ground station has a greater impact, since satellites have to route their data through ground stations. After such a failure, while Umbra is generating a new data transfer plan, a satellite that encounters a failed ground station will detect the lack of acknowledgments and simply withhold all the data it planned to send until it encounters the next non-faulty ground station.
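The satellite-side fallback can be sketched as a loop over planned contacts. This is a minimal illustration, not Umbra's implementation: the `run_contacts` helper, the per-contact `capacity` limit, and the rule that withheld data is added to the next healthy contact up to that limit are assumptions made for the example.

```python
def run_contacts(plan, healthy, capacity):
    """Simulate one satellite following a stale plan.

    plan:     list of (ground_station, planned_amount) in contact order
    healthy:  set of ground stations that still send acknowledgments
    capacity: {ground_station: max amount transferable in one contact}
    Returns the amounts actually sent and the data still held on board.
    """
    backlog = 0.0
    sent = []
    for gs, planned in plan:
        pending = planned + backlog
        if gs not in healthy:
            # No acknowledgments: withhold everything planned for this contact.
            backlog = pending
            sent.append((gs, 0.0))
            continue
        amount = min(pending, capacity[gs])
        backlog = pending - amount
        sent.append((gs, amount))
    return sent, backlog

sent, backlog = run_contacts(
    plan=[("gsA", 3), ("gsB", 2), ("gsC", 4)],
    healthy={"gsA", "gsC"},          # gsB has failed
    capacity={"gsA": 5, "gsB": 5, "gsC": 5},
)
print(sent, backlog)
```

In this run the 2 units planned for the failed `gsB` are carried to `gsC`, where the contact capacity of 5 leaves 1 unit on board until a later contact or a new plan.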


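(1) System Design

Fig. 7 shows the control plane architecture of Umbra. Umbra's Scheduler runs in the cloud and announces the latest schedule to the ground stations, which relay it to the satellites at their next contact. Umbra has two key components:

(a) Simulator and (b) Scheduler

Simulator: uses Two Line Element (TLE) orbit descriptors to simulate how the satellite-ground station links evolve over time, both for orbit calculations and for estimating link capacities through a link quality model. In addition, Profilers running on the ground stations continuously relay current queue sizes and cloud bandwidth data to the Umbra Simulator as input.
Scheduler: interacts with the Simulator, constructs the time expanded network (TEN), and computes the optimal data transfer plan.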

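(2) Updating the Data Transfer Plan

To maintain accuracy, TLE orbit descriptors are typically updated periodically (every one to a few days). Umbra therefore pulls new orbital data and computes a new transfer plan every five days, then pushes the plan to the ground stations and satellites. The effect of running on an outdated plan is evaluated later. Umbra can also be forced to generate a new schedule when events such as system crashes, new satellite deployments, or ground station upgrades occur.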

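(3) Handling Component Failure

Component failures are common in large satellite constellations. For example, solar flares recently caused SpaceX to lose 40 of the 49 satellites from a single launch. Ground stations can likewise fail because of power outages, machine reboots, extreme weather, or local events. The design assumes that the cloud has redundancy and is always available.

Satellite failure: whenever a satellite fails, Umbra recomputes the schedule. While the new plan is being generated, all other satellites and ground stations keep following the most recent existing schedule. Satellites newly joining the constellation do not transmit anything until they receive a new plan, and store their data locally in the meantime.
Ground station failure: a ground station failure has a more severe impact, because satellites must route their data through ground stations. After such a failure, while Umbra is still generating a new transfer plan, a satellite that contacts a failed ground station as originally planned detects the lack of acknowledgments and simply withholds all the data it planned to send until it encounters the next non-faulty ground station.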