End-to-End Results
End-to-End Performance
We evaluate the end-to-end latency for all latency-sensitive imagery that matches our two applications. We define latency as the delay between when the corresponding image is generated and when the information is delivered to the cloud and made available for the user to download. For the vessel counting application, only the vessel count needs to be delivered to the user. Fig. 7 shows the distribution of the end-to-end latency achieved by Serval compared to in-order delivery. We plot this result for two hardware configurations of the AGX ORIN, at 15W and 30W power consumption, to benchmark Serval's benefits across different types of computational platforms. The AGX ORIN can operate in either power mode and offers less computational capability in the lower-power mode.
Using the 15W power mode and a monolithic ground station architecture, Serval reduces the median latency by 70×, from 78.2 hours to 1.1 hours, and the 90th percentile from 145.55 hours to 2.71 hours. For Serval, a large part of this latency stems from the fact that even after high-priority images have been identified, they must wait for the next ground station contact. Distributed ground stations (DGS) [7,46,61] have recently been designed to provide more opportunities for data download. In the DGS case, Serval downloads images with a median latency of 0.03 hours, compared to 71.71 hours for in-order delivery. Even the 90th percentile for Serval is 0.78 hours, compared to 149.05 hours for in-order delivery. This result shows that even with simple compute capabilities on the satellite, Serval can achieve near-real-time delivery of latency-sensitive insights.
Next, we compare results for the high-power 30W mode of the Jetson AGX ORIN deployed on the satellite. In this case, the delays for the baseline do not change because it performs no compute. For Serval, the median delay is 1.1 hours (90th percentile: 2.7 hours) for monolithic ground stations and 0.03 hours (90th percentile: 0.78 hours) for DGS. Note that the median delay for monolithic stations does not change, which validates our hypothesis that the delay stems from the sparsity of ground station locations. When combined with DGS, Serval achieves a median latency of a few minutes. Unless noted otherwise, the evaluation below is performed using the 30W, DGS, 1% resource-constrained setting.
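The latency metric above can be made concrete with a small sketch. This is a hypothetical helper, not Serval's actual implementation: given an image's capture time, its on-board compute delay, and a sorted list of ground-station contact start times, it returns the end-to-end delivery latency. Denser contact schedules (the DGS case) shrink the wait for the next contact, which is the dominant term in the monolithic case.

```python
from bisect import bisect_left

def delivery_latency(capture_time, compute_delay, contact_starts):
    """Hypothetical sketch: end-to-end latency is the time from image
    capture until the first ground-station contact after on-board
    processing finishes. All times are in the same unit (e.g., hours)."""
    ready = capture_time + compute_delay        # image prioritized on board
    i = bisect_left(contact_starts, ready)      # next contact window start
    if i == len(contact_starts):
        raise ValueError("no remaining contact window")
    return contact_starts[i] - capture_time

# Image captured at t=0 h, 0.1 h of on-board compute,
# monolithic-station contacts at t = 0.05, 1.2, 6.0 hours:
print(delivery_latency(0.0, 0.1, [0.05, 1.2, 6.0]))  # -> 1.2
```

With a denser DGS-style schedule such as `[0.05, 0.12, 0.2]`, the same image would be delivered at 0.12 hours, illustrating why the monolithic median is dominated by contact sparsity.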
Impact of scaling up applications: Recall that, for the evaluation below, we limit Serval's resource usage for high-priority images to 1%. We would like to see how having more or fewer applications would impact Serval's performance, which is equivalent to changing the resource usage threshold. We tested resource limit values of 0.05%, 0.1%, and 0.5%. Figure 7c shows that Serval's performance remains steady even when the resource limit is reduced from 1% to 0.5%. Its performance starts to decrease when the resource limit is reduced to 0.1%, at which point computation starts to emerge as a bottleneck. As discussed in our future work (Sec. 8), such bottlenecks can be overcome by architectural optimizations.
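The resource limit above can be thought of as a compute budget. The sketch below is a hypothetical illustration (the class name and accounting scheme are ours, not Serval's): high-priority filtering may run only while its cumulative compute time stays under the configured fraction of total on-board capacity; once the budget is exhausted, images fall back to their pre-computed priority.

```python
class ComputeBudget:
    """Hypothetical sketch: cap the fraction of on-board compute time
    that high-priority filtering may consume (e.g., 1% of capacity)."""
    def __init__(self, limit_fraction, total_capacity_s):
        self.limit = limit_fraction * total_capacity_s  # seconds allowed
        self.used = 0.0

    def try_run(self, cost_s):
        # Run the on-board filter only if it fits in the remaining
        # budget; otherwise the image keeps its pre-computed priority.
        if self.used + cost_s <= self.limit:
            self.used += cost_s
            return True
        return False

budget = ComputeBudget(0.01, 36000)   # 1% of a 10-hour window = 360 s
print(budget.try_run(300))            # fits within budget: True
print(budget.try_run(100))            # would exceed 360 s: False
```

Lowering `limit_fraction` toward 0.1% shrinks the budget until filtering itself becomes the bottleneck, matching the trend in Figure 7c.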
Effect of running computation on satellites: To evaluate the effect of running compute on the satellite, we compared Serval's performance against a baseline in which satellites perform no on-board computation and prioritize all images based on glacial filters alone. The results are illustrated in Figure 8a. By running compute on the satellite, Serval reduces the median latency by 23×.
One might wonder why filtering based on glacial filters alone is insufficient, given that the number of California forest images is small. The reason is that high-priority traffic arrives in bursts: when a satellite is over an area of interest, it continuously captures images that require computation; when it is not, no computation is needed. This holds for any application whose images are not evenly distributed across the globe. Consequently, during a satellite-ground station contact, the proportion of images in the transmission queues with high "pre-computed" scores can be much larger than the overall average. On the other hand, only 1% of all California forest images actually contain forest fire. Therefore, running computation on the satellite can filter out a significant fraction of the images that pass the glacial filter pre-computed on the ground.
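The two-stage pipeline described above can be sketched as follows. This is a simplified illustration under our own naming (the function, threshold, and toy model are assumptions, not Serval's code): the ground-precomputed glacial score gates which images run the expensive on-board model, and only images passing both stages enter the high-priority queue.

```python
def prioritize(image, glacial_score, onboard_model, threshold=0.5):
    """Hypothetical two-stage sketch: the glacial score (pre-computed
    on the ground) gates the on-board model; both must agree for an
    image to be marked high priority."""
    if glacial_score < threshold:          # e.g., not a California forest tile
        return "low"
    if onboard_model(image) < threshold:   # e.g., no fire detected on board
        return "low"
    return "high"

# Toy stand-in model: "fire" images score 0.9, everything else 0.1.
model = lambda img: 0.9 if img == "fire" else 0.1
print(prioritize("fire", 0.8, model))    # both stages pass -> high
print(prioritize("ocean", 0.2, model))   # glacial filter rejects -> low
print(prioritize("trees", 0.8, model))   # filter passes, model rejects -> low
```

The second stage is what trims the bursty runs of high-glacial-score images that would otherwise crowd the transmission queue during a contact.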
Benefit of using auxiliary information: To validate our decision to use side-channel information for cloud detection, we first checked the accuracy of the weather information. The results are shown in Table 5. In this experiment, we only run the cloud detector on images whose cloud coverage value lies between 20% and 80%. From the table, we can see that this method saves 83.6% of the computation on cloud detection while yielding an accuracy of 96.3% and a recall of 99.3%. Indeed, we can precisely estimate the cloudiness of the majority of images without sacrificing accuracy, saving a large amount of compute.
We evaluated the weather information's contribution to end-to-end performance by comparing Serval against a variant with side-channel information disabled. The results are shown in Figure 8b. By employing weather information, Serval improves the median latency by 8.8×.
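The side-channel gate described above reduces to a simple band check. This sketch uses the 20%/80% thresholds from the text but is otherwise a hypothetical helper, not Serval's API: when the weather forecast says an image is clearly clear or clearly cloudy, the forecast is trusted and the on-board cloud detector is skipped; only the ambiguous middle band pays for detection.

```python
def needs_cloud_detector(forecast_cloud_pct, low=20.0, high=80.0):
    """Sketch of the weather side-channel gate: run the on-board cloud
    detector only when the forecast coverage is ambiguous (between the
    low and high thresholds); otherwise trust the forecast."""
    return low <= forecast_cloud_pct <= high

print(needs_cloud_detector(5.0))    # clearly clear: skip detector -> False
print(needs_cloud_detector(50.0))   # ambiguous: run detector -> True
print(needs_cloud_detector(95.0))   # clearly cloudy: skip detector -> False
```

In the reported experiment, the images outside the band account for the 83.6% of cloud-detection compute that this gate avoids.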
Benefit of using historical data: Does pre-computing glacial filters on the ground station have an advantage? To test this hypothesis, we move the Forest filter to the satellite and test whether this move hurts latency. For this experiment, we consider a single application: "California forest fire". The comparison is shown in Figure 8c. The median latency increases by 1670× when we do not use historical data. We observe two reasons for this large performance drop: (a) running the forest model on satellites consumes a great amount of computation time (the number of images is larger, and each image requires more computational resources), and (b) since the model is run in real time, some of the fire images do not look like forest to the neural network because of the presence of smoke. Therefore, these images get misclassified as 'not forest' and placed in the low-priority queue. In contrast, when Serval places glacial filter execution on ground stations, the analysis uses stale historical data, and such historical images are not occluded by smoke. This second benefit is an unintended consequence of running glacial filters on ground stations.
Comparison to early discard: As an extension to the above experiment, we compared Serval's performance against an early discard scheme inspired by OEC [23]. OEC relies solely on computation on satellites and discards all images deemed low priority. As in the above experiment, since OEC does not use historical data or ground station computation, it incorrectly discards some images that contain forest fire. In fact, OEC successfully identified and downlinked only 14.7% of all images with forest fire. Since it discards all "low priority" images on the satellite, the false-negative images never have a chance to reach the user; they appear in the long tail of the distribution in Figure 8c. In contrast, Serval's classifier makes fewer mistakes because of its reliance on historical data, and even when Serval does make a mistake, it merely adds latency to that image rather than discarding it. For the vessel counting application, both OEC and Serval downlink all images containing vessels, primarily because historical data does not help with that classification strategy.
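The policy difference above, discarding versus deprioritizing, can be sketched in a few lines. This is a toy illustration with hypothetical names, not either system's implementation: an early-discard scheme drops low-scoring images outright, while a deprioritizing scheme keeps them in the queue with low priority, so a misclassified fire image is delayed rather than lost.

```python
import heapq

def enqueue(queue, image_id, score, discard_below=None):
    """Toy sketch of the two policies: with discard_below set
    (early discard), low-scoring images are dropped and can never
    reach the user; without it (deprioritize), they stay queued
    behind higher-scoring images and merely incur extra latency."""
    if discard_below is not None and score < discard_below:
        return False                           # early discard: image is gone
    heapq.heappush(queue, (-score, image_id))  # max-heap by priority score
    return True

q = []
enqueue(q, "smoky_fire", 0.1)              # misclassified, but retained
enqueue(q, "clear_fire", 0.9)
print([img for _, img in sorted(q)])       # highest-priority image drains first
```

Under early discard (`discard_below=0.5`), the `"smoky_fire"` image would be dropped on the satellite, which is exactly the false-negative long tail seen in Figure 8c.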
Satellite Power Usage
We monitored the power usage of different functions during the simulation period. The energy cost consists of three main parts: regular power (ADACS and other essential functions that keep the satellite alive), transmission power, and compute power. Figure 9 illustrates a sample satellite's ("Dove 103b") power consumption profile during a period in which it flies over California. We observed that while the satellite constantly consumes power for regular functions and intermittently for transmission, the compute function is only activated when the satellite captures a potential high-priority image. The compute power consumption is thus much sparser than either transmission or regular maintenance. In our simulation, 68.1% of the energy was consumed by transmission, 28.9% by regular maintenance, and 2.9% by computation. Because of our limit on resource utilization for the high-priority applications, we used only 1% of that 2.9% compute energy; the rest was reserved for other computation tasks (e.g., other tenant applications, satellite maneuvering, etc.).
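The per-component energy shares reported above come from integrating power over time. The sketch below is a hypothetical accounting helper with made-up trace numbers (the function name and toy wattages are ours, not measurements from Dove 103b): it sums watts × seconds per component and normalizes to fractions of total energy.

```python
def energy_breakdown(samples):
    """Hypothetical sketch: integrate per-component power samples
    (component name, watts, duration in seconds) into each
    component's share of total energy consumed."""
    totals = {}
    for component, watts, seconds in samples:
        totals[component] = totals.get(component, 0.0) + watts * seconds
    grand = sum(totals.values())
    return {c: e / grand for c, e in totals.items()}

# Toy trace: regular power is continuous, transmission comes in bursts,
# and compute runs only when a candidate high-priority image arrives.
trace = [("regular", 5.0, 3600), ("transmit", 30.0, 1200), ("compute", 15.0, 120)]
shares = energy_breakdown(trace)
print({c: round(s, 3) for c, s in shares.items()})
```

Even in this toy trace, transmission dominates the energy budget while compute is a small sliver, mirroring the 68.1% / 28.9% / 2.9% split observed in the simulation.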