System Design

This work introduces a satellite-ground collaborative system tailored to counting, which aims to minimize counting errors in the challenging satellite environment. TargetFuse runs a geospatial counter (e.g., a shallower DNN) in space, where computing power is lower and counts are less accurate, and a ground counter (e.g., a deeper DNN) on the ground, where computing power is higher and counts are more accurate. To meet the computational and downlinking needs of counting applications, TargetFuse leverages three techniques: adaptive image tiling, clustering-based data deduplication, and bandwidth-aware downlinking throttling.

The workflow of TargetFuse is shown in Fig. 3:

• For each satellite image, TargetFuse divides it into several tiles based on the image resolution and the input size of the DNN counters.

• Next, TargetFuse automatically performs clustering-based data deduplication, considering the similarity of the tiles.

• After that, TargetFuse applies selection logic to decide how to handle each tile according to the confidence thresholds from the onboard DNN counter.

• Note that the onboard system also downlinks the counting results obtained in space together with the vital tiles.

• Finally, TargetFuse emits the object count aggregated across space and ground.

A. System Operation

1) Energy expenditure: TargetFuse performs orbital counting while adhering to the energy budget allocated to satellites along their trajectories. Based on data obtained from an in-orbit satellite [9], energy is allocated not only to fundamental satellite operations, such as propulsion and avionics, but also to the following computing activities associated with counting:

(1) E_cap for capturing images;

(2) E_com for executing counting on images;

(3) E_agg for deriving aggregated counting results in space;

(4) E_down for downlinking the satellite images to be counted on the ground.

Activities (2) and (4) are the most energy-intensive and account for over 60% of the total energy consumption, as illustrated in Fig. 2: during each orbital track, satellites perform several trillions of FLOPs and downlink some images to the ground. In contrast, activities (1) and (3) consume negligible energy: (1) only involves capturing thousands of images with the onboard camera, and (3) only involves a few hundred arithmetic operations. Therefore, activities (2) and (4) are the focus of this work [19]. Their energy use aligns with the satellite's lifetime design [9], [20], which optimally utilizes less than 50% of the available battery energy. This makes it possible to estimate the daily energy budget.

2) Selection logic with different confidence thresholds: Prior to system execution, the satellite's OS captures numerous images and selects a DNN counter. Deploying DNN counters on satellites has become increasingly crucial for guaranteeing counting accuracy. When a counter is selected, a confidence threshold is established based on the detections of the onboard DNN counter. This confidence threshold indicates the probability of accurately counting the objects and falls within the range [0, 1] [21].

3) Objective: Minimizing overall counting error while optimizing energy and bandwidth expenditure. TargetFuse's objective is to minimize the overall counting error by allocating energy and bandwidth efficiently. In our implementation, the overall counting error is defined as the mean counting error across all tiles, a metric widely employed in diverse applications [22]. A smaller counting error indicates heightened confidence in counting accuracy, which ultimately translates to greater benefits for customers. To address the computational and downlink bottlenecks in the counting application, TargetFuse leverages the following three techniques.
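The error metric can be illustrated with a minimal sketch. It assumes the per-tile error is the absolute difference between the estimated and the true count (the text only says "mean difference"), and the function name `count_mae` and the sample counts are hypothetical.

```python
from typing import Sequence


def count_mae(estimated: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Mean absolute difference between per-tile estimated and true counts.

    Assumption: the per-tile error is the absolute difference.
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("both count lists must have the same length")
    if not estimated:
        raise ValueError("at least one tile is required")
    total = sum(abs(e - g) for e, g in zip(estimated, ground_truth))
    return total / len(estimated)


# Hypothetical counts for four tiles.
per_tile_estimates = [12, 0, 7, 3]
per_tile_truth = [10, 1, 7, 5]
print(count_mae(per_tile_estimates, per_tile_truth))  # 1.25
```

A lower value means the aggregated count stays closer to the ground truth on average.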

B. Adaptive Image Tiling

For each image, TargetFuse divides the image into tiles with comparatively low execution overhead. Large satellite images, which typically contain thousands of megapixels, need to be processed with DNN models in space. However, executing standard models directly on these images may consume excessive memory and even exhaust what is available. This is particularly challenging in typical space environments, where the memory capacity is insufficient for such large-scale images. Prior work [6] divided the image into several tiles to maximize inference accuracy, at the expense of execution overhead. In addition, downsampling the image to the input size of standard model architectures is frequently insufficient for achieving optimal performance [7].

A large image can be segmented into either fewer tiles of a larger size or numerous tiles of a smaller size. After each tile is scaled to the input size of the DNN counter, the execution time per tile remains constant. Theoretically, a larger tile size reduces image processing time because there are fewer tiles, but each tile undergoes more degradation when it is scaled down. Conversely, a smaller tile size increases processing time, while each tile undergoes less degradation. To explore the impact of tile size on both inference accuracy and execution overhead, we conducted measurements on two datasets, as shown in Fig. 4. Interestingly, both datasets display similar curves. As the tile size increases, the execution time decreases because there are fewer tiles per image. However, there is an optimal tile size that maximizes accuracy, and accuracy tends to deteriorate when the tile size deviates from it. This observation aligns with findings from previous studies [6], [10]. The optimal tile size enables substantial improvements in accuracy while keeping the processing time acceptable. Moreover, the optimal tile size is not constant but varies with the DNN counter and the image input size. Consequently, we determine the optimal tile size for each combination of satellite image and DNN counter. Considering the trade-off between geospatial analysis accuracy (i.e., mAP) and execution overhead (i.e., processing time), we aim to identify an optimal tile size that aligns with the input size of the DNN counter.
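The trade-off described above can be made concrete with a small sketch. It assumes square tiles, a fixed inference time per tile, and partial tiles at the image border; the image size, input size, and per-tile time are hypothetical values chosen only for illustration.

```python
import math


def tiling_cost(image_w: int, image_h: int, tile_size: int,
                input_size: int, time_per_tile_s: float) -> dict:
    """Estimate tile count, total time, and scaling factor for one tile size."""
    cols = math.ceil(image_w / tile_size)  # border tiles may be partial
    rows = math.ceil(image_h / tile_size)
    n_tiles = cols * rows
    return {
        "tiles": n_tiles,
        # Per-tile time is constant once tiles are scaled to the input size.
        "time_s": n_tiles * time_per_tile_s,
        # A factor below 1 means the tile is downsampled before inference.
        "scale": input_size / tile_size,
    }


# Hypothetical setup: 8192x8192 image, 512-pixel DNN input, 50 ms per tile.
for size in (512, 1024, 2048):
    print(size, tiling_cost(8192, 8192, size, input_size=512, time_per_tile_s=0.05))
```

Under these assumptions, quadrupling the tile size cuts the tile count from 256 to 16 while each tile is downsampled by a factor of four per side, which is why both time and degradation must be weighed.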

We present a detailed approach for optimizing the tile size in tile-based processing in Algorithm 1. The process begins by initializing the tile sizes. Drawing on measurement results obtained before the DNN counter is deployed, we promptly narrow down the search interval and empirically establish the minimum and maximum tile sizes. We then iterate until the size of the search interval meets the preset empirical size difference threshold. Specifically, we divide the search interval into three equal parts and compare the mAP at different sizes to identify the next search interval. The optimal tile size lies in the interval [s_midl, s_right] when mAP(s_left) < mAP(s_right), and vice versa. Finally, we obtain an approximate optimal tile size by taking the midpoint of the interval. This method balances accuracy against computational efficiency and offers a user-friendly way to improve the speed and accuracy of the counting application.
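The search can be sketched as a textbook ternary search. This is not a transcription of Algorithm 1: it assumes mAP is unimodal in the tile size and compares mAP at the two interior points (called `s_midl` and `s_midr` here), which is the standard form of the technique. The callback `evaluate_map`, the tolerance `tol`, and the toy accuracy curve are hypothetical.

```python
from typing import Callable


def search_tile_size(evaluate_map: Callable[[int], float],
                     s_min: int, s_max: int, tol: int = 32) -> int:
    """Ternary search for the tile size that maximizes a unimodal mAP curve."""
    s_left, s_right = float(s_min), float(s_max)
    while s_right - s_left > tol:
        third = (s_right - s_left) / 3
        s_midl = s_left + third
        s_midr = s_right - third
        if evaluate_map(round(s_midl)) < evaluate_map(round(s_midr)):
            s_left = s_midl   # the peak lies in [s_midl, s_right]
        else:
            s_right = s_midr  # the peak lies in [s_left, s_midr]
    # Approximate the optimum by the midpoint of the final interval.
    return round((s_left + s_right) / 2)


def toy_map(tile_size: int) -> float:
    """Hypothetical accuracy curve that peaks at a tile size of 800."""
    return 0.8 - ((tile_size - 800) / 2000) ** 2


best = search_tile_size(toy_map, s_min=256, s_max=2048, tol=32)
print(best)
```

Each iteration discards one third of the interval, so the number of mAP evaluations grows only logarithmically with the width of the initial search range.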

C. Clustering-based Data Deduplication

TargetFuse is tasked with classifying each tile into a geographic context while complying with the computational constraints imposed by satellite hardware. A geographic context is a subset of images characterized by a high degree of similarity, along with shared geographic and transformation features. Such images commonly exhibit a substantial degree of similarity and often remain relatively static over time. EO satellites periodically pass over identical locations on Earth's surface, capturing images that are highly similar or nearly identical at different times along their orbital path [25]. As shown in Fig. 5, the two tiles acquired after tiling include an identical number of similar images. This is due to the short average revisit cycle of each satellite: GF-3, for example, revisits the same area at least twice every day, so the same geographical area is captured multiple times [26].

The presence of numerous semantically similar images puts substantial pressure on the limited computational capacity of a satellite. To tackle this challenge, we propose a data deduplication strategy that processes representative tiles for each geographic context rather than all similar tiles. Certain images are computationally less demanding in specific contexts than in others. Because satellite orbits are highly predictable, the contexts can usually be determined readily. However, there may be image tiles whose contexts are not immediately apparent, which makes generating those contexts a challenge.

The technique efficiently generates contexts by partitioning the image tiles into multiple contexts. To cluster the representative image tiles based on similarity, it uses a low-dimensional label vector that indicates the geographic features present in each image tile, described by computing moments [27]. It creates a set of contexts by performing k-means clustering, using the Euclidean distance between label vectors to measure similarity. Moreover, to enhance the DNN counter's robustness to diverse sensors, we consider geographic label transformations such as translations and rotations, since objects may have arbitrary headings between 0 and 360 degrees. We also explore a range of cluster counts when partitioning the dataset into clusters. Further investigation of this hyperparameter space is an exciting avenue for future research.
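A minimal sketch of the clustering step follows, written with the standard library only. It assumes each tile has already been reduced to a low-dimensional feature vector (the moment computation itself is not shown), and it picks the tile closest to each centroid as the representative, which is an illustrative choice rather than something the text specifies. The function names and sample vectors are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def kmeans(vectors: Sequence[Sequence[float]], k: int,
           iterations: int = 20, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with Euclidean distance; returns labels and centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(vectors, labels, centroids) -> Dict[int, int]:
    """Map each cluster to the index of the tile closest to its centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps[c] = min(members, key=lambda i: math.dist(vectors[i], centroid))
    return reps


# Hypothetical 2-D feature vectors for six tiles from two contexts.
features = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(features, k=2)
reps = representatives(features, labels, centroids)
print(labels, reps)
```

Only the representative tiles would then be passed to the DNN counter, which is where the computational saving of deduplication comes from.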

D. Bandwidth-aware Downlinking Throttling

The downlinking bottleneck also poses a significant challenge in in-orbit counting, impacting the overall performance. When receiving the tiles with confidence thresholds, TargetFuse employs selection logic to determine the policy for handling these tiles. The optimal policy ensures that downlinking remains within the bandwidth budget constraint while transmitting as many tiles as possible to the ground to minimize counting errors.

The selection logic, based on the confidence threshold from the space-based DNN counter, sorts tiles into three groups (Fig. 3):

• When the confidence threshold is relatively small (i.e., < conf_p), TargetFuse discards the tiles directly.

• When the confidence threshold is large enough (i.e., > conf_q), TargetFuse accepts the counting result obtained in space.

• Only when the confidence threshold is between conf_p and conf_q (i.e., within [conf_p, conf_q]) does TargetFuse downlink the tiles and process them with the ground DNN counter.

To comprehensively explore how the confidence threshold affects CMAE (i.e., the mean difference between the estimated count and the ground truth), we vary conf_p under different contact times between satellite and ground (i.e., different downlinking data volumes). Because the tiles are sorted by confidence threshold and the objective is to downlink as many tiles as possible, we consider the following three methods, which differ in the choice of confidence thresholds, for downlinking the tiles within [conf_p, conf_q] under the limited bandwidth budget:

• Low-Conf-First: If there are still tiles remaining when the bandwidth is exhausted, we proceed to directly count these tiles and downlink the results.

• Fixed Conf: If there are still tiles remaining when the bandwidth is exhausted, we only count tiles whose confidence thresholds are higher than the fixed conf_q.

• Dynamic Conf: We first count the tiles with confidence thresholds higher than the preset conf_q and then count the tiles whose confidence thresholds are within [conf_p, conf_q] until the bandwidth is exhausted. The value of conf_q varies depending on the downlinking constraint.
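The three-way selection logic that these methods build on can be sketched as a small function. The action labels and the threshold values used in the example are hypothetical; the text does not prescribe specific values for conf_p and conf_q.

```python
def select_action(confidence: float, conf_p: float, conf_q: float) -> str:
    """Decide how to handle one tile from its onboard confidence."""
    if not 0.0 <= conf_p <= conf_q <= 1.0:
        raise ValueError("expected 0 <= conf_p <= conf_q <= 1")
    if confidence < conf_p:
        return "discard"               # too unreliable to be worth bandwidth
    if confidence > conf_q:
        return "accept_onboard_count"  # the space counter is trusted
    return "downlink"                  # recount with the ground DNN counter


# Hypothetical thresholds for illustration only.
for conf in (0.05, 0.5, 0.95):
    print(conf, select_action(conf, conf_p=0.2, conf_q=0.8))
```

Tiles whose confidence equals either threshold fall into the downlink group, matching the closed interval [conf_p, conf_q].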

The observations from Fig. 6 are summarized as follows:

• A dynamic conf_p leads to performance improvement. When the downlink capacity is insufficient, Low-Conf-First and Dynamic Conf show similar performance, as all remaining tiles are counted and Dynamic Conf is not effective. In this scenario, a larger conf_p leads to a lower CMAE, as it facilitates the downlinking of more high-confidence tiles. When the downlink capacity is sufficient, Fixed Conf and Dynamic Conf exhibit comparable performance, as all tiles are counted and Fixed Conf is therefore not enabled. In this case, a larger conf_p results in a higher CMAE, as it discards the high-value tiles with confidence thresholds below conf_p. Choosing an appropriate dynamic conf_p is therefore crucial for improving counting performance. This motivates downlinking high-confidence images first and then downlinking some low-confidence images within the available bandwidth.

• An optimal conf_p improves performance. When the downlink capability is sufficient and conf_p increases to a certain value, all methods discard the tiles below conf_p, resulting in a counting error. Here Low-Conf-First and Fixed Conf exhibit identical performance, since Fixed Conf fails to work effectively. Additionally, Dynamic Conf incurs a higher counting error, as counting tiles with confidence above conf_q in space is not as accurate as counting them on the ground. Therefore, optimizing conf_p is crucial.

• conf_q alleviates the downlink constraint without greatly affecting performance. In Fig. 6(d), when the downlink capability is sufficient (i.e., the dynamic conf_q does not change) and the initial conf_q is not too large (i.e., smaller than 0.2), Dynamic Conf and Low-Conf-First show similar performance. However, Dynamic Conf has the advantage of increasing the downlink volume compared to Low-Conf-First.

Therefore, strategically selecting the optimal confidence threshold is crucial for downlinking more high-confidence tiles. The remaining challenge is to determine the downlinking method for the confidence thresholds while fully exploiting the bandwidth budget. Algorithm 2 describes the bandwidth-aware downlinking throttling procedure, which takes the bandwidth requirement and the tiles obtained by tiling and clustering (as shown in Fig. 3) as input and produces a set of tiles that can be downlinked under the scarce bandwidth constraint. We first identify all the clustered tile sets on the satellite and discard tiles with a confidence threshold lower than an empirically specified conf_p (lines 5-6). Tiles with high confidence (above conf_q) are included directly in C_space (lines 7-8). We then calculate the remaining bandwidth and use as much of it as possible to downlink images (lines 13-18). In other words, the remaining tiles are sorted by data size in descending order, and the count results are added to the transmitted tile set S_trans if sufficient bandwidth is available.
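The procedure described above can be sketched as follows. This is an illustrative reading, not a transcription of Algorithm 2: the `Tile` fields, the function name, and the sample values are hypothetical, and the fallback of counting onboard any tile that no longer fits the remaining bandwidth is an assumption, since the text does not spell out that step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tile:
    tile_id: str
    confidence: float   # confidence from the onboard DNN counter
    size_bytes: int     # data volume needed to downlink the tile
    onboard_count: int  # count produced by the onboard DNN counter


def throttle_downlink(tiles: List[Tile], conf_p: float, conf_q: float,
                      bandwidth_bytes: int) -> Tuple[List[Tile], int, int]:
    """Return (tiles to transmit, count accepted in space, unused bandwidth)."""
    c_space = 0
    candidates = []
    for tile in tiles:
        if tile.confidence < conf_p:
            continue                       # discard low-confidence tiles
        if tile.confidence > conf_q:
            c_space += tile.onboard_count  # accept the onboard count directly
        else:
            candidates.append(tile)

    s_trans = []
    remaining = bandwidth_bytes
    # Sort by data size in descending order, as the text describes.
    for tile in sorted(candidates, key=lambda t: t.size_bytes, reverse=True):
        if tile.size_bytes <= remaining:
            s_trans.append(tile)
            remaining -= tile.size_bytes
        else:
            # Assumption: tiles that do not fit are counted onboard instead.
            c_space += tile.onboard_count
    return s_trans, c_space, remaining


# Hypothetical tiles and a 500-byte budget, for illustration only.
tiles = [Tile("a", 0.1, 100, 3), Tile("b", 0.9, 100, 5), Tile("c", 0.5, 300, 2),
         Tile("d", 0.4, 200, 4), Tile("e", 0.6, 250, 1)]
s_trans, c_space, remaining = throttle_downlink(tiles, 0.2, 0.8, 500)
print([t.tile_id for t in s_trans], c_space, remaining)  # ['c', 'd'] 6 0
```

In this example tile "e" is skipped because it no longer fits after "c" is scheduled, while the smaller tile "d" still does, so the budget is fully used.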