
END-TO-END CONGESTION CONTROL STRUGGLES UNDER LEO MOBILITY

Network Characteristics of LSNs

First, we conduct three independent experiments to profile Starlink’s basic network characteristics (i.e., maximum link capacity, RTT and packet loss rate) based on two terminals of our research partners deployed in New Jersey (NJ) and Madrid (MD). In particular, to measure the maximum link capacity, we follow the probing method used in [29] and run iperf3 on the end host to saturate the satellite link by generating high-volume UDP traffic between the terminal and an Oracle cloud server behind the corresponding gateway. We use tcpdump to capture packet-level network traces for analysis. Each cloud server has sufficient incoming/outgoing network capacity and thus is not the bottleneck on the end-to-end path. We connect our end-host laptop to the home router via a wired Ethernet which has sufficient link capacity and is also not the bottleneck. To measure RTT, we perform ping tests every 5ms from the end host to the server. To measure the packet loss rate, we generate UDP traffic between the end host and server to calculate the loss rate in every 100ms interval based on the tcpdump trace.

First, three independent experiments profile Starlink's basic network characteristics (i.e., maximum link capacity, RTT and packet loss rate), based on two terminals deployed by our research partners in New Jersey (NJ) and Madrid (MD). Specifically:

Maximum link capacity

Following the probing method in [29], we run iperf3 on the end host and saturate the satellite link with high-volume UDP traffic toward an Oracle cloud server behind the corresponding gateway, while tcpdump captures packet-level traces for analysis. Each cloud server has ample incoming/outgoing capacity and is therefore not the bottleneck on the end-to-end path; likewise, the end-host laptop is wired to the home router over Ethernet with sufficient link capacity, so it is not the bottleneck either.

RTT

We run ping tests every 5 ms from the end host to the server.

Packet loss rate

We generate UDP traffic between the end host and the server and compute the loss rate over every 100 ms interval from the tcpdump trace (a minimal sketch of this computation follows below).
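As a rough illustration of the loss-rate computation above, here is a minimal sketch. It assumes each UDP payload carries a monotonically increasing sequence number and that the tcpdump capture has already been exported to (timestamp, seq) pairs; the export step, the record format and the function name are assumptions for illustration, not part of the paper's tooling.

```python
# Minimal sketch: per-100ms loss rate from a receiver-side UDP trace.
# Assumes each payload carries a monotonically increasing sequence number and the
# tcpdump capture was exported to (timestamp_sec, seq) pairs beforehand.

from collections import defaultdict

def loss_rate_per_interval(records, interval=0.1):
    """records: iterable of (timestamp_sec, seq); returns {bucket_index: loss_rate}."""
    buckets = defaultdict(list)
    for ts, seq in records:
        buckets[int(ts // interval)].append(seq)

    rates = {}
    for bucket in sorted(buckets):
        seqs = buckets[bucket]
        expected = max(seqs) - min(seqs) + 1   # packets the sender emitted in this span
        received = len(set(seqs))              # packets actually seen by the receiver
        rates[bucket] = 1.0 - received / expected   # simplification: ignores boundary losses
    return rates

# Tiny example: 10 packets sent over 200 ms, seq 3 and seq 7 lost in flight.
trace = [(0.02 * i, i) for i in range(10) if i not in (3, 7)]
print(loss_rate_per_interval(trace))   # both 100 ms buckets report ~20% loss
```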

[Figure 2: time-varying link capacity, RTT and packet loss rate measured over live Starlink]

Based on our real-world measurement, we find that from an average perspective Starlink performs quite well: the average uplink/downlink capacity can reach about 32/187Mbps in NJ and 30/357Mbps in MD, while the average RTT in NJ/MD is about 59/27ms. However, due to the LEO mobility, end-to-end connections through Starlink suffer from drastic network variations over time as plotted in Figure 2: (i) the maximum link capacity can drastically fluctuate between 10Mbps and 65Mbps in three minutes; (ii) RTT drastically changes over time. Based on our further traceroute investigation, we find the high delay jitter between the end host and gateway is the culprit; and (iii) even at low data rates below network capacity, end-to-end flows can experience unpredictable bursts of packet loss during data transmission.

Based on our real-world measurements, Starlink performs quite well on average:

  • the average uplink/downlink capacity reaches about 32/187 Mbps in NJ and 30/357 Mbps in MD;
  • the average RTT is about 59 ms in NJ and 27 ms in MD.

However, due to LEO mobility, end-to-end connections through Starlink suffer drastic network variations over time, as shown in Figure 2:

  1. Drastic link-capacity fluctuation: within three minutes, the maximum link capacity can swing between 10 Mbps and 65 Mbps;
  2. Significant RTT variation: a further traceroute investigation shows that the high delay jitter between the end host and the gateway is the main cause;
  3. Bursty packet loss: even at data rates below the link capacity (which should, in principle, be safe), end-to-end flows can experience unpredictable bursts of packet loss during transmission.

These results reveal how deeply mobility-induced instability affects performance in LEO satellite networks.

Network variations brought by LEO mobility

LEO mobility -> quantified side effects:

  1. Drastic link-capacity fluctuation: between 10 Mbps and 65 Mbps within three minutes;
  2. Significant RTT variation: the high delay jitter between the end host and the gateway is the culprit;
  3. Bursty packet loss: unpredictable loss bursts even at data rates below the link capacity.

CCA Performance under LEO Mobility

Next, we run experiments to measure the achievable performance of several representative classes of end-to-end CCAs at our vantage points described in §3.1:

(i) loss-based CCAs, Reno and Cubic [23], which use packet losses as the signal for adjusting the data sending rate;

(ii) delay-based CCAs, Vegas [13] and Copa [11], which exploit measured delay to estimate network congestion and adjust the sending rate;

(iii) model-based CCAs, Google BBRv1/v3 [6, 14], which frequently measure the bottleneck bandwidth and minimum RTT to model the bandwidth-delay product (BDP) of the path, and regulate the sending rate accordingly;

(iv) a learning-based CCA, PCC-VIVACE [17], which automatically adapts itself to various conditions based on a utility function, without manual tuning;

and (v) Sprout [49] and Verus [53], which are specifically designed for conventional cellular networks whose link capacity also changes rapidly.

We notice that the live Starlink network is a non-reproducible environment, and the link capacity allocated to the terminal may change drastically across different hours of the day [20]. Hence, to mitigate the impact of uncontrolled environmental factors as much as possible, all experiments are conducted in similar off-peak periods of the day and under similar weather conditions. For each CCA, we run more than thirty 2-minute tests to obtain statistical results.

Next, at the vantage points described in §3.1, we measure the achievable performance of several representative classes of end-to-end congestion control algorithms (CCAs):

  1. Loss-based CCAs: Reno and Cubic, which use packet loss as the signal for adjusting the sending rate;
  2. Delay-based CCAs: Vegas and Copa, which use measured delay to estimate congestion and adjust the sending rate;
  3. Model-based CCAs: Google BBRv1/v3 [6, 14], which frequently measure the bottleneck bandwidth and minimum RTT to model the path's bandwidth-delay product (BDP) and regulate the sending rate accordingly;
  4. Learning-based CCA: PCC-VIVACE, which adapts itself to various conditions based on a utility function, without manual tuning;
  5. CCAs designed for conventional cellular networks: Sprout and Verus, targeting links whose capacity changes rapidly.

Note that the live Starlink network is a non-reproducible environment, and the link capacity allocated to the terminal can change drastically across different hours of the day. Therefore, to mitigate uncontrolled environmental factors as much as possible, all experiments are conducted in similar off-peak periods and under similar weather conditions. For each CCA, we run more than thirty 2-minute tests to obtain statistical results.

[Figure 3: average, 90th- and 95th-percentile RTT vs. average throughput of various CCAs over the Starlink uplink measured in MD]

Observations in live Starlink. Figure 3 plots the average, 90th and 95th-percentile RTT against the average throughput achieved by various CCAs in Starlink uplink measured in MD. Results measured in NJ are similar and are omitted due to the page limit. Recalling the basic network characteristics in Figure 2, we observe that in live Starlink, existing CCAs either overshoot the link capacity and suffer from high RTT amplification (e.g., Verus, VIVACE and BBRv1) as compared to non-congestion scenarios, or send data too conservatively, leading to different degrees of throughput degradation (e.g., Cubic, Vegas, BBRv3, Sprout and Copa).

Observations in live Starlink

Figure 3 plots the average, 90th-percentile and 95th-percentile RTT against the average throughput achieved by different CCAs over the Starlink uplink measured in Madrid (MD). Results in New Jersey (NJ) are similar and omitted due to the page limit.

Recalling the basic network characteristics in Figure 2, we observe that in live Starlink existing CCAs exhibit the following problems:

  1. Overshooting the link capacity: e.g., Verus, VIVACE and BBRv1, which suffer high RTT amplification compared to non-congestion scenarios;
  2. Sending too conservatively: e.g., Cubic, Vegas, BBRv3, Sprout and Copa, which suffer different degrees of throughput degradation.

These observations indicate that existing CCAs have significant limitations in the dynamically changing Starlink environment.

How Figure 3 is analyzed
  1. Overshooting the link capacity: e.g., Verus, VIVACE and BBRv1, which suffer high RTT amplification compared to non-congestion scenarios;
  2. Sending too conservatively: e.g., Cubic, Vegas, BBRv3, Sprout and Copa, which suffer different degrees of throughput degradation.

How were these two patterns identified? Here is an example.

1) Overshooting the link capacity

Look at how each CCA reaches its own maximum throughput.

Verus: the average RTT approaches 100 ms, while the 90th- and 95th-percentile RTTs reach 120 ms and 190 ms. Verus congests the link while chasing higher throughput, significantly amplifying delay.

BBRv1: its average RTT is 60 ms, but the 90th- and 95th-percentile RTTs rise to 70-80 ms, showing that BBRv1 also pushes the link toward congestion in some cases.

2) Sending too conservatively

Look at the relationship between throughput and RTT. In principle, a smaller RTT means less congestion, so throughput should be higher. ✅

Cubic, Vegas, Reno: the RTT is below 50 ms, so the path should in theory be wide open, yet the average throughput is barely 10 Mbps!

Understanding the Root Causes

Analysis methodology. To uncover the root causes behind the performance degradation, in addition to analyzing the network traces of CCAs captured in live Starlink, we build a controlled and reproducible network environment by combining: (i) tcpdump traces captured from the live Starlink, and (ii) a trace-driven, reproducible network emulation built by extending a research-grade trace-replayer [37]. We extract the network characteristics from our tcpdump traces, based on which we build the fully controlled test environment that can replay real Starlink network conditions with drastic variations. This reproducible environment enables us to inspect the root causes of performance degradation of various CCAs, described as follows.

Analysis methodology

To uncover the root causes of the performance degradation, in addition to analyzing the CCA network traces captured in live Starlink, we build a controlled and reproducible network environment by combining:

  1. tcpdump traces from live Starlink: packet-level traces captured over the live Starlink network.
  2. Trace-driven, reproducible network emulation: an emulation environment built by extending a research-grade trace replayer and driven by the real traces.

We extract network characteristics from the tcpdump traces and, on that basis, build a fully controlled test environment that can replay real Starlink network conditions with their drastic variations. This reproducible environment lets us inspect the root causes of the performance degradation of various CCAs, as analyzed below.

tcpdump

tcpdump is a powerful command-line tool for capturing and analyzing network traffic. It can display packets traversing a network interface in real time or save them to a file for later analysis.

tcpdump is commonly used by network administrators and security professionals for troubleshooting, performance monitoring and detecting anomalous activity.

Loss-based CCAs: End-to-end connections experience non-congestion packet losses over the unstable LEO satellite links. It is a well-known limitation that Cubic and Reno cannot discriminate such non-congestion packet losses. As a result, they mistakenly conclude that the network is congested and conservatively shrink their congestion windows when non-congestion packet losses occur, causing self-limited throughput.

Loss-based congestion control algorithms

End-to-end connections experience non-congestion packet losses over unstable LEO satellite links: the high dynamics of LEO satellites (e.g., satellite handovers) cause packet losses that have nothing to do with congestion. However, loss-based CCAs such as TCP Cubic and TCP Reno cannot distinguish these non-congestion losses.

Note
  • Misread as congestion: Cubic and Reno treat any packet loss as a congestion signal.
  • Conservative reaction: upon detecting a loss, they shrink the congestion window (CWND) to reduce the amount of unacknowledged data, lowering the sending rate.
  • Self-limited throughput: because they cannot recognize non-congestion losses, they back off when it is unnecessary, degrading throughput.

According to the related material, the concrete parameter adjustments are as follows (see the sketch below):

  1. TCP Reno: on a packet loss, halves the congestion window (CWND = CWND × 0.5).
  2. TCP Cubic: more aggressive than Reno, but still shrinks the window by 30% (CWND = CWND × 0.7). Although Cubic performs better in high-bandwidth, high-latency networks, it is just as sensitive to non-congestion losses in the LEO environment.
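A minimal sketch of the multiplicative decrease just described, using the standard back-off factors (Reno 0.5, Cubic 0.7); the loss events are simulated, so this only illustrates why indiscriminate back-off self-limits throughput rather than reproducing kernel behavior.

```python
# Minimal sketch: a loss-based CCA backs off on *every* loss, congestive or not.
# Back-off factors follow the usual defaults (Reno 0.5, Cubic 0.7); losses are simulated.

def on_loss(cwnd, beta):
    """Multiplicative decrease applied whenever a loss is detected."""
    return max(1.0, cwnd * beta)

cwnd_reno, cwnd_cubic = 100.0, 100.0             # congestion windows in segments
losses = ["handover", "congestion", "handover"]  # only one of these is real congestion
for _cause in losses:
    # The sender cannot tell the causes apart, so every loss triggers the same back-off.
    cwnd_reno = on_loss(cwnd_reno, 0.5)
    cwnd_cubic = on_loss(cwnd_cubic, 0.7)

print(cwnd_reno, cwnd_cubic)   # 12.5 vs ~34.3 segments: throughput is self-limited either way
```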

Delay-based CCAs. Delay-based CCAs rely on a basic assumption that the increase in RTT observed by the sender may reflect queuing at the bottleneck link. However, recalling the non-congestion delay jitter observed in Figure 2b, we observe that delay-based CCAs can be seriously misled in LSNs because it is difficult for them to distinguish whether the observed RTT changes are caused by congestive queuing or by path fluctuations due to LEO mobility. Specifically, Vegas detects congestion from increases in RTT, and we observe that Vegas is frequently misled by these non-congestion RTT increases in LSNs, resulting in severe throughput degradation. Similarly, Copa is a recent delay-based CCA that converges to a target sending rate \(1 / (\sigma \cdot d_q)\), where \(d_q\) is the measured queuing delay and \(\sigma\) is a constant. Copa adjusts the congestion window in the direction of this target rate, and estimates the queuing delay as \(d_q = RTT_{standing} - RTT_{min}\), where \(RTT_{standing}\) is the smallest RTT observed over a recent short time-window and \(RTT_{min}\) is the smallest RTT observed over a long period of time (e.g., 10 seconds). We find that as the environmental RTT fluctuates frequently and drastically, Copa usually overestimates \(d_q\) and then limits its sending rate.

Figure 4 plots a concrete example illustrating the self-constrained rate adaptation observed in Copa. When the environmental RTT suddenly increases to a new level, although Copa’s \(RTT_{standing}\) estimation can be updated in time, it still takes a long time for Copa’s \(RTT_{min}\) estimation to converge to the correct value. Therefore, as the environmental RTT changes drastically, Copa frequently underestimates \(RTT_{min}\), and then overestimates \(d_q\), which is \(RTT_{standing} - RTT_{min}\). As a result, Copa mistakenly infers that there is congestion in the network and limits its sending rate. In the experiment of Figure 4, Copa only achieves about 20% link utilization and 6.84 Mbps throughput on average.

Delay-based congestion control algorithms

Delay-based CCAs rely on a basic assumption: an increase in the RTT observed by the sender may reflect queuing at the bottleneck link.

However, recalling the non-congestion delay jitter observed in Figure 2b, delay-based CCAs can be seriously misled in LEO satellite networks (LSNs), because they can hardly tell whether an observed RTT change is caused by congestive queuing or by path fluctuations due to LEO mobility.

Specifically:

  • Vegas: Vegas detects congestion from increases in RTT. In LSNs, however, it is frequently misled by non-congestion RTT increases, leading to severe throughput degradation.
  • Copa: Copa is a recent delay-based CCA whose target sending rate is \(1 / (\sigma \cdot d_q)\), where \(d_q\) is the measured queuing delay and \(\sigma\) is a constant. Copa adjusts the congestion window (CWND) toward this target rate and estimates the queuing delay as $$ d_q = RTT_{standing} - RTT_{min} $$ where \(RTT_{standing}\) is the smallest RTT observed over a recent short time window and \(RTT_{min}\) is the smallest RTT observed over a long period (e.g., 10 seconds).

We find that when the environmental RTT fluctuates frequently and drastically, Copa usually overestimates \(d_q\) and consequently limits its sending rate. This stems from two points:

  1. When the environmental RTT suddenly rises to a new level, Copa's \(RTT_{standing}\) estimate is updated in time, but its \(RTT_{min}\) estimate takes a long time to converge to the correct value.
  2. As a result, Copa frequently underestimates \(RTT_{min}\) and thus overestimates \(d_q = RTT_{standing} - RTT_{min}\).

[Figure 4: Copa's self-constrained rate adaptation under drastic RTT variation]

Figure 4 shows a concrete example of Copa's self-constrained rate adaptation. When the environmental RTT changes drastically, Copa mistakenly infers that the network is congested and limits its sending rate. In the experiment of Figure 4, Copa achieves only about 20% link utilization and 6.84 Mbps average throughput.
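To make the mechanism concrete, here is a minimal sketch of the \(d_q\) estimation described above; the window lengths, the \(\sigma\) value, the RTT trace and the function name are illustrative assumptions, not Copa's actual implementation.

```python
# Minimal sketch of Copa-style queuing-delay estimation under a non-congestion RTT jump.
# Window lengths, sigma and the RTT samples are illustrative assumptions.

SIGMA = 0.5   # constant in the target rate 1 / (sigma * d_q)

def queuing_delay(rtt_short_window, rtt_long_window):
    rtt_standing = min(rtt_short_window)   # smallest RTT in the recent short window
    rtt_min = min(rtt_long_window)         # smallest RTT over a long period (~10 s)
    return rtt_standing - rtt_min

# Path RTT jumps from 30 ms to 60 ms after a handover; no queue actually builds up.
rtt_long_window = [0.030] * 50 + [0.060] * 5   # RTT_min still remembers the old, shorter path
rtt_short_window = [0.060, 0.061, 0.062]

d_q = queuing_delay(rtt_short_window, rtt_long_window)   # ~30 ms of phantom "queuing"
target_rate = 1.0 / (SIGMA * d_q)                        # Copa steers its cwnd toward this low rate
print(f"d_q = {d_q * 1000:.0f} ms, target rate = {target_rate:.1f} pkts/s")
```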

Model-based CCAs. BBR frequently probes the network for its propagation RTT (pRTT) and bottleneck bandwidth (bBW), and then adjusts sending rate to match the bandwidth-delay product (BDP). Figure 5 plots the time-varying queuing delay, packet loss rate and throughput achieved by different versions of BBR in our controlled, reproducible environment. We identify several issues in BBR.

BBRv1 experiences \(bBW\) overestimation and \(pRTT\) underestimation under the drastic network variations caused by LEO mobility. First, BBRv1 estimates \(bBW\) as the maximum delivery rate (\(deliveryRate\)) calculated over a 10-RTT window. When the link capacity fluctuates drastically, such a maximum filter always over-estimates \(bBW\). Note that BBRv1's sending rate is set to the estimated \(bBW\) multiplied by a factor called \(pacing\_gain = \{1.25, 0.75, 1, 1, 1, 1, 1\}\), and the data in flight is capped by \(cwnd = 2 \times BDP\). When the link capacity fluctuates, because \(bBW\) is overestimated, BBRv1 overshoots the link capacity until the data in flight reaches \(2 \times BDP\), resulting in high queuing delay, especially when the link capacity significantly slumps.

Second, BBRv1 estimates \(pRTT\) by the minimum observed RTT over a 10-second window. Thus, when the RTT increases due to LEO mobility rather than congestion, BBRv1 under-estimates \(pRTT\). However, because \(bBW\) is overestimated most of the time, while \(pRTT\) is underestimated much less often, in our experiments we observe that in most cases the BDP is still overestimated.

[Figure 5: time-varying queuing delay, packet loss rate and throughput of different BBR versions in the controlled, reproducible environment]

Model-based congestion control algorithms

BBR frequently probes the network's propagation RTT (\(pRTT\)) and bottleneck bandwidth (\(bBW\)), and adjusts its sending rate to match the bandwidth-delay product (BDP). Figure 5 shows the time-varying queuing delay, packet loss rate and throughput achieved by different BBR versions in our controlled, reproducible environment. We identify several issues in BBR (a sketch of the two BBRv1 filters follows below).

  1. Overestimated bottleneck bandwidth (\(bBW\))
    Under the drastic network variations caused by LEO mobility, BBRv1 overestimates \(bBW\). Specifically, BBRv1 computes \(bBW\) as the maximum delivery rate (\(deliveryRate\)) over a 10-RTT window; when the link capacity fluctuates drastically, this maximum filter consistently overestimates \(bBW\).

    • Sending-rate setting: BBRv1's sending rate is the estimated \(bBW\) multiplied by a factor \(pacing\_gain = \{1.25, 0.75, 1, 1, 1, 1, 1\}\), and the data in flight is capped at \(cwnd = 2 \times BDP\).
    • Symptom: when the link capacity fluctuates, the overestimated \(bBW\) makes BBRv1 overshoot the actual capacity until the data in flight reaches \(2 \times BDP\), causing high queuing delay, especially when the capacity slumps sharply.
  2. Underestimated propagation RTT (\(pRTT\))
    BBRv1 estimates \(pRTT\) as the minimum RTT observed over a 10-second window. When the RTT increases due to LEO mobility rather than congestion, BBRv1 underestimates \(pRTT\).

    • Experimental observation: since \(bBW\) is overestimated most of the time while \(pRTT\) is underestimated far less often, the BDP is still overestimated in most cases.
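A minimal sketch of the two BBRv1 filters just described, with a made-up capacity drop to show why the max/min filters lag behind the path; the windowed filters are collapsed into plain lists and all numbers are illustrative.

```python
# Minimal sketch of BBRv1's max/min filters under a sudden capacity drop.
# Real BBRv1 maintains windowed filters in the kernel; here the windows are plain lists
# and the samples are invented for illustration.

def bbrv1_estimates(delivery_rates_10rtt_mbps, rtts_10s_sec):
    bbw = max(delivery_rates_10rtt_mbps)   # max-filter: still remembers the old, higher capacity
    prtt = min(rtts_10s_sec)               # min-filter: still remembers the old, lower RTT
    return bbw, prtt

# Link capacity just fell from ~60 Mbps to ~15 Mbps; path RTT rose from 30 ms to 50 ms.
delivery_rates = [60, 58, 59, 15, 14]      # Mbps, most recent samples last
rtts = [0.030, 0.031, 0.050, 0.052]        # seconds

bbw, prtt = bbrv1_estimates(delivery_rates, rtts)
bdp_bytes = bbw * 1e6 / 8 * prtt           # bandwidth-delay product in bytes
inflight_cap = 2 * bdp_bytes               # BBRv1 caps data in flight at 2 * BDP
print(f"bBW = {bbw} Mbps, pRTT = {prtt * 1000:.0f} ms, inflight cap = {inflight_cap / 1e3:.0f} KB")
# bBW stays at 60 Mbps although the link now carries only ~15 Mbps -> persistent queuing.
```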

BBRv3 has made several modifications upon BBRv1. One important aspect is that BBRv3 estimates \(bBW\) as the minimum value of two new parameters: \(bw_{high}\) and \(bw_{low}\). Specifically:

\(bw_{high}\) is calculated by the maximum delivery rate over a short window.

\(bw_{low}\) is set to an extremely high value when there is no packet loss but is set to \(\max(latest\_deliveryRate, 0.7 \times bw_{high})\) if packet loss rate > 0.

In other words, BBRv3 suppresses the sending rate in case of packet loss. The original intention of this change is that when packet loss occurs, it may indicate congestion, so BBRv3 should reduce the sending rate. However, in our experiments, we observe that due to random packet losses in LEO satellite links, BBRv3 avoids overshooting the link but is less resilient to non-congestion loss as compared to BBRv1. As a result, BBRv3 can only achieve less than 65% link utilization on average under Starlink lossy links.

BBRv3 makes several modifications to BBRv1. One important aspect is the estimation of the bottleneck bandwidth (\(bBW\)): BBRv3 takes \(bBW\) as the minimum of two new parameters, \(bw_{high}\) and \(bw_{low}\). Specifically:

\(bw_{high}\): computed as the maximum delivery rate over a short window.

\(bw_{low}\): set to an extremely high value when there is no packet loss; if the loss rate is greater than 0, it is set to \(\max(latest\_deliveryRate, 0.7 \times bw_{high})\).

In other words, BBRv3 suppresses the sending rate whenever packet loss is detected. The rationale is that packet loss may indicate congestion, so BBRv3 should slow down (this rule is sketched below).

In our experiments, however, because of random packet losses on LEO satellite links, BBRv3 avoids overshooting the link but is less resilient to non-congestion loss than BBRv1. As a result, BBRv3 achieves less than 65% link utilization on average over Starlink's lossy links.
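A minimal sketch of the \(bw_{high}\)/\(bw_{low}\) rule above; the 0.7 factor and the min() combination follow the text, while the loss flag, the rate numbers and the function name are simulated assumptions, not the Linux implementation.

```python
# Minimal sketch of the bw_high / bw_low rule for BBRv3's bandwidth estimate.
# The 0.7 factor and min() combination follow the text; rates and the loss flag are simulated.

import math

def bbrv3_bbw(latest_delivery_rate, bw_high, loss_detected):
    # bw_low is effectively disabled when there is no loss, and capped when loss appears.
    bw_low = math.inf if not loss_detected else max(latest_delivery_rate, 0.7 * bw_high)
    return min(bw_high, bw_low)

# A random (non-congestion) loss on the satellite link still cuts the bandwidth estimate:
print(bbrv3_bbw(latest_delivery_rate=40, bw_high=60, loss_detected=False))  # 60 Mbps
print(bbrv3_bbw(latest_delivery_rate=40, bw_high=60, loss_detected=True))   # 42 Mbps
```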

Learning-based CCAs. Recent CCAs like VIVACE try to learn from the observed network conditions based on a utility function and accordingly estimate a proper sending rate. Specifically:

VIVACE’s utility function is calculated based on the sending rate contribution, latency penalty (calculated by RTT gradient), and loss penalty in each measurement interval.

We observe two performance issues in VIVACE:

(i) Non-congestion RTT increases caused by LEO mobility can amplify the latency penalty and result in inaccurate utility estimation.

(ii) VIVACE incorporates a dynamic change boundary \(\omega\) to limit the rate change to a certain range. The original intention of \(\omega\) is to avoid drastic rate changes that overshoot the link capacity [17], but such a boundary also leads to slow rate convergence in Starlink as the link capacity changes rapidly.

As a result, VIVACE: (i) under-utilizes the network when the link capacity drastically increases or when the propagation RTT suddenly increases due to LEO mobility; and (ii) overshoots the network when the link capacity drastically decreases, causing high queuing delay.

Learning-based congestion control algorithms

Recent learning-based CCAs such as VIVACE try to learn from observed network conditions via a utility function and estimate a proper sending rate accordingly. Specifically:

  • Utility function: in each measurement interval, VIVACE's utility is computed from three factors (sketched below):
    1. the sending-rate contribution;
    2. the latency penalty, computed from the RTT gradient;
    3. the loss penalty.

We observe two main performance issues in VIVACE:

  1. Inaccurate utility estimation caused by non-congestion RTT increases
    • Non-congestion RTT increases induced by LEO mobility amplify the latency penalty and distort the utility estimate.
  2. The dynamic change boundary \(\omega\)
    • VIVACE introduces a dynamic change boundary \(\omega\) to keep rate changes within a certain range; its intention is to avoid rate changes so drastic that they overshoot the link capacity.
    • In Starlink, however, where the link capacity changes rapidly, this boundary also makes the rate converge slowly.

In other words:

  • When the link capacity rises quickly, or the propagation RTT suddenly increases due to LEO mobility, VIVACE fails to fully utilize the network (under-utilization).
  • When the link capacity drops quickly, VIVACE's sending rate overshoots the link, significantly increasing queuing delay.

VIVACE was designed to achieve efficient utilization through dynamic rate adjustment. In LEO satellite networks, however, its sensitivity to non-congestion RTT increases and the sluggish rate adjustment imposed by the dynamic boundary leave it clearly limited under rapidly changing conditions, causing both throughput loss and high delay.
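A minimal sketch of a VIVACE-style utility computation, showing how a non-congestion RTT ramp can dominate the score; the exponent and coefficients are illustrative assumptions rather than the exact PCC-Vivace constants.

```python
# Minimal sketch of a VIVACE-style utility: reward for rate, penalties for a rising RTT
# (its time gradient) and for loss. Exponent and coefficients are illustrative assumptions.

def utility(rate_mbps, rtt_gradient, loss_rate, t=0.9, a=900.0, b=11.35):
    reward = rate_mbps ** t                          # throughput contribution
    latency_penalty = a * rate_mbps * rtt_gradient   # penalize an RTT that keeps growing
    loss_penalty = b * rate_mbps * loss_rate         # penalize observed loss
    return reward - latency_penalty - loss_penalty

# Same 20 Mbps sending rate; a handover makes RTT climb even though no queue is building:
print(utility(20, rtt_gradient=0.00, loss_rate=0.0))   # ~14.8: steady path looks fine
print(utility(20, rtt_gradient=0.02, loss_rate=0.0))   # ~-345: utility collapses, VIVACE backs off
```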

The fundamental challenge. In summary, we find that it is quite challenging for existing CCAs to detect network congestion promptly and accurately in an LSN with drastic, multi-dimensional network variations induced by LEO mobility. Essentially, every CCA relies on certain network models and assumptions, based on which it infers network conditions and whether congestion occurs. However, these fundamental assumptions become inaccurate in emerging LSNs.

In summary, we find it quite challenging for existing congestion control algorithms (CCAs) to detect network congestion promptly and accurately in low Earth orbit satellite networks (LSNs), because LEO mobility induces drastic, multi-dimensional network variations. In essence, every CCA relies on certain network models and assumptions and infers from them the network state and whether congestion has occurred; in emerging LSNs, these basic assumptions no longer hold.

Link capacity, RTT and loss rate can change frequently and drastically in LSNs, mixing both congestion and non-congestion variations, and existing CCAs can easily be misled by these non-congestion signals. Since the fundamental challenge is that end-to-end CCAs can hardly discriminate whether the observed performance changes are caused by congestion or not, we argue that CCAs in LSNs require effective indicators which can implicitly help the end host discriminate non-congestion performance changes.

  1. Network characteristics change frequently and drastically
    In LSNs, link capacity, RTT and packet loss rate change frequently and significantly. These changes may be caused by congestion or by non-congestion factors (such as path fluctuations or link handovers), and existing CCAs are easily misled by the non-congestion signals.

  2. Congestion and non-congestion changes are hard to tell apart
    Existing end-to-end CCAs infer congestion mainly from performance changes observed at the sender, but they cannot reliably determine whether a change really stems from a congestion event (such as queuing at the bottleneck link). For example:

    • a packet loss may be caused by a satellite handover rather than congestion;
    • an RTT increase may result from a path change rather than bottleneck queuing.

Given that the fundamental challenge is distinguishing congestion-induced from non-congestion performance changes, we argue that CCAs in LSNs need effective indicators that implicitly help the end host recognize non-congestion changes. For example:

  • use explicit network-side notifications to tell the sender directly about losses caused by satellite handovers or delay increases caused by path changes;
  • design new algorithms that adapt to the dynamic network environment and estimate the network state more accurately.
Note

This part details the concrete impact on each class of CC (mobility -> parameters -> CWND / ...).

On a first pass there is no need to read it too closely; just grasp the main thread:

  1. In LSNs, link capacity, RTT and packet loss rate change frequently and drastically. These changes may be caused by congestion or by non-congestion factors (such as path fluctuations or link handovers).
  2. Existing CCAs are easily misled by these non-congestion signals, because:
    • existing end-to-end CCAs infer congestion mainly from performance changes observed at the sender;
    • and they cannot effectively tell whether those changes are really caused by congestion events.

Now that the causes are identified, what should be done?

  1. Explicit notification: directly inform the sender about losses caused by satellite handovers or delay increases caused by path changes ("non-congestion delay"), as sketched below.
  2. Design new algorithms that adapt to the dynamic network environment.
    • This is quite low-level, with a bit of a tear-it-down-and-start-over flavor 😄
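Purely as a hypothetical illustration of the explicit-notification idea above (no such hint exists in today's transport stacks), a sender-side loss handler might skip its back-off when the network marks a loss as handover-induced; the handover_hint flag and its delivery mechanism are invented for this sketch.

```python
# Hypothetical sketch only: a loss handler that consults an imaginary network-provided
# hint marking a loss as handover-induced rather than congestive.

def on_loss(cwnd, beta, handover_hint=False):
    if handover_hint:
        return cwnd                    # non-congestion loss: keep the window, just retransmit
    return max(1.0, cwnd * beta)       # congestion loss: usual multiplicative decrease

print(on_loss(100.0, 0.7, handover_hint=True))   # 100.0 -> rate preserved across a handover
print(on_loss(100.0, 0.7, handover_hint=False))  # 70.0  -> normal congestion response
```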