INTRODUCTION
Recent years have seen an unprecedented surge in the use of multimedia-based real-time communication (RTC) applications (e.g., Zoom, Google Meet), especially during the COVID-19 pandemic. As their importance and popularity grow, RTC applications enable long-distance, wide-area communication in which many geo-distributed international users collaborate and interact with each other.
One important performance metric for RTC services is communication latency, which indicates how quickly media content can be delivered among all attendees. Specifically, the one-way communication latency suggested for an acceptable RTC experience ranges from 150ms (e.g., Internet-based video calls) [12] down to 20ms (e.g., interactive VR/AR) [31], depending on the application type. To reduce communication latency, existing RTC architectures are typically built upon geo-distributed cloud platforms, exploiting a collection of cloud-based relay servers to construct an overlay network [32], [34], [50] on top of the public Internet. Such overlays can provide better scalability as the number of users increases, lower loss rates, reduced latency and significantly higher bandwidth [15], [29]. Moreover, cloud relays make storage and computation available for in-network functions such as caching and network coding, which help realize timely packet delivery [28].
However, through a measurement study of a large number of RTC sessions covering 193 countries/regions, we observe that a large portion of users still suffer from high communication latency (e.g., > 500ms) on wide-area sessions. On further analysis, we find that meandering routes over the client-cloud segment and the inter-cloud-site segment of existing cloud-based RTC architectures are the critical culprits for the high latency suffered by wide-area RTC sessions: (i) clients still connect to the cloud relays over public Internet routes, which may cross multiple autonomous systems (ASes) and meander because the underlying routing protocols are latency-suboptimal; (ii) while inter-cloud routes are built upon a private cloud WAN within a single administrative domain, insufficient cloud deployment can also create prolonged paths with additional latency. Neither culprit is easy to tackle in terrestrial networks: cross-AS routes are decided by the policies of multiple independent AS operators, and the deployment of cloud servers is ultimately limited by economic and geographic factors (e.g., it is difficult to deploy on oceans or mountains). How might we alleviate these root causes of poor latency performance in wide-area RTC?
Emerging low Earth orbit (LEO) mega-constellations (e.g., Starlink [45], Kuiper [5]) appear to be an attractive opportunity for low-latency global communication. Future satellites will be equipped with evolved space-borne computation systems [25], [42] and free-space laser inter-satellite links (ISLs) supporting high-speed data communication [30]. Therefore, upcoming mega-constellations promise to realize low-latency, high-throughput data transfer [26], [27], [37], and even cloud-like capabilities [17], [22], on a global scale.
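To make the latency intuition concrete, the following is a rough back-of-the-envelope sketch, not a calculation from the paper. It compares one-way propagation delay over terrestrial fiber with delay over a LEO path between two ground points. All function names are hypothetical, and the numbers are illustrative assumptions: light in fiber travels at roughly 2/3 of its vacuum speed, the satellites orbit at 550 km, the up- and downlinks are vertical, and `stretch` is a made-up factor for how much a route meanders beyond the great-circle distance.

```python
import math

C_VACUUM_KM_S = 299_792.458             # speed of light in vacuum
C_FIBER_KM_S = C_VACUUM_KM_S * 2 / 3    # assumed speed in fiber (refractive index ~1.5)
EARTH_RADIUS_KM = 6371.0                # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fiber_delay_ms(distance_km, stretch=1.0):
    """One-way propagation delay over fiber; stretch > 1 models a meandering route."""
    return distance_km * stretch / C_FIBER_KM_S * 1000


def leo_delay_ms(distance_km, altitude_km=550.0, stretch=1.0):
    """One-way delay over an idealized LEO path: straight up, along the orbit, straight down."""
    # The arc at orbital altitude is longer than the ground arc by (R + h) / R.
    isl_km = distance_km * (EARTH_RADIUS_KM + altitude_km) / EARTH_RADIUS_KM * stretch
    return (2 * altitude_km + isl_km) / C_VACUUM_KM_S * 1000


if __name__ == "__main__":
    for d in (500, 2000, 10000):
        print(f"{d:>6} km  fiber: {fiber_delay_ms(d):6.2f} ms  LEO: {leo_delay_ms(d):6.2f} ms")
```

Under these assumptions the fixed cost of the up- and downlinks makes fiber faster over short distances, while the LEO path wins over long distances. A terrestrial `stretch` above 1, as with the meandering routes described earlier, widens that gap.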
Inspired by the opportunities above, we propose an RTC framework called SpaceRTC to explore a futuristic yet important question: how could emerging mega-constellations help to improve the latency performance of wide-area RTC?
In particular, unlike prior cloud-based RTC solutions, SpaceRTC is a cooperative framework that enhances wide-area RTC services by: (i) collaboratively constructing a hybrid satellite-cloud network upon terrestrial cloud sites and LEO constellations, and (ii) judiciously allocating RTC flows on low-latency cloud or satellite paths. Specifically, SpaceRTC formulates the dynamic RTC Latency Minimization (RLM) problem in the integrated space-ground environment. The problem aims to minimize the average communication latency in each RTC session, under constraints such as dynamic satellite visibility, bandwidth capacity, and the number of inter-satellite or ground communication links.
Solving the RLM problem requires selecting proper cloud and satellite relays to form low-latency paths for each RTC session, and wisely allocating media flows on those paths to avoid link congestion. However, the inherently high dynamicity of LEO satellites imposes significant challenges on both relay selection and flow allocation. Satellites move at high speed in LEO, so the visibility and connectivity between satellites and ground units (e.g., a ground station or a user terminal) change over time. The network topology of the space segment may also fluctuate periodically, leading to frequent link state changes and requiring flows to be re-allocated on new paths. Existing relay selection schemes [28], [32] designed for static cloud servers cannot handle such topological dynamics and fluctuations, and are thus unlikely to be applicable in the hybrid network. To tackle these challenges, SpaceRTC combines a cooperative relay selection algorithm with a low-latency flow allocation algorithm. Together they adaptively and efficiently choose proper relays to build close-to-optimal paths for each session, and dynamically allocate flows on appropriate network links in each short-term space-ground snapshot.
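The per-snapshot idea can be illustrated with a minimal sketch. This is not SpaceRTC's actual relay selection or flow allocation algorithm, which the text above does not spell out. It is a simple greedy stand-in: within one snapshot, each flow takes the lowest-latency path (Dijkstra) among links that still have enough residual capacity, and the capacity it uses is then subtracted. The function names, the `links` layout, and all numbers are hypothetical.

```python
import heapq


def lowest_latency_path(links, src, dst, demand):
    """Dijkstra over directed links whose residual capacity can carry `demand`.

    links: {node: {neighbor: [latency_ms, residual_capacity]}}
    Returns (path, latency_ms), or (None, inf) if no feasible path exists.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, (latency, capacity) in links.get(u, {}).items():
            if capacity < demand:
                continue  # link would be congested by this flow
            nd = d + latency
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def allocate_snapshot(links, flows):
    """Greedily place flows on one snapshot's topology, updating capacities in place.

    flows: {name: (src, dst, demand)}; returns {name: (path, latency_ms)}.
    """
    plan = {}
    for name, (src, dst, demand) in flows.items():
        path, latency = lowest_latency_path(links, src, dst, demand)
        if path is not None:
            for u, v in zip(path, path[1:]):
                links[u][v][1] -= demand  # reserve bandwidth on each hop
        plan[name] = (path, latency)
    return plan


# Hypothetical snapshot: a fast but narrow satellite path and a slow but wide cloud path.
snapshot = {
    "alice": {"cloud_a": [10.0, 100.0], "sat_1": [4.0, 5.0]},
    "cloud_a": {"cloud_b": [120.0, 100.0]},
    "sat_1": {"sat_2": [30.0, 5.0]},
    "sat_2": {"bob": [4.0, 5.0]},
    "cloud_b": {"bob": [10.0, 100.0]},
}
flows = {"f1": ("alice", "bob", 4.0), "f2": ("alice", "bob", 4.0)}
plan = allocate_snapshot(snapshot, flows)
```

With these made-up numbers, `f1` takes the satellite path (38 ms) and uses up most of its capacity, so `f2` falls back to the cloud path (140 ms). In a dynamic setting, one would rebuild `links` for each new snapshot as satellite visibility changes and rerun the allocation. A real solver for RLM would presumably optimize the session-wide average latency rather than place flows one at a time.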
To evaluate the performance of SpaceRTC, we build a simulation testbed that emulates the satellite-cloud hybrid network and implement a SpaceRTC prototype. We use public information from real constellations and cloud platforms to construct the dynamic network topology, and mimic wide-area sessions based on the RTC trace we collected. Extensive evaluations demonstrate that by judiciously exploiting LEO satellites to assist the existing cloud-based RTC architecture, the communication latency of wide-area RTC sessions can be reduced by up to 64.9%, and by 40.4% on average.
In summary, this paper makes the following contributions.
• (i) Through a measurement study, we quantitatively expose the high latency issue in state-of-the-art RTC applications and reveal its root causes (§II).
• (ii) We analyze the feasibility, benefits and challenges of exploiting LEO satellites to assist RTC on a global scale, and propose SpaceRTC, a satellite-cloud cooperative framework that judiciously selects cloud/satellite relay servers and allocates flows to attain low communication latency (§III).
• (iii) We build a space-ground simulation environment based on real constellation information, and conduct extensive experiments to demonstrate that SpaceRTC can deliver near-optimal latency performance (§IV).
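TL;DR
1) Delivery over existing cloud relays does not give an ideal RTC experience
- Cross-AS routes rely on BGP, which is not shortest-path based
- Insufficient cloud-site deployment leads to prolonged paths
2) SpaceRTC: terrestrial cloud and satellite cloud cooperate
Advantages:
- Collaboratively builds a hybrid network across terrestrial cloud sites and LEO constellations
- Intelligently allocates RTC flows across cloud/satellite paths
Challenges (the inherent high dynamicity of LEO):
- LEO-GS / LEO-LEO paths change dynamically -> "path selection" is a headache 😅
- Likewise -> "flow re-allocation" is a headache 🤕
Compared with Spache and StarFront
Spache / StarFront focus on: using cloud servers and satellite servers as data storage centers, with cache management
SpaceRTC focuses, in the same setting, on: path selection + flow re-allocation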