BACKGROUND AND MOTIVATION
Quick primer for wide-area content distribution. A CDN usually consists of a source server that generates the original content, together with a number of distributed cache servers that are transparent to users. Content providers can leverage public cloud platforms (e.g., Amazon AWS, Microsoft Azure) to build a CDN and deploy their content on cache servers close to end users. In general, the content distribution process includes three key steps: (i) pushing the original content from the source server to a collection of geo-distributed cache servers (e.g., by building a distribution tree [27]); (ii) configuring the region↔server map, which at runtime determines how user requests from different regions are assigned to a proper cache server (e.g., the server closest to the user); and (iii) updating the data on each cache server when new content is available from the source server. These operations are charged primarily based on the storage and bandwidth resources they consume, following the concrete pricing policies specified by different cloud providers (e.g., [1], [3]). Ideally, by moving content close to users, wide-area CDNs are expected to pervasively enable low content access latency, defined as the time taken to deliver one or a batch of requested objects to end users.
Content with high access latency observed from a global perspective. To quantitatively understand the achievable performance of state-of-the-art commercial CDNs, we collect a dataset containing RTT measurements from 183 countries to their nearest cloud sites provisioned by seven of the most popular CDN platforms. The measurements were conducted between March 7 and April 1, 2019, using the RIPE Atlas measurement platform [12], which employs a global network of probe nodes to measure Internet connectivity and reachability. On each probe node, we execute multiple pings to measure the RTT between the node and different CDN servers, and use the lowest RTT to mitigate the impact of temporary network congestion or link failure on the measurement results. Table I lists the measurement details for the different continents in our experiment.
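The per-probe filtering described above can be sketched in a few lines of Python. This is only an illustration of the "keep the lowest RTT, then count probes above a threshold" idea; the function names and the sample values are hypothetical and are not taken from the dataset or from any RIPE Atlas API.

```python
from typing import Dict, List


def min_rtt_per_probe(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Keep the lowest RTT seen by each probe; probes with no replies are dropped."""
    return {probe: min(rtts) for probe, rtts in samples.items() if rtts}


def fraction_above(min_rtts: Dict[str, float], threshold_ms: float) -> float:
    """Share of probes whose best-case RTT still exceeds threshold_ms."""
    if not min_rtts:
        raise ValueError("no probes with valid RTT samples")
    over = sum(1 for rtt in min_rtts.values() if rtt > threshold_ms)
    return over / len(min_rtts)


# Hypothetical ping samples in milliseconds (illustrative values only).
samples = {
    "probe-a": [48.2, 31.0, 95.4],
    "probe-b": [120.5, 118.9, 240.0],
    "probe-c": [62.3, 61.7],
    "probe-d": [],  # a probe that received no replies
}
best = min_rtt_per_probe(samples)
share_over_50 = fraction_above(best, 50.0)
```

Taking the minimum rather than the mean discards samples inflated by transient congestion, so the remaining value approximates the best latency the path can offer.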
Figure 1 plots the CDF of the RTT from geo-distributed users to their nearest cloud server of a given CDN operator. As shown in Figure 1, despite the advantages of global CDNs, a large fraction of CDN users still suffer from RTTs higher than 50ms, with a long tail of up to about 300ms, even when the closest cloud server is selected. High network RTTs can significantly impair the user-perceived quality of experience (QoE), especially for time-sensitive applications that typically issue a sequence of requests in a batch. For example, as shown in [25], even tens of milliseconds of additional RTT might substantially increase Web page load times and degrade user experience.
To further understand the high RTT observations, we group the measurement results of each user by continent of origin, as illustrated in Figure 2. We find that while latency in many populous and developed areas can be low, many users still suffer from high access RTT even when the nearest cloud server is selected, especially users in remote or under-developed areas. More specifically, about 53.58% and 23.93% of all users see RTTs higher than 50ms and 100ms, respectively. Even in developed regions like EU and NA, we still observe 3.4% and 4.5% of users, respectively, with RTTs higher than 50ms. The latency problem is more severe in AF, where more than 86.6% and 49.8% of users see RTTs higher than 50ms and 100ms, respectively.
For cases suffering from high RTTs, we further use traceroute to track and analyze the route from the end user to the assigned cache server. The traceroute results capture per-hop information from the client to the corresponding cache server. Our analysis identifies two root causes for the high latency: (i) underserved network and cloud infrastructure; and (ii) meandering terrestrial routes from users to the assigned cache server. Figure 3 shows two representative examples that explain the high latency. As shown in Figure 3a, due to insufficient cloud deployments in remote regions, the closest available cache server for users in Chelyabinsk (a city in central Russia) is the Azure service in Warsaw (Poland). Since the fiber distance is approximately 3200km, the minimal RTT has already reached about 67ms, which is physically constrained by the speed of light in fiber.
In addition, meandering routes may further enlarge the propagation delay over the terrestrial network. Today's Internet consists of a large number of independent autonomous systems (ASes). Routes between users and their assigned cache servers may traverse multiple ASes, and how packets are forwarded among ASes depends on policy-based inter-AS routing protocols (e.g., BGP). Due to many practical considerations beyond latency performance, routes crossing multiple ASes might not be latency-optimal. In particular, as depicted in Figure 3b, CloudFront servers located in Xining (China) are the closest cache servers (∼2100km distance) for users in Almaty (Kazakhstan). However, logs captured by traceroute reveal that packets are forwarded via a meandering and prolonged route through London and Hong Kong, resulting in a significantly enlarged RTT (∼370ms).
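To get a feel for how much a detour costs, one can compare an observed RTT with the propagation floor of a direct path. The sketch below is a rough back-of-the-envelope helper, not the paper's analysis: it assumes light in fiber travels at about 0.67 of its vacuum speed, and it treats the ∼2100km Almaty to Xining distance as if a straight fiber of that length existed, which is a simplification. The function names are hypothetical.

```python
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum
FIBER_SPEED_FACTOR = 0.67     # assumption: light in fiber is roughly 33% slower


def fiber_rtt_floor_ms(path_km: float, speed_factor: float = FIBER_SPEED_FACTOR) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    if path_km < 0:
        raise ValueError("path length must be non-negative")
    one_way_s = path_km / (C_VACUUM_KM_S * speed_factor)
    return 2 * one_way_s * 1000.0


def rtt_inflation(observed_rtt_ms: float, direct_path_km: float) -> float:
    """How many times larger the observed RTT is than the direct-path floor."""
    return observed_rtt_ms / fiber_rtt_floor_ms(direct_path_km)


# Figures from the Almaty example: ~2100 km direct distance, ~370 ms observed.
floor_ms = fiber_rtt_floor_ms(2100)
inflation = rtt_inflation(370, 2100)
```

Under these assumptions the direct-path floor is on the order of 20ms, so the observed RTT is more than an order of magnitude above it; queuing, processing, and the real fiber layout would all raise the floor somewhat.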
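Note
Why today's terrestrial CDNs fall short:
- Some regions have very few CDN sites, so traffic from those regions usually has to travel a long way and latency is naturally high
- Some regions have a CDN site nearby but cannot reach it directly
- Traffic that crosses different ASes has to follow BGP
- Recall that BGP is driven by commercial policy: whether a route is taken, or is reachable at all, is not decided by the shortest path
- When the shortest path is not taken, packets mostly take a long detour, often heading in the opposite direction 👍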
Low-latency content access enabled by futuristic LEO mega-constellations. Low earth orbit (LEO) satellites have been regaining popularity in recent years [22], [23], [31], [33], [34], [35], [36], [37], [41], [45], [51]. Compared with the first generation of satellite networks, which use geostationary satellites for communication, emerging mega-constellations (e.g., Starlink [17], OneWeb [10], Kuiper [2], Boeing [8]), which plan to consist of thousands of mass-produced low-flying satellites, will be equipped with evolved network and storage capabilities, and thus can (likely) enable new opportunities for constructing low-latency CDNs globally.
• (i) Evolved on-board communication capability. Many planned constellations suggest the use of RF or laser inter-satellite links (ISLs), through which an LEO satellite can connect to visible neighbor satellites and construct a network in space. Due to their low altitude (i.e., 500-1200km), LEO constellations also promise low-latency Internet connectivity. In addition, the speed of light in terrestrial fiber is about 33% slower than that in air or vacuum [33]. Recent studies have also outlined the vision of low-latency routing in space [34], [35], [36], [45].
• (ii) Big data stores in space. Another evolution of onboard capability is storage in space. Recent works [37], [38] have envisioned satellite-based big data storage. SpaceBelt [16], built by Cloud Constellation Corporation (CCC), is a data storage service using LEO satellites. The CCC system contains a ring of 10 LEO satellites in a 650-kilometer equatorial orbit, three of which are data stores, offering about 5PB of storage capacity since December 2018 [15].
To quantify the potential of reducing access latency with LEO satellites, we estimate the propagation latency from several terrestrial vantage points to their closest LEO satellite, and compare the "client-to-satellite" latency with the "client-to-cloud" latency for each vantage point. Latencies to the closest cloud data center are obtained from the trace we collected in §II. Specifically, we follow recent characterization methods [34], [41], [45] to estimate the latency from terrestrial users to LEO satellites, and we use the orbital information of the first shell of phase I (1584 satellites) of Starlink [17] as the constellation configuration in our experiment. Figure 4 plots the latency comparison at eight geo-distributed vantage points, which are populous cities on different continents. In theory, LEO constellations promise to reduce latency by up to 86% in remote areas (e.g., in Antananarivo, Madagascar) where terrestrial cloud infrastructure is limited.
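As a rough illustration of a "client-to-satellite" propagation estimate, the sketch below computes the slant range from a ground user to a satellite from the orbit altitude and the elevation angle at which the user sees it, then converts that distance into one-way delay at the vacuum speed of light. It is not the method of [34], [41], [45]; the 550km altitude and the 25° elevation used in the example are assumptions chosen for illustration (the text only gives the 500-1200km range), and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0      # mean Earth radius (spherical-Earth assumption)
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum


def slant_range_km(altitude_km: float, elevation_deg: float) -> float:
    """Distance from a ground user to a satellite seen at the given elevation angle."""
    e = math.radians(elevation_deg)
    r = EARTH_RADIUS_KM
    return math.sqrt((r + altitude_km) ** 2 - (r * math.cos(e)) ** 2) - r * math.sin(e)


def one_way_latency_ms(altitude_km: float, elevation_deg: float) -> float:
    """One-way propagation delay over the user-to-satellite link."""
    return slant_range_km(altitude_km, elevation_deg) / C_VACUUM_KM_S * 1000.0


# Assumed example values: 550 km shell, satellite overhead vs. at 25 degrees elevation.
overhead_ms = one_way_latency_ms(550, 90)
low_elevation_ms = one_way_latency_ms(550, 25)
```

With the satellite directly overhead the slant range equals the altitude, so the one-way delay is under 2ms; at lower elevation angles the link is longer but still only a few milliseconds under these assumptions.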
The above evolution in the space industry thus paints a promising picture of a satellite-cloud cooperative architecture that can (potentially) improve the accessibility and network performance of existing CDNs. Intuitively, LEO satellites can assist current cloud-based CDNs by: (i) constructing low-latency, close-to-optimal space paths connecting terrestrial clouds and end users, avoiding meandering fiber routes that might prolong the access latency; and (ii) enabling a new paradigm, the "LEO satellite cache", that provides direct content access for regions where even the nearest cloud site is still too far away. While in practice the demand for and supply of cache services evolve hand in hand, and caches are likely to appear where demand for content grows [24], we argue that integrating satellite caches with terrestrial cloud-based caches is meaningful for content providers for two reasons. First, as shown previously, a large number of terrestrial users still suffer from high access latency in existing terrestrial CDNs. Second, the global Internet penetration rate is about 63% as of April 2022. Exploiting the integration of satellite and cloud caches can therefore not only improve the network performance of existing CDNs, but also expand the service coverage and market size for content providers.
However, promising as it is, constructing such a cooperative CDN upon cloud data centers and mega-constellations still faces several unsolved challenges.
• (i) Scarce and costly space resources. Although they have evolved, network and storage resources in space are still relatively limited and costly compared with well-optimized terrestrial cloud platforms. As content providers typically have a cost budget for distributing their data over a CDN, content should be judiciously distributed to satellites and clouds in a cost-effective manner, i.e., the constructed cooperative CDN is expected to satisfy various latency requirements while incurring an acceptable operating cost.
• (ii) High mobility of LEO satellites. LEO satellites move at a very high velocity with respect to the Earth, resulting in unstable ground communication. End hosts or cloud sites on the ground can communicate with an LEO satellite only while the satellite is within the line of sight (LoS) of the ground unit. Therefore, application-level sessions might be interrupted if a request is assigned to a satellite that is leaving the LoS. User requests should be properly assigned to avoid performance degradation caused by intermittent connectivity and session disruptions.
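The LoS condition in challenge (ii) can be checked geometrically. The following is a minimal sketch, assuming a spherical Earth and a hypothetical minimum elevation angle of 25° below which the link is treated as unusable; the function names, the 550km altitude in the example, and the threshold are all illustrative assumptions rather than values from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def central_angle_rad(lat1_deg: float, lon1_deg: float,
                      lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle angle between a ground point and the sub-satellite point."""
    p1, p2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dlon = math.radians(lon2_deg - lon1_deg)
    cos_g = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    return math.acos(max(-1.0, min(1.0, cos_g)))  # clamp against rounding error


def elevation_deg(ground_lat: float, ground_lon: float,
                  sat_lat: float, sat_lon: float, altitude_km: float) -> float:
    """Elevation of the satellite above the local horizon (negative if below it)."""
    g = central_angle_rad(ground_lat, ground_lon, sat_lat, sat_lon)
    ratio = EARTH_RADIUS_KM / (EARTH_RADIUS_KM + altitude_km)
    return math.degrees(math.atan2(math.cos(g) - ratio, math.sin(g)))


def in_line_of_sight(ground_lat: float, ground_lon: float,
                     sat_lat: float, sat_lon: float, altitude_km: float,
                     min_elevation_deg: float = 25.0) -> bool:
    """True if the satellite is high enough above the horizon to be usable."""
    elev = elevation_deg(ground_lat, ground_lon, sat_lat, sat_lon, altitude_km)
    return elev >= min_elevation_deg


# A satellite 5 degrees of longitude away is usable; one 30 degrees away is not.
near_visible = in_line_of_sight(0.0, 0.0, 0.0, 5.0, 550.0)
far_visible = in_line_of_sight(0.0, 0.0, 0.0, 30.0, 550.0)
```

Evaluating this check along a satellite's predicted ground track gives the window during which a request can be served, which is the kind of information a request-assignment policy would need to avoid handing a session to a satellite about to leave the LoS.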