StarCDN: Moving Content Delivery Networks to Space
Low Earth Orbit (LEO) satellite networks, such as Starlink, provide global internet access and currently serve content to millions of users. Recent work has shown that existing network infrastructures, such as Content Delivery Networks (CDNs), are not well-suited to satellite network architectures. Traditional terrestrial CDNs degrade performance for satellite network users and do not alleviate the congestion in the ground-satellite links. We design StarCDN, a new CDN architecture that caches content in space to improve user experience and reduce ground-satellite bandwidth usage. The fundamental challenge in designing StarCDN lies in the orbital motion of satellites, which causes each satellite’s coverage area to change rapidly, serving vastly different regions (e.g., US and Europe) within minutes. To address this, we introduce new consistent hashing and relayed fetching schemes tailored to LEO satellite networks. Our design enables cached content to flow in the opposite direction of the orbital motion to counter satellite motion. We evaluate StarCDN against multiple baselines using real-world traces from Akamai. Our evaluation demonstrates that StarCDN can reduce the ground-to-satellite bandwidth utilization by 80% and improve user-perceived latency by 2.5X. Further, we make available an open-source trace generator, SpaceGEN, for realistic simulations of satellite-based CDNs.
Introduction
Satellite networks, offered through mega-constellations operating in Low Earth Orbit (LEO), are rapidly gaining traction. Starlink, a leading LEO satellite network provider, already has over four million subscribers across 100+ countries [53]. It currently operates more than 7,000 satellites, with plans to expand to 40,000 satellites [10] and to provide direct-to-cell services [52]. Other companies, such as OneWeb and Amazon Kuiper, offer or plan to offer comparable LEO satellite-based networking services. Increasingly, LEO satellite networks (LSNs) are used for delivering content to users around the globe [53]. Our work focuses on enhancing the performance and reducing the cost of content delivery using LSNs.
Content Delivery Networks (CDNs): CDNs were invented a quarter century ago to enhance the performance and reduce the cost of delivering content such as websites, media and downloads over the terrestrial internet [20]. CDNs deploy hundreds of thousands of “edge” servers around the world to cache and serve content from locations that are “proximal” to the user [42]. CDNs enable content to be served with lower latency, i.e., higher performance as perceived by the user, as content traverses a shorter network path from a proximal edge server to the user. CDNs also reduce the cost and network utilization since content can be downloaded once to a CDN edge server and delivered multiple times to users, saving the upstream WAN bandwidth (called midgress [59]) of transmitting content from an origin server to the edge. To achieve the performance and cost benefits, CDNs deploy clusters of edge servers across the globe, where each cluster caches and serves content to a proximal set of users. Further, CDNs use sophisticated techniques such as consistent hashing to manage the content within these clusters [36]. The benefits that CDNs provide have made them an essential component in modern internet infrastructure, and CDNs serve nearly 75% of global internet traffic [15].
Shortcomings of the current state-of-the-art: Current solutions that use traditional (terrestrial) CDN technology in conjunction with LSNs have several shortcomings that motivate our work. Recent work [8, 9] has shown that using traditional terrestrial CDNs in conjunction with Starlink degrades the Quality of Experience (QoE) for users. This degradation occurs because user traffic in Starlink flows through a bent-pipe architecture, wherein a user connects to a satellite, which in turn connects to the nearest ground station, as shown in Fig. 1. The ground station, then, connects to an edge server of the terrestrial CDN, increasing the latency by over 100 milliseconds [27]. A small fraction of the network traffic may also flow through inter-satellite links (ISLs) – ISLs have abundant bandwidth (100Gbps) compared to ground-satellite links (20Gbps).
In such cases, a user in area-1 (Fig. 1) may connect via ISL to a CDN edge server in area-2 that is much farther than their home location. Besides the increased latency, the CDN server in area-2 may also cache different content and apply different geo-access constraints. For example, [40] observed that users in Africa may connect to ground stations in Europe that connect to CDN servers that likely have Europe-specific content in their cache.
In addition to performance degradation due to the increased latency, traditional CDNs do not improve the utilization of the satellite links of the LSNs. In the current state-of-the-art, if multiple users watch the same video, it must be uploaded multiple times to the satellite, wasting precious uplink bandwidth to the satellite.
Our focus: The primary question that motivates our work is whether CDN technology can be used to enhance performance, improve network utilization, and reduce the operating cost of delivering content to users of LSNs. Our main approach to answering this question is to explore whether a system of edge servers can be deployed in LEO satellites to cache and serve content to (terrestrial) users. However, a significant challenge in pursuing this approach is that the edge servers are in motion, as each satellite orbits the Earth once every 90 minutes. Unlike a traditional CDN where the edge servers are stationary, in our setting, the users that are proximal to an edge server vary dynamically on the order of minutes, as do their content access patterns. Rethinking the CDN’s edge cluster architecture in the new context of LSNs, in particular its content placement, request routing, and caching, is the main contribution of our work.
Our approach: We propose a space-based content delivery network called StarCDN that is specifically architected to work with LSNs. StarCDN leverages emerging computational capabilities of satellites to deploy edge servers that can cache content in space, thus reducing the latency of access for users and improving their quality of experience. Further, StarCDN reduces the uplink bandwidth required for satellite networks, allowing the scarce spectrum to be repurposed for downlink demand. In designing StarCDN, we solve three key challenges:
(i) Multi-satellite Redundancy: To provide optimal service at long range, LSNs use dense deployments comprising thousands of satellites. Therefore, at any time instant, a Starlink user can connect to 10+ satellites. This set of satellites is dynamic and can change within a few minutes due to the satellite’s orbital motion. Since a user can connect to any of the visible satellites, the content requested by this user must be cached at all the visible satellites for the user to reliably receive it from the cache. This leads to high miss rates and wastage of precious storage space on satellites.
To counter these effects, we propose a consistent hashing scheme [30, 36] for edge servers deployed in the satellites. In StarCDN, we hash each object to one of \(K\) (e.g., \(K = 4\)) buckets and map each bucket to a different satellite. When a satellite receives a request for data in its own bucket, it either responds with the data (a cache hit) or requests the data from the ground (a cache miss). However, if a satellite receives a request for data from a different bucket, it forwards the request over an ISL to a neighboring satellite that holds this bucket. We map buckets to satellites in a grid such that each bucket is at most \(2\lfloor \frac{\sqrt{K}}{2} \rfloor\) hops away in the constellation. Our consistent hashing approach optimizes the utilization of storage capacity in space, while minimizing the additional latency experienced for requests.
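The bucket scheme described in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not StarCDN's implementation: it assumes \(K\) is a perfect square, that satellites are indexed by a hypothetical `(orbit, slot)` pair, that buckets are tiled periodically in a \(\sqrt{K} \times \sqrt{K}\) pattern, and that every satellite has ISL neighbors in both directions along each axis. The function names and the SHA-256 hash are choices made for this example.

```python
import hashlib
import math


def bucket_of_object(object_id: str, k: int = 4) -> int:
    """Hash an object ID to one of k buckets (stable across runs and machines)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % k


def bucket_of_satellite(orbit: int, slot: int, k: int = 4) -> int:
    """Bucket held by the satellite at (orbit index, slot index within the orbit)."""
    side = math.isqrt(k)  # assumption: k is a perfect square
    return (orbit % side) * side + (slot % side)


def hops_to_bucket(orbit: int, slot: int, bucket: int, k: int = 4) -> int:
    """ISL hops from (orbit, slot) to the nearest satellite holding `bucket`."""
    side = math.isqrt(k)
    target_row, target_col = divmod(bucket, side)

    def axis_hops(position: int, target: int) -> int:
        forward = (target - position) % side
        return min(forward, side - forward)  # step in whichever direction is shorter

    # Inter-orbit hops change the orbit index; intra-orbit hops change the slot index.
    return axis_hops(orbit, target_row) + axis_hops(slot, target_col)
```

Under these assumptions, the worst case over all satellites and buckets is two hops for \(K = 4\), which agrees with a bound of twice the floor of \(\sqrt{K}/2\).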
(ii) Orbital motion: In terrestrial CDNs, objects are cached in servers that are located in physical proximity to the users. Caching decisions rely on local popularity characteristics of objects served by the CDN. For example, least recently used (LRU) or least frequently used (LFU) objects are removed from the cache to make way for the addition of new objects. However, LEO satellites orbit around the Earth at speeds of around 8 km per second and serve a given location for less than ten minutes. A satellite serving users in the United States may serve European users in a matter of minutes. Therefore, the access pattern, popularity statistics, and cache content rapidly grow stale, leading to low hit rates. Our analysis shows that a simple LRU cache deployed on satellites will achieve a hit rate below 60% due to such orbital motion.
To counter the effects of orbital motion, StarCDN deploys a relayed fetch technique, where a satellite can relay a request to a neighboring satellite (with the same bucket ID) using its ISL link in case of a cache miss. This allows StarCDN to benefit from a previous satellite’s cached object that just served a region. Effectively, this allows cache content to flow backwards, i.e., in the opposite direction of the orbital motion. Note that this backwards flow does not propagate objects that are no longer popular because the relay is initiated only in cases of a cache miss of an accessed object. We limit the relay to the nearest neighbor to cap the additional latency incurred due to such relays.
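A minimal sketch of the relayed fetch idea follows, assuming each satellite runs an LRU cache and holds a reference to one neighboring satellite with the same bucket ID. The class `SatelliteCache`, the attribute `relay_neighbor`, and the callback `fetch_from_ground` are hypothetical names introduced for this example; how the neighbor is chosen is not modeled here.

```python
from collections import OrderedDict


class SatelliteCache:
    """LRU cache on one satellite, with an optional same-bucket relay neighbor."""

    def __init__(self, capacity, relay_neighbor=None):
        self.capacity = capacity              # maximum number of cached objects
        self.relay_neighbor = relay_neighbor  # hypothetical: nearest same-bucket satellite
        self.store = OrderedDict()            # key -> value, least recently used first

    def lookup(self, key):
        """Local lookup only; refreshes recency on a hit, returns None on a miss."""
        if key in self.store:
            self.store.move_to_end(key)
            return self.store[key]
        return None

    def admit(self, key, value):
        """Insert an object and evict least recently used entries if over capacity."""
        self.store[key] = value
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)

    def get(self, key, fetch_from_ground):
        """Return (value, source), where source is 'local', 'relay', or 'ground'."""
        value = self.lookup(key)
        if value is not None:
            return value, "local"
        if self.relay_neighbor is not None:
            # One ISL hop to the neighbor; the neighbor does not relay any further.
            value = self.relay_neighbor.lookup(key)
            if value is not None:
                self.admit(key, value)
                return value, "relay"
        value = fetch_from_ground(key)  # miss everywhere: use the ground-satellite link
        self.admit(key, value)
        return value, "ground"
```

Because the neighbor is consulted only when a requested object misses locally, objects that are no longer requested are never copied across, which mirrors the note above about stale content not propagating.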
(iii) A novel publicly available trace generator that captures the content access patterns of LSN users distributed around the world: It is challenging to evaluate a space-based CDN design due to the requirement for globally distributed traffic traces that capture content accesses of users around the world. Traditional CDN designs are evaluated using traces from one (or a few) locations. However, a satellite orbits the globe, and to effectively evaluate StarCDN, we require traces from multiple locations on Earth for a large period of time. To achieve this, we first collected limited real-world traffic traces from nine locations across the world from Akamai’s CDN over one day. Then, we designed a new synthetic trace generator, SpaceGEN, that uses footprint descriptors [58] to capture both temporal variations of content access patterns within a location, such as object access frequencies, and geographical variations of content accesses across locations, such as how content accessed across locations overlap. Further, SpaceGEN can produce synthetic traces for the major traffic classes hosted on a CDN, such as web, video and software downloads. Our trace generator enables realistic long-term evaluation of StarCDN and other baselines.
Summary of contributions:
• StarCDN is the first space-based CDN designed to improve the content access experience of LSN users while optimizing the utilization of the ground-satellite network (§3).
• We design and evaluate a novel LSN-specific consistent hashing scheme to reduce redundancy in satellite caches and improve cache performance (§3.2). Further, we design and evaluate a relayed fetch scheme for content to counter the effects of the satellite’s orbital motion (§3.3).
• StarCDN is evaluated using realistic content access traces from a geo-distributed set of LSN users. These synthetic traces were derived using SpaceGEN, our novel trace generator that produces synthetic traces that are similar to actual production traces for different traffic classes, such as videos, web, and download content (§4). To the best of our knowledge, SpaceGEN is the first trace generator for cache simulations that incorporates both temporal and geographic variations in content access patterns of users. To support more research in the area, we have released both our StarCDN simulation framework¹ and SpaceGEN trace generator² on GitHub.
• We simulate CDN edge servers on satellites using actual servers, while simulating the satellite orbital motion and fields of view using the Microsoft CosmicBeats [38, 48] simulator (§5). We simulate 1170 satellites from the Starlink constellation in our experiments. Our experiments demonstrate that StarCDN can improve the cache hit rates in space from 60% to 75%. It can also reduce the satellite network uplink utilization by up to 80% and improve user-perceived latency by 2.5X.
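tl; dr
- Consistent hashing: essentially tagging
- A satellite that receives a request with a matching tag responds right away: it returns the data it already holds, or fetches the data from the ground station and then returns it
- A satellite that receives a request with a different tag forwards it to a nearby satellite with the matching tag
- Relayed fetching: cached content "propagates backward"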
Background
We provide a brief background for LSNs and CDNs.
2.1 LEO Satellite Networks
Satellite-based internet, using LEO satellites, has gained widespread adoption over the last decade, with the emergence of Starlink, OneWeb, and Kuiper [2, 43, 54]. Starlink is the most mature LEO network service provider, with more than 7,000 satellites in orbit.
Orbital motion: LEO satellites orbit around the Earth at an altitude of around 550 km (compared to traditional geostationary satellites at nearly 36,000 km). Therefore, LEO satellites offer significantly lower propagation delay to support modern internet applications. LEO satellites also offer a better link bandwidth budget (and hence more throughput) due to their proximity to Earth. However, LEO satellites must travel at high speeds in order to maintain their orbits. For context, LEO satellites orbit the Earth approximately every 90 minutes. Due to this fast orbital motion, a user dish can connect to a satellite for at most a few minutes, resulting in frequent user-satellite link "handovers" when satellites move out of view.
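The altitude, speed, and delay figures in this paragraph can be sanity-checked with the standard circular-orbit formulas. The sketch below is a back-of-the-envelope calculation, not part of the paper: it assumes a perfectly circular orbit, a spherical Earth with mean radius 6,371 km, and a straight-down signal path. The constant and function names are chosen for this example.

```python
import math

MU_EARTH = 3.986004418e14  # m^3/s^2, Earth's standard gravitational parameter
R_EARTH = 6_371_000.0      # m, mean Earth radius (spherical-Earth assumption)
C = 299_792_458.0          # m/s, speed of light in vacuum


def circular_orbit(altitude_m):
    """Return (speed in m/s, period in s) for a circular orbit at the given altitude."""
    radius = R_EARTH + altitude_m
    speed = math.sqrt(MU_EARTH / radius)
    period = 2 * math.pi * math.sqrt(radius ** 3 / MU_EARTH)
    return speed, period


def one_way_delay_ms(altitude_m):
    """One-way propagation delay to a satellite directly overhead, in milliseconds."""
    return altitude_m / C * 1e3


leo_speed, leo_period = circular_orbit(550e3)
print(f"LEO speed:  {leo_speed / 1e3:.1f} km/s")
print(f"LEO period: {leo_period / 60:.1f} min")
print(f"LEO delay:  {one_way_delay_ms(550e3):.1f} ms")
print(f"GEO delay:  {one_way_delay_ms(35_786e3):.1f} ms")
```

At 550 km this yields a speed of roughly 7.6 km/s and a period of about 95 minutes, in line with the approximate figures quoted in the text, and a one-way overhead delay of about 1.8 ms compared with about 119 ms for a geostationary satellite.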
Network routing: Starlink satellites are equipped with radio wave antennas for ground-satellite links and optical transceivers for inter-satellite links (ISLs) [55]. The common mode of operation is a bent-pipe model, where the user dish connects to a satellite, which in turn connects to a Starlink ground station. The ground station forwards packets through the terrestrial network and connects to the rest of the internet through an Internet Exchange Point (IXP). This pipeline is shown in Fig. 1.
ISLs have recently been introduced in Starlink and are commonly used in locations without a ground station nearby (e.g., in many countries in Africa). A satellite uses ISLs to route traffic to other satellites, which downlink it to a ground station on Earth. Starlink satellites typically support four ISLs: two intra-orbit links (to the previous and next satellites in the same orbit) and two inter-orbit links (to adjacent satellites in parallel orbits) [72]. We list propagation delays and bandwidths of ground-satellite links (GSLs) and ISLs in Table 1. We visualize Starlink’s orbits and ISLs in Fig. 5b.
2.2 Content Delivery Networks
A Content Delivery Network (CDN) is a large distributed system potentially consisting of hundreds of thousands of edge servers deployed in thousands of locations across the world [20, 42]. The edge servers are deployed in clusters, where each cluster is deployed within a data center or colocation facility. A CDN edge server can cache and deliver content to users on behalf of potentially thousands of content providers. While the original content provider stores all objects in their “origin” servers, CDNs do not cache every object at the edge server due to limited cache space. When a user requests content, say a web page or a video, that request is routed to a proximal edge server of the CDN. If that server has the requested content, a cache hit is said to have occurred, and the user is served the content. If the edge server does not have the requested content, a cache miss is said to have occurred, and the content is fetched over the WAN from the origin server and served to the user. When the cache of an edge server is full, CDN providers use eviction policies to determine which content should be removed from the cache [14, 71]. Different eviction policies have different strengths and weaknesses. A commonly used policy is Least Recently Used (LRU), and LRU variants are often deployed in commercial CDNs [36] for their simplicity and effectiveness. In LRU, the object whose last access is the oldest is evicted from the cache.
Traditional CDN infrastructures are geographically static, relying on deploying edge servers close to the users [25] and routing requests from users to a proximal edge server [13, 36] that can serve the content. The cache hit rate is a key metric to maximize because each cache miss increases both the latency experienced by the user and the upstream WAN bandwidth for fetching the “missed” content from the origin server. The hit rate can be measured in two ways – the request hit rate is the fraction of requests that were cache hits, while the byte hit rate is the fraction of bytes served for requests that were cache hits.
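The two hit-rate definitions can be made concrete with a small LRU simulation. This is a generic sketch rather than code from any production CDN: it assumes a trace of `(object_id, size_bytes)` pairs, a byte-limited cache, and that objects larger than the whole cache are never admitted. The function name `simulate_lru` is chosen for this example.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (object_id, size_bytes) requests through a byte-limited LRU cache.

    Returns (request_hit_rate, byte_hit_rate).
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0
    hits = hit_bytes = total = total_bytes = 0
    for obj, size in requests:
        total += 1
        total_bytes += size
        if obj in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(obj)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # assumption: objects larger than the cache are not admitted
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the oldest last access
            used -= evicted_size
        cache[obj] = size
        used += size
    request_hit_rate = hits / total if total else 0.0
    byte_hit_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return request_hit_rate, byte_hit_rate


trace = [("a", 10), ("b", 100), ("a", 10), ("c", 100), ("b", 100)]
print(simulate_lru(trace, capacity_bytes=120))  # (0.2, 0.03125)
```

In this toy trace the only hit is on the small object `a`, so the request hit rate (1 of 5 requests) is much higher than the byte hit rate (10 of 320 bytes). This is why the two metrics are reported separately.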
CDN architectures vary from provider to provider. Netflix [26, 27], YouTube [1, 67], and Amazon Prime are vertically integrated and provide both the content and CDN services for their users and are largely focused on video content delivery. On the other hand, Akamai [42, 47] and Cloudflare [21, 34] provide general-purpose third-party CDN services for a large number of content providers, delivering a range of traffic classes such as web, videos, and software downloads. While some CDNs serve content from larger edge server clusters in fewer locations, others deploy smaller clusters in a large number of locations. However, all CDNs share the basic principles of routing requests from users to proximal edge servers that cache and serve popular content. While we use traces from Akamai’s CDN for our empirical evaluation, the architectural ideas for caching and content management proposed and studied in this work are applicable across a wide range of CDN architectures. Further, our open-sourced trace generation tool (SpaceGEN) and simulation framework allow evaluation of other satellite-based CDN architectures using production logs from other CDNs.
2.3 Feasibility of In-space Compute and Storage
CDN edge servers in space will require storage, compute, and power on satellites. We discuss the feasibility of providing these resources on board a satellite. Developments over the last decade have shown significant potential for placing computational and storage capabilities on satellites in space [7, 19, 41, 63, 70]. In-orbit computing [7] explores the feasibility of augmenting satellites with edge-compute capabilities, a question directly relevant to CDN integration. The same work describes the power, weight, and volume feasibility of placing a high-end server with up to 2 TB of storage on Starlink satellites. Newer servers offer even more storage and compute within the same requirements, indicating that less volume and weight will be required as the technology advances. Some missions have already launched satellites with computing capabilities for various applications [22, 32, 60, 69]. For example, both Planet [32] and the European Space Agency [22] have demonstrated the ability to run machine learning models onboard small satellites using edge-computing devices (such as the NVIDIA Jetson).
StarCDN System Architecture
We propose a novel space-based content delivery network, StarCDN, that is specifically designed to work with LSNs. StarCDN deploys CDN edge servers in the satellites to cache content while utilizing the ISL links to fetch content from neighbors as needed. In designing StarCDN, we aim for the following objectives:
• Reduce latency: Satellite network users should experience similar latencies as terrestrial network users for accessing content.
• Reduce uplink bandwidth utilization: Currently, all content served to users must be sent from ground stations to satellites via their uplinks. We aim to reduce uplink utilization by reducing the need to fetch content from the ground. This goal is motivated by the surge in demand for satellite-based services. Over the past few years, the Starlink user base has grown from approximately 1 million to over 4 million globally [50, 56], leading to increased contention for uplink and downlink bandwidth, particularly in densely populated and high-traffic regions. In response, Starlink has started to pause new subscriptions in areas of high demand [16]. Reducing uplink utilization will free up bandwidth that can be repurposed for additional users.
• Compatibility with current network architecture: We aim to maintain compatibility with existing satellite network architecture, i.e., we carefully model GSLs, ISLs, constellation size, satellite orbits, and ground stations using publicly available information about the Starlink network.
3.1 A Naive Design and Resulting Challenges
To motivate the design choices in StarCDN, we begin by considering a naive satellite CDN design. Consider a constellation of Starlink satellites where each satellite is equipped with an edge server that can cache content. Each cache follows the popular LRU eviction policy. This setup mimics a terrestrial CDN design where each server independently implements an LRU-like eviction policy. We analyze the unique challenges that satellite networks pose for this setup.
3.1.1 Challenge 1: Dynamic Access Patterns due to Orbital Motion. A LEO satellite orbits around the Earth in 90 minutes. Since the Earth rotates around its axis every 24 hours (not synchronized with the LEO satellite), the LEO satellite covers different parts of Earth during each orbit (see Fig. 3 for an illustration). This implies that each cache serves users in different geographies over time. We note that this is a drastically different scenario compared to terrestrial CDNs, where servers are stationary and typically serve users in an area proximal to the server.
To understand the implications of this motion, we analyze the geographic diversity in content access patterns. Specifically, we collect and analyze production traffic traces of users accessing video content from nine edge server clusters of the Akamai CDN across the globe: Mexico City, Dallas, Atlanta, Washington D.C., New York City, London, Frankfurt, Vienna, and Istanbul. We sample traces (subsampled at 1%) from diverse cities around the world that have a large population and user demand. The traces contain anonymized information about each access made by users, including what content was accessed at what time and from which edge server. We sample more cities from the US because it has the most Starlink users today [18]. The whole trace consists of 423M video requests (512 TB) for 24M unique objects (24 TB).
To quantify the impact of satellite motion, we study the overlap in objects and in content access traffic between each pair of European countries with different official languages in Table 2. We note that different languages create noticeable traffic diversity even within a single continent. These diverse regions can be traversed and served by the same satellite within minutes.
Further, we plot the overlap in terms of unique objects and volume of access traffic with respect to geographical distance to New York in Fig. 2. Our analysis reveals several interesting insights. First, for regions closer to New York (< 3000 km), the overlap is around 55% in terms of the objects served. This indicates that even nearby cities like New York and Washington D.C. have non-overlapping objects being requested by users. However, in terms of traffic volume, around 90% of the traffic goes to objects that are available in both adjacent locations. Second, across countries that are more than 3000 km apart, the overlap is fairly low in terms of both unique objects served and traffic volume. Even in an English-speaking city like London, only about a quarter of the traffic is also present in New York. This indicates high geographic diversity. Traditionally, a CDN provider would provision dedicated cache clusters for geographically distinct regions. However, the same satellite can go from the US to Europe within tens of minutes, indicating the unique challenge of designing caching schemes for satellites. By the time a satellite sees enough traffic to build a cache, its orbital motion pulls it away to a different region.
Takeaway: Rapid orbital motion of LEO satellites places them over different geographic regions within minutes. These regions require different content cached on the satellite.
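The two overlap measures used in this analysis can be sketched as follows. The exact definitions behind Table 2 and Fig. 2 are not spelled out in this section, so the code assumes one plausible reading: object overlap is the fraction of location A's unique objects that are also requested at location B, and traffic overlap is the fraction of A's bytes that go to those shared objects. The function name and the trace format of `(object_id, size_bytes)` pairs are hypothetical.

```python
from collections import Counter


def overlap(trace_a, trace_b):
    """Overlap of location A with respect to location B.

    Each trace is an iterable of (object_id, size_bytes) requests.
    Returns (object_overlap, traffic_overlap).
    """
    bytes_a = Counter()
    for obj, size in trace_a:
        bytes_a[obj] += size  # total bytes requested per object at A
    if not bytes_a:
        return 0.0, 0.0
    objects_b = {obj for obj, _ in trace_b}
    shared = bytes_a.keys() & objects_b  # objects requested at both locations
    object_overlap = len(shared) / len(bytes_a)
    total_bytes = sum(bytes_a.values())
    shared_bytes = sum(bytes_a[obj] for obj in shared)
    traffic_overlap = shared_bytes / total_bytes if total_bytes else 0.0
    return object_overlap, traffic_overlap


new_york = [("x", 100), ("x", 100), ("x", 100), ("y", 10), ("z", 10)]
washington = [("x", 100), ("w", 5)]
print(overlap(new_york, washington))
```

In this toy example only one of three objects is shared, yet it carries most of the bytes. This is the same pattern as nearby cities showing moderate object overlap but high traffic overlap.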
3.1.1 挑战 1:轨道运动导致的动态访问模式
一颗 LEO(低地球轨道)卫星在 90 分钟内绕地球运行一周。由于地球每 24 小时绕轴自转一次(与 LEO 卫星并不同步),LEO 卫星在每次轨道运行期间会覆盖地球的不同区域(如图 3 所示)。这意味着每个缓存随时间推移会为不同地理区域的用户提供服务。我们注意到,这与地面 CDN 的情况截然不同,后者的服务器是固定的,通常为其邻近区域的用户提供服务。
为了理解这种运动带来的影响,我们收集并分析了 Akamai CDN 在全球九个边缘服务器集群(墨西哥城、达拉斯、亚特兰大、华盛顿特区、纽约、伦敦、法兰克福、维也纳和伊斯坦布尔)的用户访问视频内容的真实业务流量追踪数据。我们从全球人口众多、用户需求量大的不同城市中抽样了追踪数据(以 1% 的比例进行二次抽样)。这些追踪数据包含用户每次访问的匿名化信息,包括在何时、从哪个边缘服务器访问了何种内容。我们从美国抽样了更多的城市,因为目前其星链用户数量最多[18]。整个追踪数据集包含对 2400 万个独立对象(总计 24TB)的 4.23 亿次视频请求(总计 512TB)。
3.1.2 Dynamic client-server relationships. A corollary of the orbital motion discussed above is that client-server relationships constantly evolve over time in satellite networks. In terrestrial CDNs, a mapping system is used to route a client’s (i.e., user’s) request to a proximal edge cluster, either using the DNS system [13, 36] or IP anycast [11, 73], and an intra-cluster “local” load balancer assigns the client to an edge server within the chosen cluster. The server assignment of a client is generally stable since the server is stationary and the clients are generally located within the same geographical area. While a client may get assigned to multiple servers over time due to changes in server and network conditions, these servers typically cache the same regional content relevant to the clients.
However, LEO satellites orbit the Earth at low altitudes, and each of them covers only a small region of the Earth compared to a GEO satellite. Satellite network providers therefore usually build a large constellation to increase the coverage area. For example, at any time instant, a Starlink client often has 10+ satellites in view, and the Starlink scheduler is responsible for scheduling client-to-satellite links [51]. We refer to the satellite scheduled for a client as the first contact satellite, and it changes rapidly. Recent literature shows that the Starlink scheduler reconfigures the user-satellite mapping every 15 seconds [51]. In general, in any LEO network, the client-satellite mapping cannot last beyond a few minutes. This means that in satellite-based CDNs, it is impossible to maintain a stable client-server mapping beyond a few minutes. This dynamic mapping has two implications. First, multiple satellites that serve the same area may need to cache and serve the same objects redundantly, which reduces cache storage efficiency. Second, satellites need to evolve their cached content as the geographical area they serve changes.
Consider an example in Fig. 4. user-1 and user-2 are scheduled to choose satellites D and B respectively, as the first contact satellites, decided by the satellite link scheduler (all satellites in this figure are in view and connectable for both users). Now, user-1 initiates a request, which is resolved to the proximal edge server on satellite D. Satellite D checks its local cache and serves the content if it’s available. Otherwise, it downlinks the request to terrestrial networks to retrieve and cache the requested content in its local cache. What if user-2 now requests the same content as user-1? The same process will happen, and the cache server on satellite B will fetch and cache the same content, without recognizing that this request can be fulfilled by retrieving the content from the cache at the neighboring satellite D through ISL links. Thus, the redundant storage of the same content in both satellites decreases the overall cache storage efficiency of the system, and user-2 may perceive high latency if there is a cache miss at satellite B, resulting in ground-satellite communications.
Takeaway: Dynamic client-server mappings may lead to redundant storage of the same content in multiple caches, resulting in a reduction of the overall cache storage efficiency. Further, it could also result in greater latency during cache misses due to additional ground-satellite communication.
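tl;dr
- Each CDN cache serves users in different geographic regions over time
- Dynamic client-to-CDN mapping can cause the same content to be stored redundantly in multiple caches; on a cache miss, the latency penalty is even worse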
3.2 StarCDN: Consistent Hashing to Reduce Redundant Caching¶
To counter the dynamic client-server relationships, we propose a consistent hashing scheme for partitioning cache content across satellites. Consistent hashing is widely used in production storage systems to provide load balancing and fault tolerance [30, 36, 71]. Conceptually, consistent hashing involves hashing both servers and objects to a unit circle. Each object is mapped to the next server that appears clockwise on that unit circle. This allows the objects to be divided among the servers of a single cluster, reducing redundancy. In terrestrial networks, consistent hashing is done across servers located close to each other within a cluster, as described in [36].
We cannot apply consistent hashing to the satellite use case as is. Specifically, LEO satellites are typically small and do not support many servers running simultaneously under the limited power, size, and thermal constraints [7]. Moreover, even if we were to map objects to different servers on a satellite, it would not solve the redundant caching problem discussed before.
StarCDN proposes a satellite-specific variant of consistent hashing. As in standard consistent hashing, we partition the objects into \(K\) disjoint buckets. Each bucket is then mapped to a different satellite. But how do we identify which bucket should be mapped to which satellite? Formally, this problem can be mapped to a graph coloring problem [29] for an arbitrary constellation topology, with constraints imposed by the presence of ISLs and latency requirements to fetch content from different satellites.
The Starlink topology is shaped in a grid pattern, which simplifies the mapping problem. Inter-satellite links naturally lead to a grid topology of satellites. Each satellite typically connects to four other satellites (front, back, left, right). We map the \(K\) buckets on this grid in a repeating \(\sqrt{K} \times \sqrt{K}\) pattern. For example, we can map 9 buckets in a 3 x 3 pattern.
An example for \(K = 4\) is shown in Fig. 5a, where the four buckets of content are stored in each 2x2 grid; for instance, the grid consisting of satellites S1, N1, S2, and N2 stores the \(K = 4\) distinct buckets. Note that we cannot control which user talks to which satellite. This falls within the purview of Starlink’s satellite link scheduler and is subject to various constraints. However, when a user requests an object from its first-contact satellite (e.g., S1 in Fig. 5a), that satellite can serve the object itself if the object is in its designated bucket. If not, it computes the shortest path to a satellite that holds the object’s bucket (say N1) and forwards the request along that path. On a cache hit, N1 forwards the object to the first-contact satellite (S1). On a cache miss, N1 requests the object from the ground, stores it in its cache for future requests, and forwards it to S1. Finally, S1 forwards the object to the user.
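The request flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not StarCDN's implementation: satellites are identified by hypothetical `(orbit, slot)` grid coordinates on an unbounded grid, buckets are laid out row by row in the repeating \(\sqrt{K} \times \sqrt{K}\) tile, and the function names (`bucket_of`, `satellite_bucket`, `nearest_owner`, `handle_request`) and the choice of SHA-256 are made up for the example.

```python
import hashlib
import math

def bucket_of(object_id: str, k: int) -> int:
    """Hash an object ID to one of k buckets (stable across processes)."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % k

def satellite_bucket(orbit: int, slot: int, k: int) -> int:
    """Bucket owned by the satellite at grid position (orbit, slot)."""
    r = math.isqrt(k)
    assert r * r == k, "k must be a perfect square"
    return (orbit % r) * r + (slot % r)

def nearest_owner(orbit: int, slot: int, bucket: int, k: int):
    """Closest grid position (by hop count) that owns `bucket`."""
    r = math.isqrt(k)
    candidates = [
        (orbit + do, slot + ds)
        for do in range(-r, r + 1)
        for ds in range(-r, r + 1)
        if satellite_bucket(orbit + do, slot + ds, k) == bucket
    ]
    return min(candidates, key=lambda p: abs(p[0] - orbit) + abs(p[1] - slot))

def handle_request(first_contact, object_id, caches, k, fetch_from_ground):
    """Serve object_id through the satellite that owns its bucket."""
    owner = nearest_owner(*first_contact, bucket_of(object_id, k), k)
    cache = caches.setdefault(owner, {})
    hit = object_id in cache
    if not hit:
        cache[object_id] = fetch_from_ground(object_id)  # miss: fetch from the ground
    return cache[object_id], owner, hit
```

The first request for an object misses at the bucket owner and is fetched from the ground; a repeat request through the same first-contact satellite is then a hit at that owner. Cache capacity and eviction are deliberately left out of this sketch.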
Our scheme has several advantages. First, it increases the cache storage efficiency, since each cache needs to store only \(1/K\) of the objects, allowing more objects to be stored in space and increasing the cache hit rate. Second, it reduces redundancy, as each object is stored only by the server that is assigned its bucket. If \(K\) satellites serve a single region, they do not need to cache the same content. Third, although querying an object from a neighboring satellite adds latency, all buckets are accessible in at most \(2 \lfloor \frac{\sqrt{K}}{2} \rfloor\) hops from the first-contact satellite. Finally, our consistent hashing scheme accommodates any cache replacement scheme within each server, including LRU (Least Recently Used), LFU (Least Frequently Used), Sieve [74], and others.
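The hop bound can be checked by brute force on an idealized grid. The sketch below assumes an unbounded four-neighbor grid tiled with the repeating \(\sqrt{K} \times \sqrt{K}\) pattern and counts hops as Manhattan distance; `max_hops` is a hypothetical helper, and real constellations have edges and missing links that this ignores.

```python
import math

def max_hops(k: int) -> int:
    """Worst-case hops from a satellite to the nearest owner of any bucket."""
    r = math.isqrt(k)
    worst = 0
    for row in range(r):
        for col in range(r):
            # Requester at (0, 0); owners of bucket (row, col) repeat every r positions.
            nearest = min(
                abs(o) + abs(s)
                for o in range(-r, r + 1)
                for s in range(-r, r + 1)
                if o % r == row and s % r == col
            )
            worst = max(worst, nearest)
    return worst

# Map each K to (brute-force worst case, closed-form bound 2 * floor(sqrt(K) / 2)).
results = {k: (max_hops(k), 2 * (math.isqrt(k) // 2)) for k in (4, 9, 16, 25)}
```

For \(K = 4\) and \(K = 9\), both the brute-force value and the closed-form bound are 2 hops, so moving from 4 to 9 buckets does not lengthen the worst-case path in this idealized setting.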
How do we choose K? The choice of \(K\) is driven by three factors. First, a small value of \(K\) increases the cache miss rate, since each cache is assigned a larger fraction of the content and more content is stored redundantly across satellites. Second, a large value of \(K\) increases the latency to access an object. Third, there are constraints due to the constellation size, orbit design, and orbital motion. We find that values of \(K = 4\) and \(K = 9\) are generally compatible with the Starlink constellation. We evaluate this tradeoff in §5.3.
3.3 StarCDN: Relayed Fetch to Counter Orbital Motion¶
Next, we discuss how StarCDN addresses the challenge of dynamic shifts in the access pattern. Recall that the fundamental reason for such a dynamic access pattern shift is the orbital motion of a satellite. Unlike traditional CDNs, where the edge server remains in a single location and serves users proximal to that location, a LEO satellite is in rapid motion with respect to the surface of the Earth. Intuitively, our goal is to create a flow of cached content in the opposite direction of the orbital motion. This allows the cached content accessed by users at a particular location to stay above that location, while the server that caches the content keeps changing due to the orbital motion. We call this technique relayed fetch, where we allow a satellite to request objects from the neighboring satellites with the same bucket mapped to them. For example, in Fig. 5a, N1 can request content from N3 on its right since it is its next nearest neighbor with the same color/bucket. Such relays are only initiated in response to a cache miss at N1.
One concern with relayed fetch is the latency overhead. Each one-way inter-orbital hop requires at least 2 ms, while an intra-orbital hop needs 8 ms. Given the higher latency of intra-orbital links and the larger distance they span, StarCDN fetches data only from inter-orbital neighbors and avoids intra-orbital neighbors, which limits the latency penalty of a cache miss. To illustrate inter-orbital links and the benefit of fetching over them, Fig. 3 plots the trajectories, over one period, of two satellites that are three inter-orbital links apart. Note that the red satellite follows a path very similar to the one that the green satellite (the west inter-orbital neighbor of the red satellite) traveled in its previous footprint. This means a satellite’s west neighbor holds the historical footprint of requests that we want to exploit.
Finally, we make a slight addition to our design to also allow satellites to fetch from their east inter-orbital neighbor (e.g., between N1 and N3), since this has the same latency penalty as fetching from the west neighbor. Our evaluation in §5 shows that this connection is less likely to be useful compared to the rightward links, but it doesn’t incur any additional latency in StarCDN and hence, we choose to keep these links bidirectional.
Why not proactive prefetching? In our current design, we use relayed fetch in response to a cache miss. An alternative strategy is to proactively prefetch popular content from preceding satellites when entering a populous region. It can create a similar backflow of content to counter the effect of orbital motion. However, there is a risk of prefetching content that is stale and is no longer being requested by clients. We found this strategy to be less efficient than relayed fetch in terms of hit rate. While relayed fetch incurs an added latency, this latency only happens during the first request for a new object. Once the object has been fetched, it can be stored in the local cache. In contrast, if the proactively prefetched content is not used, it will be a waste of cache space at the receiver, a waste of power to transmit the data, and a waste of ISL bandwidth.
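A rough sketch of relayed fetch on a cache miss follows. Several details are assumptions made for illustration: satellites are hypothetical `(orbit, slot)` grid positions, the same-bucket inter-orbital neighbors are taken to sit \(\sqrt{K}\) orbits away (inferred from the repeating bucket pattern), `orbit - r` is arbitrarily treated as the west neighbor, and the west neighbor is tried before the east one. The text does not specify whether the two neighbors are queried in order or in parallel.

```python
import math

INTER_ORBIT_HOP_MS = 2.0  # lower bound per one-way inter-orbital hop, from the text

def relayed_fetch(owner, object_id, caches, k, fetch_from_ground):
    """On a miss at `owner`, try same-bucket inter-orbital neighbors, then the ground.

    Returns (data, source, added_one_way_latency_ms).
    """
    local = caches.setdefault(owner, {})
    if object_id in local:
        return local[object_id], "local", 0.0

    r = math.isqrt(k)  # assumed distance (in orbits) to the next same-bucket satellite
    orbit, slot = owner
    for name, neighbor in (("west", (orbit - r, slot)), ("east", (orbit + r, slot))):
        data = caches.get(neighbor, {}).get(object_id)
        if data is not None:
            local[object_id] = data  # keep a copy so later requests hit locally
            return data, name, r * INTER_ORBIT_HOP_MS

    data = fetch_from_ground(object_id)
    local[object_id] = data
    return data, "ground", None  # ground latency is not modelled in this sketch
```

Because the object is stored locally after a successful relay, the relay latency is paid only on the first request for that object at a given satellite, which matches the argument made above against proactive prefetching.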
3.4 StarCDN: Robustness to Unavailability¶
StarCDN also continuously monitors the nearby satellite links and the reachability of bucket IDs. ISLs today can last weeks [37] with high stability. However, link unavailability is inevitable when satellites commence maneuvers to avoid collisions [35]. Note that such collision avoidance maneuvers are known in advance, as the satellite network operator needs to plan for them. A different type of failure is cache server unavailability, e.g., when a server is brought down for a software update. Cache server unavailability is also common but transient [71].
StarCDN’s response to such failures depends on whether the failure is transient or long-term (tens of minutes or hours). For transient failures, StarCDN simply reports a cache miss and forwards the requests to the ground. For the long-term failures, StarCDN’s consistent hashing scheme remaps the bucket assigned to the unavailable satellite to the next available satellite (this satellite is now responsible for multiple buckets). Our approach temporarily leads to uneven loads, but the time duration of such failures is small enough that a larger reconfiguration of the consistent hashing scheme isn’t needed. We evaluate our robustness to failures in §5.4.
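The two failure responses can be sketched as a small routing function. The data layout is an assumption made for illustration: `ring` is a hypothetical ordered list in which `ring[i]` normally owns bucket `i`, and "next available satellite" is interpreted as the next entry in that list, which the text does not define precisely.

```python
def route_bucket(bucket, ring, down_long_term, down_transient):
    """Pick the satellite that should serve `bucket`, or None to go to the ground."""
    owner = ring[bucket]
    if owner in down_transient:
        return None  # transient failure: report a cache miss, forward to the ground
    if owner in down_long_term:
        n = len(ring)
        for step in range(1, n):
            candidate = ring[(bucket + step) % n]
            if candidate not in down_long_term and candidate not in down_transient:
                return candidate  # this satellite now serves an extra bucket
        return None  # no satellite available: fall back to the ground
    return owner

ring = ["S1", "N1", "S2", "N2"]
remapped = route_bucket(1, ring, down_long_term={"N1"}, down_transient=set())
```

Here bucket 1 is remapped from N1 to S2, so S2 temporarily carries two buckets, which is the uneven load described above.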
SpaceGEN: Synthetic Trace Generator for Satellite-based CDNs¶
This section can be skipped. It is enough to know that the simulation fidelity is high.
In this section, we describe our trace generation tool, SpaceGEN, that generates realistic synthetic traces of how LSN users access content. Our goal is to generate long-term synthetic traces suitable for simulating StarCDN and other satellite-based CDN designs from limited real-world traces collected from the production CDN. Our tool builds on the theory of footprint descriptors (FDs). FDs are traffic models that capture the manner in which users access content and were first proposed in [58]. FDs can predict cache hit rates for various traffic classes, such as videos, web, and downloads, and are used for cache provisioning in Akamai’s production CDN. Further, variants of FDs were used to generate synthetic logs that are provably similar to production logs and incorporated in the synthetic trace generation tools TRAGEN [44] and JEDI [45]. Unlike prior tools that generate synthetic traces for an individual cache at a specific location, SpaceGEN generates synthetic traces collectively across multiple locations that are suitable for simulating satellite-based CDNs. We further show that the synthetic trace and the production trace yield similar hit rates when performing a cache simulation of a satellite traversing these locations. We have made this tool and the associated traffic models derived from the production CDN available to the research community to facilitate further research in the satellite-based CDN domain.
4.1 Traffic Models for Satellite-based CDNs¶
Our models capture essential statistics such as popularity, size, overlap, and access patterns of the objects requested by LSN users using the following traffic models: (i) Global Popularity Distribution (GPD), and (ii) Popularity-Size Footprint Descriptor (pFD). The GPD captures the correlation between objects and requests across different locations, while the pFD describes the access patterns of objects from a single location. We compute the GPD and pFDs models from the production traces of the Akamai CDN. We have made these models available for public download.
Global Popularity Distribution (GPD). The GPD captures the joint distribution of an object’s popularity and size across multiple locations. Formally, it is expressed as \(P(p_1, \ldots, p_n, z)\), where \(p_i\) represents the popularity of the object at the \(i^{th}\) location, and \(z\) denotes the size of the object in bytes. Popularity \(p_i\) is defined as the number of requests made for the object in the production trace at location \(i\). Unlike a normalized value, this definition of popularity is adopted by synthetic trace generation tools such as TRAGEN and JEDI. Thus, \(P(p_1, \ldots, p_n, z)\) represents the probability that an object has size \(z\) and popularity \(p_i\) at location \(i\).
Popularity-Size Footprint Descriptor (pFD). The pFD captures the request access patterns, popularity, and size of objects requested from a single location. It has been shown to capture: (i) Object-level properties: including the popularity distribution, size distribution, and request-size distribution, and (ii) Cache-level properties: such as request hit rate curves and byte hit rate curves of the trace. Formally, pFD is described as a probability distribution \(P(p, z, s, t)\), where: (i) \(p\) and \(z\) represent the popularity and size of an object in the trace, (ii) \(s\) denotes the number of unique bytes requested between consecutive accesses of an object, and (iii) \(t\) represents the inter-arrival time, i.e., the time between consecutive accesses of an object. The number of unique bytes, \(s\), that are requested between consecutive accesses is known as the stack distance [45].
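To make the stack distance \(s\) and inter-arrival time \(t\) concrete, the sketch below extracts them from a toy trace. It is a quadratic-time illustration rather than how footprint descriptors are computed in practice, and it adopts one convention as an assumption: \(s\) counts the bytes of the other distinct objects requested between two accesses, excluding the object itself. The function name `stack_distances` is hypothetical.

```python
def stack_distances(trace):
    """Return a list of (object_id, s, t) for every repeat access in the trace.

    `trace` is a list of (timestamp, object_id, size_bytes) tuples in time order.
    """
    last_index = {}
    out = []
    for i, (ts, obj, _size) in enumerate(trace):
        if obj in last_index:
            j = last_index[obj]
            # Distinct objects requested strictly between the two accesses.
            between = {o: z for _, o, z in trace[j + 1 : i]}
            out.append((obj, sum(between.values()), ts - trace[j][0]))
        last_index[obj] = i
    return out

toy = [(0, "a", 10), (1, "b", 20), (2, "c", 30), (3, "b", 20), (4, "a", 10)]
samples = stack_distances(toy)
```

For this toy trace, the second access to `b` has \(s = 30\) and \(t = 2\), and the second access to `a` has \(s = 50\) and \(t = 4\) because `b` is counted once even though it was requested twice in between.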
4.2 Trace Generation Algorithm¶
In this section, we present the synthetic trace generation algorithm used by SpaceGEN. The algorithm takes as input the GPD and the \(n\) pFDs derived from the production trace of each location. It generates \(n\) synthetic traces of user-specified lengths, with each trace corresponding to a specific location.
Initialization phase. We initialize an empty cache \(C_i\) corresponding to each location \(i \in \{1, \ldots, n\}\). To fill the caches, we iteratively create objects and assign them popularities and a size by sampling from the GPD. For each object \(o\), the sample gives the popularity vector \(p\) and a size \(z\), where \(p_i\) denotes the popularity of the object in the \(i^{th}\) location. If \(p_i > 0\), we add \(o\) to \(C_i\). We repeat until each cache \(C_i\) is at least as large as the maximum stack distance in the pFD of the \(i^{th}\) location.
Generation phase. We generate the \(n\) synthetic traces, each corresponding to a location \(i \in \{1, \ldots, n\}\), by the following procedure. First, for each location \(i\), we compute \(P_i(s|p, z)\) for all possible values of \(p\) and \(z\) from the \(i^{th}\) pFD. During each iteration of the trace generation phase, we examine the object at the top of the cache \(C_i\). We add a request to the object in the synthetic trace. Let the object's popularity be \(p\) and its size be \(z\). We then sample a stack distance \(s\) from \(P_i(s|p, z)\). If the object has already received \(p\) requests in the synthetic trace, it is removed from the cache. Otherwise, it is removed from the top of the cache and reinserted within \(C_i\) at a stack distance \(s\) from the top. Finally, we assign timestamps to the traces based on either the average data rate derived from the pFD or a more fine-grained data rate computed from the real traces. We describe the algorithm in detail in Algorithm 1 in Appendix A.1.
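The two phases can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 1: the GPD is stood in for by a hypothetical list of `(popularity_vector, size)` samples, the conditional distribution \(P_i(s|p, z)\) by a caller-supplied sampler, timestamps are omitted, and new objects are drawn only when a location's stack runs empty (an assumption of this sketch). It also assumes every location has at least one sample with positive popularity.

```python
import random

def generate_traces(gpd_samples, sample_stack_distance, max_stack_distance, lengths, seed=0):
    """Generate one synthetic trace of (object_id, size) requests per location.

    sample_stack_distance(i, p, z, rng) draws s from P_i(s | p, z).
    max_stack_distance[i] is the largest stack distance in location i's pFD.
    lengths[i] is the number of requests to generate for location i.
    """
    rng = random.Random(seed)
    n = len(lengths)
    stacks = [[] for _ in range(n)]  # C_i, with index 0 as the top
    next_id = 0

    def add_object():
        nonlocal next_id
        pops, size = rng.choice(gpd_samples)
        for i in range(n):
            if pops[i] > 0:
                stacks[i].append({"id": next_id, "p": pops[i], "z": size, "seen": 0})
        next_id += 1

    # Initialization: fill every C_i up to its maximum stack distance (in bytes).
    while any(sum(o["z"] for o in stacks[i]) < max_stack_distance[i] for i in range(n)):
        add_object()

    traces = [[] for _ in range(n)]
    for i in range(n):
        while len(traces[i]) < lengths[i]:
            while not stacks[i]:
                add_object()
            obj = stacks[i].pop(0)  # request the object at the top of C_i
            traces[i].append((obj["id"], obj["z"]))
            obj["seen"] += 1
            if obj["seen"] >= obj["p"]:
                continue  # popularity exhausted: the object leaves the cache
            s = sample_stack_distance(i, obj["p"], obj["z"], rng)
            depth, pos = 0, 0
            while pos < len(stacks[i]) and depth < s:
                depth += stacks[i][pos]["z"]
                pos += 1
            stacks[i].insert(pos, obj)  # reinsert roughly s bytes below the top
    return traces
```

An object shared by several locations keeps the same ID in every trace, which is what lets the generated traces preserve cross-location overlap.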
4.3 Properties of the Synthetic Trace¶
We now show that the synthetic trace generated by Algorithm 1 is similar to the production trace. The object spread distribution and the traffic spread distribution of the synthetic and production traces are shown in Fig. 6a and Fig. 6b, respectively. Here, the object spread is the number of locations an object is accessed from, and we observe that both traces have similar object spreads. The traffic spread is the object spread weighted by the size of the object and the number of requests made to it. We observe that the traffic spread of the production and synthetic traces is also similar.
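The two spread metrics can be computed as below. This is a sketch under one reading of the definitions: object spread is a histogram over the number of locations that request each object, and traffic spread is the same histogram weighted by each object's total bytes (size times number of requests). The function name `spreads` and the toy traces are hypothetical.

```python
from collections import defaultdict

def spreads(traces):
    """Return (object_spread, traffic_spread) keyed by number of locations.

    `traces` maps a location name to a list of (object_id, size_bytes) requests.
    """
    locations = defaultdict(set)
    volume = defaultdict(int)
    for loc, requests in traces.items():
        for obj, size in requests:
            locations[obj].add(loc)
            volume[obj] += size  # total bytes = size x number of requests

    total_objects = len(locations)
    total_bytes = sum(volume.values())
    object_spread = defaultdict(float)
    traffic_spread = defaultdict(float)
    for obj, locs in locations.items():
        object_spread[len(locs)] += 1 / total_objects
        traffic_spread[len(locs)] += volume[obj] / total_bytes
    return dict(object_spread), dict(traffic_spread)

toy = {
    "NY": [("a", 10), ("a", 10), ("b", 10)],
    "LON": [("a", 10), ("c", 20)],
}
obj_spread, traf_spread = spreads(toy)
```

In the toy input, one of three objects is requested from both locations, but it accounts for half of the bytes, so the traffic spread puts more weight on widely shared objects than the object spread does.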
In Fig. 6c (resp., Fig. 6d), we show that the request hit rates (resp., byte hit rate) of a cache simulation of a traditional CDN server using LRU yield similar results for the synthetic and the production trace. In particular, we observe an average difference of 0.4% in request hit rate (resp., 0.3% in byte hit rate) across all the cache sizes we simulated. Next, we simulate satellites in motion that are equipped with an LRU cache. In this case, we observe an average difference of 2% in request hit rate (resp., 1% in byte hit rate) between the synthetic and production traces across all cache sizes we simulate. We observe similar results when we simulate the StarCDN-Fetch architecture. Results are in Fig. 13c and Fig. 13d in Appendix A.1. Thus, we conclude that the synthetic traces can be used in lieu of the production traces for our evaluation.
Empirical Evaluation¶
We discuss our evaluation of StarCDN below.
5.1 Experimental Setup¶
CDN traces: To evaluate StarCDN over a sufficiently long period, we create 5-day-long synthetic traces using the SpaceGEN trace generator discussed in §4. SpaceGEN generates synthetic traces from traffic models derived from real-world Akamai production traces for the video traffic class described in §3.1. In total, the synthetic traces for the video traffic class contain 2 billion requests and 2.5 PB of content traffic. In addition, in §5.5, we extend the evaluation to the web and download traffic classes by using SpaceGEN with the relevant traffic models for these classes.
Simulation setup: We collected up-to-date TLE data from CelesTrak [12] for the orbital information of Starlink-53-Gen-1 satellites, and orbital shell information from Starlink.sx [39]. Starlink Gen1 satellites do not support ISLs, but the Starlink Gen2 constellation is still in the launch phase and not fully operational, so we cannot obtain its full orbital information. Instead, we use Starlink-53-Gen-1 as a representative of Starlink’s constellation topology. We infer the ISLs to both inter-orbital and intra-orbital neighbors using the shell information. If a neighbor is out of service or out of slot, we assume the link cannot be established. We simulate 1,170 satellites in 72 orbits inclined at 53 degrees.
We implemented a trace-driven simulator that combines Microsoft’s CosmicBeats [38, 48] simulator, which handles orbital motion and client link scheduling, with a multi-process cache replayer that mimics real-world asynchronous CDN accesses. For each CDN user node in CosmicBeats, we associate a synthetic trace from SpaceGEN that represents user requests for content from that geographic location. CosmicBeats determines the satellites in view at each location and splits all requests within a discrete time step across different satellites. The time step of CosmicBeats is set to 15 seconds, aligned with the reconfiguration interval of the Starlink global scheduler [51]. CosmicBeats outputs logs of object accesses for every satellite, which are loaded into our cache replayer. The cache replayer spawns a process for each satellite and uses TCP to mimic ISLs. Finally, the replayer orchestrates the cache replay and allows satellite processes to simulate the real-world request traffic. We open-sourced the simulation framework, including the configuration files for running CosmicBeats. Users can also generate their own configuration files for their datasets and new application logic.
Baselines: We compare StarCDN with two baselines. (a) Naive LRU places LRU caches on LEO satellites (as proposed in past work [7, 8]). We evaluate this baseline using the same simulation framework. (b) Static Cache is an idealized baseline: it assumes that there is no orbital motion, satellites stay static, and the user-satellite mapping is static. This baseline is unachievable in practice, yet we plot it as the north star for satellite-based CDNs.
5.2 ~ 5.5¶
tldr
Related Work¶
6.1 LEO Satellite Networks¶
LSNs have been studied extensively over the last decade with the emergence of services like Starlink, OneWeb, and Kuiper. These networks are inherently dynamic and bandwidth-constrained due to satellite speeds [65, 68], frequent satellite handovers, and weather-induced variations [33]. Consequently, many works [28, 61, 62, 66] have measured and studied performance optimization at different layers of the networking stack.
Our work falls within the broader domain of in-orbit computing. Past work, such as [7, 19, 64], first proposed the idea of in-orbit computing in satellite networks for applications such as CDNs and the analysis of satellite imagery. These designs leverage the emerging storage and compute capabilities of small LEO satellites. Recently, [8, 27] conducted extensive measurements to identify the shortcomings of terrestrial CDN providers in the context of LSNs (discussed in §1). Our work goes deeper into CDN architecture and design than past work in this space. We collect real-world traffic traces from multiple locations around the globe. Unlike past work, we propose a specific architecture for satellite-based CDNs, design new techniques, and evaluate these techniques using real-world traffic traces. In addition, we design a synthetic traffic generator that can be used to build new satellite-based caching systems.
6.2 CDNs and Caching Policies¶
There is extensive work on caching algorithms [3, 4, 6, 14, 31, 75] that make caching decisions based on factors such as popularity, object size, geography, time, and cost [6, 13, 46, 57] to improve cache performance [5, 49, 58]. CDN infrastructure has also been studied extensively. Leading companies like Cloudflare, Google, and Akamai publish their infrastructure designs in detail [17, 23, 24, 36]. Our work is orthogonal to these lines of work. We focus on the architecture of CDNs in LSNs and design new techniques to counter orbital motion, which is unique to LSNs. We choose LRU as our eviction algorithm due to its simplicity and its use in practical deployments. Besides eviction policies, we focus on key aspects of content fetching that are uniquely helpful in an LSN context. While our work on synthetic trace generation is inspired by previous work [44, 45], we are the first to model the geographic diversity across multiple locations in synthetic traces, as described in §4.
Limitations and Future Work¶
In this section, we discuss some limitations of our simulation framework and trace generator, as well as future work in trace generation and satellite-based CDNs.
Cross-location Temporal Traffic Correlation: One limitation of our trace generator and traffic model is that they do not fully capture some subtle temporal correlations that are possible between different locations. For instance, a hot news item on CNN may become popular a few hours later in California than in Boston, due to the time difference. However, as we showed earlier, the hit rate gap between production and synthetic traces for satellite-based caches is very small, providing evidence that these correlations have a limited effect on cache performance.
Discrete-time Simulator: Due to the inherent limitations of a discrete-time simulator, our current simulation framework does not model disconnections during object transfer. When we replay the cache trace at each satellite, each request yields only a cache hit or a cache miss. In practice, a Starlink satellite triggers a handover every few minutes, and each handover risks a transmission failure. Capturing this behavior requires a more complex simulator. We leave a finer-grained LEO satellite simulator, with link-layer simulation capability and a model of the impact of transmission failures, as future work.
Constellation-CDN Co-design: In designing StarCDN, we model the Starlink network as it is deployed, based on publicly available information. However, there can be an alternate design which jointly optimizes constellation design and caching strategies, e.g., by modifying ISL topologies dynamically or changing ground-satellite mappings in response to cache misses. We do not explore such designs because constellation design is governed by multiple factors, such as regulation, feasibility, launch costs, etc. Therefore, we opt for independent designs for constellation and caching.
New Applications: Orbital motion of LEO satellites creates challenges for multiple layers of the stack. With the emergence of direct-to-cell services, we envision that maintaining state for users in a geographic region, as the underlying containers of the data (the satellites) move, will emerge as a challenge for satellite-based cell services. We believe this is an important area of future work and will require even more stringent latency guarantees.
Security and DNS: TLS and related encryption technologies are crucial for modern network applications. As described in [42], edge servers host the cryptographic keys used for terminating TLS connections, supported by a key management infrastructure (KMI). Similarly, a fast, efficient DNS infrastructure to resolve a client to the first-contact satellite also plays a vital role in the actual deployment of the system. We believe these are important topics for future work.
Co-optimizing CDNs and LSNs: An intermediate design between today’s CDNs and StarCDN could be to place edge servers colocated with ground stations. While this design can be implemented today and improve QoE for users, it may not significantly reduce ground-satellite network utilization and user-perceived latency. However, jointly optimizing the traffic routing and content caching decisions of LSNs and terrestrial CDNs is worth exploring from both a performance and cost perspective.
Conclusion¶
We present StarCDN, a novel satellite-based CDN architecture that enables content to be cached efficiently in edge servers deployed in space. StarCDN implements mechanisms for placing and fetching content in satellite-based caches to reduce user-perceived latency and optimize the utilization of ground-to-satellite links. We also develop the first open-source trace generator, SpaceGEN, that can generate realistic synthetic traces of content access from a globally distributed set of users. We hope this tool and our dataset can promote future satellite-based CDN and caching research.