Related Work
Sky Computing. We are not the first to use the name “Sky Computing”: several papers, dating back to 2009, also used this term [62, 69, 70]. However, these papers focus on particular technical solutions, such as running middleware (e.g., Nimbus) on a cross-cloud Infrastructure-as-a-Service platform, and target specific workloads such as high-performance computing (HPC). This paper takes a broader view of Sky Computing, seeing it as a change in the overall ecosystem and considering how technical trends and market forces can play a critical role in its emergence.
The work most closely related to this paper is [81], but here we significantly extend that work by refining the vision, designing and building a broker, demonstrating its benefits in several applications, and reporting on early adoption.
Review: XaaS
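IaaS, PaaS, and FaaS are three distinct cloud computing service models:
- Infrastructure-as-a-Service (IaaS):
  - Provides basic IT infrastructure resources such as compute, storage, and networking
  - Users access these virtualized computing resources over the internet
  - Users are responsible for managing the operating system, storage, and deployed applications
  - Examples: Amazon EC2, Google Compute Engine
- Platform-as-a-Service (PaaS):
  - Provides a complete environment for developing and deploying applications
  - Includes the infrastructure of IaaS, plus middleware, development tools, database management, and similar services
  - Users can develop, run, and manage applications on the platform without managing the underlying infrastructure
  - Examples: Google App Engine, Heroku
- Function-as-a-Service (FaaS):
  - A form of serverless computing
  - Developers only write and upload function code, without managing any servers
  - Functions run automatically when triggered by events and are billed by actual usage
  - Well suited to event-driven applications
  - Examples: AWS Lambda, Google Cloud Functions
The main differences lie in the level of abstraction and in how much the user has to manage:
- IaaS provides the most basic resources, and the user manages the most
- PaaS provides a development platform, simplifying application development and deployment
- FaaS simplifies further: the user only needs to focus on implementing individual functions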
Cross-cloud compute, storage, and egress. Supercloud [65] is a virtual cloud that can span multiple zones and clouds, using nested virtualization and live VM migration to move stateful workloads across locations. Our proposal shares the goal of easing workload migration, but supports migrating higher-level jobs (not VMs), considers a broader set of cloud services in addition to IaaS, and focuses on batch jobs by optimizing for price, performance, and availability.
There have been several proposals for cross-cloud storage solutions. CosTLO [91] and SPANStore [90] use request redundancy and replication to minimize storage access latencies. Perhaps the most comprehensive is Gaia-X, a European effort to create a federated open data infrastructure that enables data sharing with strong governance properties and respecting data and cloud sovereignty [28]. These efforts are largely orthogonal to our focus on computational tasks.
Several industry efforts have been started to reduce cross-cloud data egress fees. The Bandwidth Alliance [19] is one such effort, consisting of several cloud providers who agree to reduce or even eliminate egress fees from their clouds to Cloudflare or other members. Closely related is Cloudflare R2 [24], an object store that promises to charge zero egress fees. Naturally, Sky Computing benefits from these efforts to combat data gravity, and the intercloud broker can be extended to support zero-egress storage systems.
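To make the effect of egress fees on placement concrete, the following is a minimal sketch, not the broker's actual logic. The `Offer` type, the cloud names, and all prices are hypothetical values chosen for illustration. It shows how a per-GB egress fee can pin a job to the cloud that holds its data (data gravity), and how a zero-egress store removes that constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical price sheet for running one job on one cloud."""
    cloud: str
    compute_usd_per_hour: float


def total_cost(offer, hours, data_gb, data_cloud, egress_usd_per_gb):
    """Compute cost plus the egress fee paid if the data must leave its home cloud."""
    moved_gb = 0.0 if offer.cloud == data_cloud else data_gb
    return offer.compute_usd_per_hour * hours + moved_gb * egress_usd_per_gb


def cheapest(offers, hours, data_gb, data_cloud, egress_usd_per_gb):
    """Return the offer with the lowest total cost for this job."""
    return min(
        offers,
        key=lambda o: total_cost(o, hours, data_gb, data_cloud, egress_usd_per_gb),
    )


# Illustrative numbers only: a 10-hour job over 1000 GB stored on cloud_a.
offers = [Offer("cloud_a", 3.0), Offer("cloud_b", 2.0)]
with_fee = cheapest(offers, 10, 1000, "cloud_a", egress_usd_per_gb=0.09)
zero_fee = cheapest(offers, 10, 1000, "cloud_a", egress_usd_per_gb=0.0)
print(with_fee.cloud, zero_fee.cloud)
```

Under these assumed prices, the egress fee outweighs the cheaper compute on `cloud_b`, so the job stays on `cloud_a`. With zero egress, the cheaper compute wins.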
Middleware. Middleware solutions (e.g., CORBA [25], Microsoft BizTalk [37], and IBM WebSphere [34]) bear some resemblance to our work. While these solutions allow systems from different vendors to communicate and interoperate, our proposal allows an application to utilize cloud services offered by different cloud providers.
There are several differences between these efforts and the intercloud broker. First, we consider satisfying requirements such as minimizing costs, which have not been a concern of these systems. Second, the intercloud broker focuses on placing the components of the same application rather than on how systems from different vendors interoperate. Finally, we are operating in a cloud setting rather than a traditional distributed system setting.
Differences aside, middleware solutions that allow cloud services to interoperate (e.g., connect an AWS S3 bucket with GCP Dataproc) could be considered as being part of the compatibility set, which the intercloud broker can leverage.
Integration Platform-as-a-Service (iPaaS). Like the middleware systems discussed above, iPaaS solutions [40, 47] also integrate distinct systems but are often run as managed services on the cloud. iPaaS solutions provide adaptors to connect APIs from different services and systems (e.g., APIs for Snowflake, Jira, or Stripe). Developers can build workflows on top (e.g., on receiving a new case in Salesforce, call Jira’s API to open a ticket) and deploy them through the iPaaS.
While iPaaS can run integration workflows on the cloud, our proposal places and runs compute-intensive jobs on the most suitable cloud based on price, performance, and availability. Similar to middleware, iPaaS is complementary as we can leverage these adaptors to expand the compatibility set.
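As a rough illustration of placement by price, performance, and availability, the sketch below filters candidate clouds by availability and a deadline, then picks the cheapest remaining one. It is an assumption-laden toy, not the paper's broker: the `Candidate` fields, the `place` function, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of running one job on one cloud."""
    cloud: str
    usd_per_hour: float   # price
    est_hours: float      # performance, as estimated runtime
    available: bool       # whether the resources can be obtained right now


def place(candidates: List[Candidate], deadline_hours: float) -> Optional[Candidate]:
    """Pick the cheapest available candidate that finishes within the deadline."""
    feasible = [c for c in candidates if c.available and c.est_hours <= deadline_hours]
    if not feasible:
        return None  # nothing satisfies the constraints
    return min(feasible, key=lambda c: c.usd_per_hour * c.est_hours)


# Illustrative numbers only.
candidates = [
    Candidate("cloud_a", usd_per_hour=4.0, est_hours=5.0, available=True),
    Candidate("cloud_b", usd_per_hour=2.0, est_hours=12.0, available=True),
    Candidate("cloud_c", usd_per_hour=1.0, est_hours=6.0, available=False),
]
choice = place(candidates, deadline_hours=10.0)
print(choice.cloud if choice else "no feasible placement")
```

Here `cloud_c` would be the cheapest but is unavailable, and `cloud_b` misses the deadline, so the sketch falls back to `cloud_a`.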
Optimization for geo-distributed analytics. A line of work has optimized the performance of geo-distributed analytics [64,77,86]. This setting is similar in spirit to ours: it considers running a MapReduce-style job (an analytics query) across many sites, while we consider running a DAG of coarse-grained computations potentially across several clouds.
There are three main differences. First, these techniques are system-specific optimizations, and we in general do not assume as much knowledge about the application. Second, these techniques mostly assume different sites to differ only in their WAN bandwidths and otherwise have identical hardware, while we exploit the inherent differences in hardware, software, pricing, and resource availability of several clouds or regions/zones within a cloud. Third, these solutions optimize for faster completion times, while we also consider minimizing costs and improving resource availability.
That said, we note that the intercloud broker could potentially leverage system-specific optimizations if it is told that the application is of a certain type (e.g., MapReduce).
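The contrast with geo-distributed analytics can be illustrated with a toy cost-minimizing placement of a coarse-grained DAG across clouds. This is a brute-force sketch under stated assumptions, not the optimizer described in the paper: the `plan` function, the task names, the per-cloud run costs, and the flat egress price are all hypothetical. Each task has a different cost on each cloud (reflecting differing hardware and pricing), a cloud missing from a task's cost table is treated as unavailable for that task, and any edge that crosses clouds pays egress.

```python
from itertools import product


def plan(tasks, edges, run_cost, egress_usd_per_gb):
    """Exhaustively search task-to-cloud assignments for the lowest total cost.

    tasks:    list of task names
    edges:    list of (src_task, dst_task, data_gb) tuples
    run_cost: run_cost[task][cloud] -> cost in USD; absent cloud = unavailable
    """
    best_cost, best_assign = float("inf"), None
    options = [sorted(run_cost[t]) for t in tasks]  # candidate clouds per task
    for choice in product(*options):
        assign = dict(zip(tasks, choice))
        cost = sum(run_cost[t][assign[t]] for t in tasks)
        # Data moving between tasks on different clouds incurs egress fees.
        cost += sum(
            gb * egress_usd_per_gb
            for src, dst, gb in edges
            if assign[src] != assign[dst]
        )
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Illustrative numbers only.
tasks = ["preprocess", "train", "serve"]
edges = [("preprocess", "train", 100), ("train", "serve", 1)]
run_cost = {
    "preprocess": {"cloud_a": 5, "cloud_b": 8},
    "train": {"cloud_a": 60, "cloud_b": 40},
    "serve": {"cloud_a": 10, "cloud_b": 12},
}
cost, assignment = plan(tasks, edges, run_cost, egress_usd_per_gb=0.1)
print(assignment, round(cost, 2))
```

With these assumed inputs, the large `preprocess`-to-`train` transfer keeps those two tasks together on the cloud with cheaper training, while the small model artifact is cheap enough to ship to the other cloud for serving. Exhaustive search is exponential in the number of tasks, so it only serves to make the objective explicit.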