Supporting Hybrid Virtualization Orchestration for Edge Computing¶
Danger
This paper assumes a fairly deep research background in distributed systems and operating systems, which I do not yet have. I therefore read it with AI assistance, purely to broaden my horizons.
Microservice architectures allow developers to decompose their applications into independently deployable functional blocks, each with its own requirements. To support a wide range of constraints, service virtualization can be customized across microservices but is typically homogeneous within a cluster. As there is no clear one-size-fits-all approach, we can improve resource utilization and performance by using virtualization as a new dimension in orchestration, especially in edge computing environments. For instance, unikernels represent a lightweight virtualization technology that offers a performant alternative to traditional containers. While different studies analyze and compare these virtualization technologies, (a) the performance results might vary when including the overhead of the orchestration platform, and (b) it is not trivial to select the perfect virtualization technology for an entire cluster. In this paper, we explore the benefits of hybrid container-unikernel deployments by extending an orchestration framework for edge computing to allow for seamless mixing and matching of both technologies. Our evaluation shows how hybrid deployments can lead to up to 44% cluster-wide CPU reduction, while there are scenarios where containers remain preferable.
In the landscape of modern computing, microservice architectures have increasingly become the standard approach for designing highly scalable and available applications. Microservices are typically deployed in containers [13, 28, 54]. With the OCI standardization [7], containers are nowadays decoupled from the underlying runtime, allowing for seamless portability across different environments and reusability of common functions, like nginx web servers, Redis, etc. This calls for a new dimension in orchestration, where the virtualization technology can be chosen based on the service requirements. Unfortunately, the state of the art still couples virtualization with infrastructure provisioning.

Edge computing significantly alters this assumption, given its inherently heterogeneous infrastructure with variations in (CPU/memory) hardware, OS support, etc. [39, 45]. Previous studies highlight the operational overheads caused by cloud-native assumptions at the edge [15, 41] and the benefits of lightweight virtualization in conjunction with containers [21, 29]. Unikernels are a good candidate for the edge because of their small footprint, faster instantiation, improved performance, and flexibility [35, 53]. However, despite advancements in unikernel toolchains, such as Unikraft [31], which allow porting existing Linux applications, the ecosystem does not support a wide range of applications and driver functionality [25]. Moreover, as shown in the remainder of this paper, unikernels are not the best choice for all services at all times.

We envision a future where, given a standardized packaging format like OCI, the runtime can be chosen dynamically based on application requirements, with the orchestration platform effectively becoming a middleware for multi-virtualization setups. Take, for example, a stream-processing video analytics pipeline, which can include several GPU-intensive services that are better operated as containers with a full-fledged OS providing complex driver support [55]. Other services within the pipeline, such as load balancers, may be more performant as unikernels using a hypervisor as a resource multiplexer [29, 33, 36]. Hybrid virtualization also enables a gradual transition of complex containerized applications to unikernels as the build toolchain evolves to support more system calls and libraries [1, 6, 23]. While several papers have empirically evaluated and compared the performance of different isolation technologies [29, 33, 49], they do not consider (i) the overheads of the compatibility layers that allow these virtualizations to operate on common hardware and (ii) the orchestration overhead for managing deployments with different virtualizations at runtime.
This paper explores the feasibility of container-unikernel hybrid orchestration. Our contributions are as follows.
(1) We extend Oakestra [15], a lightweight orchestration framework for edge computing. We implement a compatibility layer that allows Unikraft [31] unikernels to behave as containers from an orchestration perspective. We extend the control plane to aggregate clusters’ virtualization information and the scheduling workflows to consider virtualization requirements. We introduce service hot-swap to change the service’s virtualization technology at runtime.
(2) We evaluate the suitability of hybrid virtualization orchestration via real-world application pipelines. Specifically, we dissect the overhead of the compatibility layer performing cross-deployment of containers via runc, unikernels, and gVisor secure containers. Our results showcase the potential for hybrid virtualization, achieving up to ≈ 44% CPU usage reduction in our cluster.
1. Problem background: the limits of a single virtualization technology
- Requirements vs. reality: in a microservice architecture, each service module has different requirements, yet a cluster typically runs a single (homogeneous) virtualization technology, so resource utilization and performance cannot be optimal
- The edge computing challenge: on hardware-constrained and heterogeneous edge infrastructure, this "one-size-fits-all" approach is especially problematic
- Gaps in prior work: earlier performance comparisons of virtualization technologies often ignore the overhead of the orchestration platform itself, so their conclusions are detached from real deployment scenarios
2. Core idea: virtualization as a new orchestration dimension
- Breaking convention: the paper argues that the virtualization technology (container or unikernel) should not be treated as a static infrastructure constraint but elevated into a dynamic, optimizable orchestration dimension
- Hybrid deployment: the core is a deployment model that seamlessly mixes and matches containers and unikernels, choosing the most suitable "runtime shell" for each microservice according to its specific requirements
- On-demand selection: for example, GPU-intensive services that need complex driver support run as containers, while lightweight services such as load balancers run as higher-performance unikernels
3. Contributions: validating the idea in practice
- Implementation: the authors extend Oakestra, an edge-computing orchestration framework, to support this hybrid deployment model
- Key features:
  - A compatibility layer for unikernels, so they can be managed uniformly like containers
  - Extended scheduling and control-plane capabilities that recognize and handle different virtualization requirements
  - A service hot-swap feature that changes a service's virtualization type at runtime
4. Main result: hybrid deployment saves significant resources
- Quantitative result: evaluation on real-world application scenarios demonstrates the advantage of hybrid deployment, achieving up to 44% cluster-wide CPU usage reduction
- Conclusion: there is no "universal" virtualization technology; containers and unikernels each have their best-fit scenarios, so orchestration that can flexibly select and mix them is essential
Background and Related Work¶
Due to their small memory footprint (approx. a few MB) and reduced system call dependencies [32], unikernels offer faster boot times (≈ 10×) compared to containers [26, 31] and are easy to scale and migrate [37, 40, 51]. The shared application and kernel address space allows all code to run in the same CPU privilege domain, which improves performance by avoiding application-kernel context switches [52]. Early unikernel frameworks, such as MirageOS [4], required developers to write applications from scratch. In recent years, however, unikernels have evolved into a capable alternative to containers. Unikraft [31] provides a streamlined approach to building and porting existing Linux applications; its toolchain provides a high degree of POSIX compatibility (≈ 160+ of the 224 syscalls required by a popular Linux installation [34]). EVE-OS [1], a Linux Foundation project, is a universal, vendor-agnostic OS for edge computing hardware (including embedded devices) that adds native support for both containerized and unikernel workloads.

Container runtimes (e.g., Firecracker [2], gVisor [3]) enhance container security and isolation by executing them as para-virtualized microVMs over qemu/kvm, similar to unikernels. The OCI standards help support both container and unikernel runtimes simultaneously [7, 11]. Experimental runtimes like urunc [42] and runu [46] represent the first steps towards kernel-level compatibility unifying containers and unikernels [47]. Unfortunately, it is not clear what the overhead of such compatibility layers is in real-world deployments, or how they affect the orchestration of services, especially on constrained hardware at the edge.

Edge infrastructure is generally less powerful and more heterogeneous than cloud datacenters, often comprising smaller devices with varying CPU architectures and capabilities, e.g., Intel NUCs, Jetson Xaviers, Raspberry Pis, etc. As edge computing is often seen as an extension of the cloud, the majority of orchestration solutions adapt the popular cloud-native Kubernetes (K8s) framework [19]. Solutions like KubeEdge [20], KubeFed [24], and MicroK8s [17] modify Kubernetes by simplifying control-plane operations and removing non-essential components to make it applicable to the edge. Oakestra [12, 15], on the other hand, rearchitects the orchestration control plane from the ground up to address the hardware heterogeneity and geographical diversity of edge infrastructures with minimal overhead. In Oakestra, computational devices (leaf nodes) are grouped into (logical) clusters managed by local cluster orchestrators (see fig. 1). Each worker node includes a NodeEngine component for managing service deployment and operation and a NetManager for network communication. Each cluster orchestrator is responsible for fine-grained resource and service management within its cluster. The root orchestrator acts as an "orchestrator of clusters" and the developers' point of contact for deploying applications.
Unfortunately, all state-of-the-art orchestration frameworks treat virtualization as a cluster constraint. We finally have an opportunity to exploit virtualization as a dimension to improve resource utilization and application performance. In [38], the authors examine approaches for orchestrating sandboxed containers as microVMs over qemu/kvm via extensions to K8s. FADES [21] leverages MirageOS [4] unikernels to deploy application microservices in Xen-bootable images suitable for edge devices. However, arguably (i) not all applications perform better as unikernels, and (ii) the virtualization technology must be dictated by application requirements and not by infrastructure availability alone.
1. Unikernel technology: a lightweight, high-performance alternative to containers
- Core advantages over traditional containers:
  - Tiny memory footprint (only a few MB)
  - Much faster boot (≈10×)
  - Higher performance: the shared application/kernel address space avoids context-switch overhead
  - Easy to scale and migrate
- Evolution: unikernels have progressed from early frameworks that required writing applications from scratch (e.g., MirageOS) to modern toolchains (e.g., Unikraft) that can port existing Linux applications with a high degree of syscall compatibility
2. Edge orchestration: challenges and existing solutions
- Core challenge: unlike cloud datacenters, edge infrastructure has weaker and more heterogeneous hardware (varying CPU architectures and capabilities)
- Two mainstream approaches:
  - Adapted cloud-native frameworks: most solutions (e.g., KubeEdge, MicroK8s) simplify and modify Kubernetes (K8s) for the edge
  - Edge-native frameworks: others (e.g., Oakestra in this paper) are designed from scratch to natively handle edge heterogeneity and geographical diversity with minimal overhead

3. Ecosystem convergence: the blurring boundary between containers and unikernels
- Secure containers: runtimes like Firecracker and gVisor strengthen container security and isolation by running containers inside lightweight, unikernel-like microVMs
- Standardization and compatibility: open standards such as OCI are driving the coexistence and unification of container and unikernel runtimes; experimental runtimes (e.g., urunc, runu) even explore kernel-level compatibility, though this raises performance-overhead concerns, especially on constrained edge hardware
4. Core thesis: virtualization as a dynamic orchestration dimension
- Current limitation: all state-of-the-art orchestration frameworks treat the virtualization technology (e.g., container vs. unikernel) as a static, cluster-level constraint
- Motivation: the authors argue this is wasteful and make the core argument:
  - Not every application runs best as a unikernel
  - The choice of virtualization should be driven by each application's requirements, not by infrastructure availability alone
  - Hence the need for a framework that can orchestrate mixed virtualization technologies, treating virtualization itself as a dynamic optimization dimension for resource utilization and performance
Orchestration Support¶
In our exploration, we extend the Oakestra [12, 15] orchestration framework to support hybrid container-unikernel deployments and measure the overheads and benefits of such hybrid virtualization setups. We choose Oakestra due to its lightweight implementation and extensible design, which allows us to integrate unikernel orchestration metrics alongside containers with minimal changes and reduced overhead. We use Unikraft [31] as the unikernel runtime for our experiments, as it provides a wide range of unikernel configurations and supports a variety of applications. The proposed architecture provides an extensible interface that we use to evaluate unikernel virtualization as an additional orchestration dimension, but that also allows for supporting further runtimes and optimal virtualization selection.
3.1 Hybrid Service Deployment¶
To enable hybrid virtualization support, we must ensure that the worker nodes' hardware can support both container and unikernel execution. Oakestra supports the integration of new runtimes via (a) the runtime dispatcher interface or (b) integration into containerd thanks to OCI runtime-spec compatibility. Initial experiments with runu [46], an OCI-compatible runtime for containerd, showed inconsistent behavior. This runtime is currently under development [47], so misalignments with the latest Unikraft versions are expected. Moreover, managing runu as an OCI runtime involves the additional overhead of the containerd middleware managing the hypervisor, which can be avoided by interacting with qemu directly. To overcome this, we design and implement a Unikernel Runtime Abstractor component in Oakestra's NodeEngine, which, instead of controlling unikernels via containerd, adds abstractions for directly managing and monitoring Unikraft services within the orchestration framework (see fig. 2). This approach, while Oakestra-specific, does not replace OCI runtimes such as runu/urunc: these runtimes can easily be integrated as containerd runtimes once they mature, though at the cost of additional overhead. Unikernels (and the abstractor) are enabled only on machines supporting qemu and kvm targets (e.g., nodes 1 and 2 in fig. 1), while containers are enabled only on nodes supporting containerd. The Unikernel Runtime Abstractor (i) manages the lifecycle of unikernels, interacting directly with qemu for virtualization, (ii) downloads and unpacks kernel images, and (iii) binds a routine that reads the qemu QMP socket interface and reports the internal service status (running/paused/failed) to the cluster orchestrator.
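To make the abstractor's monitoring path concrete, below is a minimal sketch (not the actual Oakestra implementation) of how a NodeEngine routine could query a Unikraft guest's run state over qemu's QMP unix socket. The QMP handshake (greeting, qmp_capabilities, query-status) is standard qemu behavior; the socket path and the wiring into the cluster-status update are assumptions.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net"
	"time"
)

// statusReply mirrors the relevant part of QMP's query-status response,
// e.g. {"return": {"running": true, "status": "running"}}.
type statusReply struct {
	Return struct {
		Running bool   `json:"running"`
		Status  string `json:"status"`
	} `json:"return"`
}

// pollStatus connects to a qemu QMP unix socket, negotiates capabilities,
// and returns the guest run state ("running", "paused", ...).
func pollStatus(socketPath string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	r := bufio.NewReader(conn)

	// qemu greets with a {"QMP": ...} banner; read and discard it.
	if _, err := r.ReadBytes('\n'); err != nil {
		return "", err
	}
	// Capability negotiation is mandatory before any other command.
	if _, err := conn.Write([]byte(`{"execute":"qmp_capabilities"}` + "\n")); err != nil {
		return "", err
	}
	if _, err := r.ReadBytes('\n'); err != nil { // {"return": {}}
		return "", err
	}
	// Ask for the VM run state.
	if _, err := conn.Write([]byte(`{"execute":"query-status"}` + "\n")); err != nil {
		return "", err
	}
	line, err := r.ReadBytes('\n')
	if err != nil {
		return "", err
	}
	var reply statusReply
	if err := json.Unmarshal(line, &reply); err != nil {
		return "", err
	}
	return reply.Return.Status, nil
}

func main() {
	// Hypothetical per-service socket path chosen at qemu launch
	// (e.g. via -qmp unix:/run/oakestra/svc1.qmp,server,nowait).
	status, err := pollStatus("/run/oakestra/svc1.qmp")
	if err != nil {
		fmt.Println("unikernel status unknown:", err)
		return
	}
	// The real abstractor would report this to the cluster orchestrator.
	fmt.Println("unikernel status:", status)
}
```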
3.2 Resource Management and Scheduling¶
As shown in fig. 1, a typical edge deployment with Oakestra may include worker nodes with heterogeneous hardware architectures and runtime target support. While some nodes execute both container and unikernel deployments (e.g., node 1), others support only one of the two (e.g., node 2 or 3). At worker startup, the NodeEngine checks the node's CPU architecture and virtualization support. It then advertises its runtimes to the associated cluster orchestrator as a virt tuple (i.e., whether the node supports containers, unikernels, or both), which the scheduler uses to place services on suitable nodes.
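As an illustration of this advertisement step, the following sketch probes a worker's capabilities roughly as described. It is not Oakestra's actual detection code: the probed paths (/dev/kvm for kvm support, the default containerd socket) and the VirtTuple layout are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// VirtTuple is a sketch of the capability advertisement a worker could
// send to its cluster orchestrator at startup.
type VirtTuple struct {
	Arch      string `json:"arch"`      // e.g. amd64, arm64
	Container bool   `json:"container"` // containerd available
	Unikernel bool   `json:"unikernel"` // qemu/kvm targets available
}

func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func probeVirt() VirtTuple {
	return VirtTuple{
		Arch: runtime.GOARCH,
		// containerd reachable via its common default unix socket.
		Container: exists("/run/containerd/containerd.sock"),
		// hardware virtualization exposed to userspace via /dev/kvm.
		Unikernel: exists("/dev/kvm"),
	}
}

func main() {
	vt := probeVirt()
	// In the real system this would be sent to the cluster orchestrator;
	// here we just print it.
	fmt.Printf("advertising virt tuple: %+v\n", vt)
}
```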
3.3 Virtualization Hot-Swapping¶
We extend the control plane with a runtime-switch functionality for stateless applications that support multiple virtualization technologies. Suppose service1.instance1 is deployed as a container, but a unikernel implementation is available. By triggering the hot-swap, the control plane deploys the unikernel service1.instance2 alongside the first instance. The network component gradually balances and shifts traffic from the container to the unikernel instance. Once the traffic migration is complete, the first containerized instance is removed.
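A minimal sketch of this workflow is shown below. deployInstance, setTrafficSplit, and removeInstance are hypothetical stand-ins for the control-plane and NetManager operations described above, and the weight schedule is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical control-plane hooks; in Oakestra these would map onto the
// cluster orchestrator APIs and the NetManager's load balancer.
func deployInstance(service, virt string) string {
	fmt.Printf("deploying %s as %s\n", service, virt)
	return service + ".instance2"
}
func setTrafficSplit(oldInst, newInst string, newWeight int) {
	fmt.Printf("traffic: %s=%d%% %s=%d%%\n", oldInst, 100-newWeight, newInst, newWeight)
}
func removeInstance(inst string) { fmt.Println("removing", inst) }

// hotSwap replaces a running stateless instance with one using another
// virtualization technology, shifting traffic gradually.
func hotSwap(service, oldInst, targetVirt string) {
	newInst := deployInstance(service, targetVirt)
	// Gradually shift load from the old instance to the new one.
	for weight := 25; weight <= 100; weight += 25 {
		setTrafficSplit(oldInst, newInst, weight)
		time.Sleep(500 * time.Millisecond) // let in-flight requests drain
	}
	// Only after all traffic has moved is the old instance removed.
	removeInstance(oldInst)
}

func main() {
	hotSwap("service1", "service1.instance1", "unikernel")
}
```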
3.4 Inter-Service Networking¶
To achieve agile hybrid virtualization, it is important that orchestrated services can interact with both unikernel- and container-based services without additional overhead. Oakestra utilizes a semantic overlay network to enable multi-cluster container networking and load balancing. Each service is allocated IP addresses mapped to different load-balancing strategies across the available instances. The NetManager interprets packets to/from a semantic address and re-assigns them to the correct instance IP address, forming a tunnel between communicating services. To achieve similarly seamless networking between containers and unikernels (and across unikernels), we extend the NetManager to provision (i) a network namespace for unikernels and (ii) a local namespace IP address that can be used to translate network packets irrespective of the virtualization target. Unlike containers, unikernels do not share the host kernel but require a dedicated network stack. We overcome this by connecting a macvtap interface in bridge mode directly to the veth of the service's network namespace.
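For intuition, the plumbing described above could be set up roughly as in the following sketch, which drives the standard ip(8) tool. The interface names are hypothetical, the veth is assumed to already exist (created by the NetManager), and real code would likely use netlink instead of shelling out.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// run executes one "ip" command, failing loudly; a sketch, not production code.
func run(args ...string) {
	cmd := exec.Command("ip", args...)
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		panic(fmt.Sprintf("ip %s: %v", strings.Join(args, " "), err))
	}
}

func main() {
	// Assumption: "veth-svc1" is the veth endpoint of the service's
	// network namespace, already provisioned by the NetManager.
	const veth, mvtap = "veth-svc1", "mvtap-svc1"

	// Attach a macvtap in bridge mode directly on top of the veth, giving
	// the unikernel's dedicated network stack a leg in the service namespace.
	run("link", "add", "link", veth, "name", mvtap, "type", "macvtap", "mode", "bridge")
	run("link", "set", mvtap, "up")

	// qemu consumes the tap through /dev/tapN, where N is the interface index.
	idx, err := os.ReadFile("/sys/class/net/" + mvtap + "/ifindex")
	if err != nil {
		panic(err)
	}
	tapDev := "/dev/tap" + strings.TrimSpace(string(idx))
	fmt.Println("tap device to hand to qemu:", tapDev)
}
```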
This was dense; a four-point AI summary:

To enable hybrid orchestration of containers and unikernels, the authors extend the Oakestra framework with four core technical additions:
- Deployment: a Unikernel Runtime Abstractor that bypasses the standard but still-unstable OCI route and instead manages unikernel lifecycles efficiently and reliably through qemu directly
- Scheduling awareness: the resource management and scheduling system is extended so that the scheduler recognizes which virtualization types each worker node supports (containers, unikernels, or both) and places services on suitable nodes
- Networking: the inter-service networking layer is extended, using macvtap (among other techniques) to work around the unikernel's dedicated network stack, so that containers and unikernels can seamlessly communicate, load-balance, and discover each other
- Dynamic switching: a virtualization hot-swap feature lets stateless services switch from container to unikernel form (and vice versa) at runtime without downtime
Application Performance¶
tldr
Discussion and Future Work¶
While advancements in toolchains like Kraftkit [48] are enhancing the portability of applications to the unikernel domain, not all libraries, drivers, and consequently applications are currently supported. Moreover, as we show in §4, determining the most suitable virtualization for a given application is not straightforward. Generally, we observed that within an orchestrated infrastructure, containers are a more efficient choice for network-dominant and latency-critical applications, while unikernels are better suited for CPU-intensive applications and scalability, achieving up to 44% CPU usage reduction in our cluster. It is crucial to recognize that the real-world performance of unikernels might not always align with conceptual expectations [22], and that aspects such as flexibility, compatibility, and security may be more relevant factors when choosing the optimal virtualization [44]. Our findings motivate joint, transparent orchestration of unikernels and containers, as well as the need for a platform that abstracts its complexity. Summing up, there is no one-size-fits-all approach to service virtualization.
In future extensions, we envision a closer integration between Oakestra, Unikraft, and qemu to reduce the virtual network bottlenecks experienced with the unikernels. We also plan to investigate intelligent scheduling solutions for performance forecasting and a telemetry-based feedback loop for performance monitoring across virtualizations from the application to the runtime layer. Moreover, we plan to integrate cross-virtualization checkpointing to enable the data migration of stateful applications with minimal loss and downtime across different virtualization technologies.
Summary:
- Toolchain advances like Kraftkit [48] improve application portability to unikernels, but not all libraries, drivers, and applications are supported yet, and picking the right virtualization for a given application is not easy (§4)
- Containers are the more efficient choice for network-dominant and latency-critical applications; unikernels suit CPU-intensive and scalability-focused workloads, reaching up to 44% CPU usage reduction in the cluster
- Real-world unikernel performance may not match conceptual expectations [22], and flexibility, compatibility, and security can matter more when choosing a virtualization [44]; there is no one-size-fits-all approach, which motivates joint, transparent orchestration of unikernels and containers behind a platform that abstracts the complexity
- Future work: tighter Oakestra/Unikraft/qemu integration to reduce the virtual network bottlenecks seen with unikernels; intelligent scheduling for performance forecasting plus a telemetry-based feedback loop for cross-virtualization performance monitoring from the application down to the runtime layer; cross-virtualization checkpointing to migrate stateful applications with minimal data loss and downtime