SkyPilot Serve Model

This page focuses on the high-level architecture and design of SkyPilot Serve.


Why Do We Need SkyServe?

SkyServe takes an existing serving framework and deploys it across one or more regions or clouds.

Idea

Each service gets an endpoint that automatically distributes requests to its replicas.
Replicas of the same service can run in different regions and clouds — reducing cloud costs and increasing availability.
SkyServe handles the load balancing, recovery, and autoscaling of the replicas.
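
To make the "one endpoint, many replicas" idea concrete, here is a minimal sketch of an endpoint that spreads requests over the ready replicas of a service, wherever they run. The `Replica` and `RoundRobinEndpoint` names and the round-robin policy are assumptions made for illustration only; they are not SkyServe's actual classes, and its real routing policy may differ.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Replica:
    """Hypothetical record for one replica of a service."""
    url: str
    cloud: str
    ready: bool = True


class RoundRobinEndpoint:
    """Illustrative stand-in for a service endpoint (not SkyServe code)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._counter = count()  # monotonically increasing request index

    def pick(self):
        """Return the replica that should receive the next request."""
        # Only replicas that are currently ready may receive traffic.
        ready = [r for r in self.replicas if r.ready]
        if not ready:
            raise RuntimeError("no ready replicas")
        return ready[next(self._counter) % len(ready)]


endpoint = RoundRobinEndpoint([
    Replica("http://10.0.0.1:8080", cloud="aws"),
    Replica("http://10.0.0.2:8080", cloud="gcp", ready=False),
    Replica("http://10.0.0.3:8080", cloud="azure"),
])
for _ in range(4):
    print(endpoint.pick().cloud)
```

In this sketch the replica on `gcp` is skipped while it is not ready, so traffic alternates between the other two clouds. Callers only ever see the single endpoint.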

SkyServe has a centralized controller VM (who provides it? See below) that manages the deployment of your service.
Each service has a process group that manages its replicas and routes traffic to them.
Controller: monitors the status of the replicas and launches a new replica if one of them fails. It also autoscales the number of replicas if an autoscaling config is set.
Load Balancer: routes traffic to all ready replicas. It is a lightweight HTTP server that listens on the service endpoint and distributes each request to one of the replicas.
All process groups share a single controller VM.
The controller VM is launched in the cloud with the best price/performance ratio. You can also customize this.
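
The controller's two duties described above (recovering failed replicas and autoscaling) can be sketched as one reconcile step. This is a simplified model, not SkyServe's implementation: the `AutoscalePolicy` fields, the `"READY"`/`"FAILED"` status strings, and the QPS-based scaling rule are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscalePolicy:
    """Hypothetical autoscaling settings for one service."""
    min_replicas: int
    max_replicas: int
    target_qps_per_replica: float


def desired_replicas(policy, observed_qps):
    """Replica count needed for the observed load, clamped to the bounds."""
    wanted = math.ceil(observed_qps / policy.target_qps_per_replica)
    return max(policy.min_replicas, min(policy.max_replicas, wanted))


def reconcile(statuses, policy, observed_qps):
    """Compute one controller step.

    `statuses` maps a replica id to "READY" or "FAILED" (assumed states).
    Returns which replicas to terminate, how many to launch, and how many
    healthy replicas to scale down.
    """
    failed = sorted(rid for rid, s in statuses.items() if s == "FAILED")
    healthy = len(statuses) - len(failed)
    target = desired_replicas(policy, observed_qps)
    return {
        "terminate": failed,                      # clean up failed replicas
        "launch": max(0, target - healthy),       # recovery and scale-up
        "scale_down": max(0, healthy - target),   # shrink when load drops
    }


policy = AutoscalePolicy(min_replicas=1, max_replicas=4,
                         target_qps_per_replica=5.0)
plan = reconcile({1: "READY", 2: "FAILED"}, policy, observed_qps=12.0)
print(plan)
```

With 12 requests per second and a target of 5 per replica, the sketch wants 3 replicas. One of the two existing replicas has failed, so the plan terminates it and launches 2 new ones. A real controller would run such a step periodically, presumably feeding it load figures collected from the load balancer.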

My Understanding

The Controller is like a 👨‍💻 worker who is responsible for the actual deployment and operation of the replicas.

The Load Balancer is like a 🧠 "Second Brain": it gets info from the controller and tells the controller how to deal with the replicas.