SkyPilot Serve Model¶
This page focuses on the high-level architecture and design of SkyPilot Serve (SkyServe).
Why Do We Need SkyServe?¶
SkyServe takes an existing serving framework and deploys it across one or more regions or clouds.
Idea
- Each service gets an endpoint that automatically distributes requests to its replicas.
- Replicas of the same service can run in different regions and clouds, which reduces cloud costs and increases availability.
- SkyServe handles the load balancing, recovery, and autoscaling of the replicas.
Architecture¶
Components¶
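- Controller VM: hosts the management processes (service controller and load balancer) for all services
- Load Balancer: listens on the service endpoint and routes traffic to the replicas
- Service Controller: monitors replica status, re-launches replicas on failure, and autoscales
- Service Endpoint: the connection between the user (👨) and the load balancer (🧠)
- Replicas: the instances that run the actual service workload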
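To make the Load Balancer's role concrete, here is a minimal sketch of distributing requests across ready replicas. It assumes a simple round-robin policy, which is my assumption and not something these notes confirm about SkyServe. The `Replica` and `RoundRobinBalancer` names and the replica URLs are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Replica:
    """Hypothetical replica record; the fields are illustrative only."""
    url: str
    ready: bool = True


class RoundRobinBalancer:
    """Toy load balancer that cycles through the currently ready replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._counter = count()  # monotonically increasing request index

    def pick(self):
        # Only replicas marked ready are eligible to receive traffic.
        ready = [r for r in self.replicas if r.ready]
        if not ready:
            raise RuntimeError("no ready replicas")
        return ready[next(self._counter) % len(ready)]


replicas = [
    Replica("http://replica-a:8080"),
    Replica("http://replica-b:8080", ready=False),  # e.g. still starting up
    Replica("http://replica-c:8080"),
]
balancer = RoundRobinBalancer(replicas)
picked = [balancer.pick().url for _ in range(4)]
print(picked)
```

Each request goes to exactly one replica, and `replica-b` is skipped until it becomes ready.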
Details¶
- SkyServe has a centralized controller VM (who provides it? see below) that manages the deployment of your service.
- Each service has a process group that manages its replicas and routes traffic to them:
  - Controller: monitors the status of the replicas and re-launches a new replica if one of them fails. It also autoscales the number of replicas if an autoscaling config is set.
  - Load Balancer: a lightweight HTTP server that listens on the service endpoint and forwards each request to one of the ready replicas.
- All process groups share a single controller VM.
- The controller VM is launched in the cloud with the best price/performance ratio. You can also customize its resources.
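The controller behavior described above (monitor, re-launch on failure, autoscale) can be sketched as a single reconcile pass. This is a toy model under my own assumptions: the `AutoscaleConfig` fields, the status strings, and the request-rate-based scaling rule are illustrative and not taken from SkyServe's code. Scaling down is left out to keep it short.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscaleConfig:
    """Hypothetical autoscaling settings; not SkyServe's real schema."""
    min_replicas: int
    max_replicas: int
    target_qps_per_replica: float


def desired_replicas(current_qps, cfg):
    """Replicas needed to serve current_qps, clamped to the configured bounds."""
    needed = math.ceil(current_qps / cfg.target_qps_per_replica)
    return max(cfg.min_replicas, min(cfg.max_replicas, needed))


def reconcile(statuses, current_qps, cfg):
    """One controller pass over the replicas.

    statuses maps replica name -> "READY" or "FAILED".
    Returns (failed replicas to tear down, number of new replicas to launch).
    """
    failed = sorted(name for name, s in statuses.items() if s == "FAILED")
    healthy = len(statuses) - len(failed)
    target = desired_replicas(current_qps, cfg)
    to_launch = max(0, target - healthy)
    return failed, to_launch


cfg = AutoscaleConfig(min_replicas=1, max_replicas=4, target_qps_per_replica=5)
failed, to_launch = reconcile({"r1": "READY", "r2": "FAILED"}, current_qps=12, cfg=cfg)
print(failed, to_launch)
```

Here 12 QPS at 5 QPS per replica calls for 3 replicas. Only `r1` is healthy, so the pass tears down `r2` and launches 2 new replicas, covering both recovery and scale-up in one step.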
My Understanding
The Controller is like a 👨‍💻 worker who is responsible for the actual deployment and operation of the replicas.
The Load Balancer is like a 🧠 "Second Brain": it gets info from the controller and tells the controller how to deal with replicas.