
SkyPilot Serve Model

This page focuses on the high-level architecture and design of SkyPilot Serve (SkyServe).


Why SkyServe?

SkyServe takes an existing serving framework and deploys it across one or more regions or clouds.


  • Each service gets an endpoint that automatically distributes requests to its replicas.

  • Replicas of the same service can run in different regions and clouds — reducing cloud costs and increasing availability.

  • SkyServe handles the load balancing, recovery, and autoscaling of the replicas.
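
To make the autoscaling point concrete, here is a minimal sketch of how a replica count could be derived from the observed request rate. It is an illustration only, not SkyServe's actual autoscaler: the function name and the parameters (`qps`, `target_qps_per_replica`, `min_replicas`, `max_replicas`) are assumptions chosen for this example.

```python
import math


def desired_replicas(qps: float, target_qps_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Hypothetical helper: replicas needed for the observed request rate."""
    if target_qps_per_replica <= 0:
        raise ValueError("target_qps_per_replica must be positive")
    # Enough replicas so that each one stays at or below the target load.
    needed = math.ceil(qps / target_qps_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(qps=25, target_qps_per_replica=10,
                       min_replicas=1, max_replicas=5))  # 3
```

With no traffic the result falls back to `min_replicas`, and under heavy traffic it is capped at `max_replicas`.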



  • Controller VM: manages all services; every service's process group runs on it
  • Load Balancer: listens on the service endpoint and routes traffic to the replicas
  • Service Controller: monitors replica status, re-launches failed replicas, and autoscales
  • Service Endpoint: connection between 👨 and 🧠
  • Replicas: the instances that run the actual service workload


  1. SkyServe has a centralized controller VM (who provides it? see item 6) that manages the deployment of your service.
  2. Each service has a process group that manages its replicas and routes traffic to them.
  3. Controller: monitors the status of the replicas and re-launches a replica if one of them fails. It also autoscales the number of replicas if an autoscaling config is set.
  4. Load Balancer: routes traffic to all ready replicas. It is a lightweight HTTP server that listens on the service endpoint and distributes each request to one of the replicas.
  5. All process groups share a single controller VM.
  6. The controller VM is launched in the cloud with the best price/performance ratio. You can also customize this.
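
The per-service process group described in items 2 to 4 can be sketched as a toy model. This is a simplified illustration under stated assumptions, not SkyServe's code: `Replica`, `ServiceProcessGroup`, `reconcile`, and `route` are hypothetical names, readiness is a plain flag instead of a real health probe, and round-robin is only one possible routing policy.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Replica:
    replica_id: int
    ready: bool = True  # a real system would set this from a readiness probe


@dataclass
class ServiceProcessGroup:
    """Hypothetical model of one service's controller plus load balancer."""
    target: int
    replicas: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))
    _next: int = 0

    def reconcile(self) -> None:
        """Controller step: drop failed replicas and launch replacements."""
        self.replicas = [r for r in self.replicas if r.ready]
        while len(self.replicas) < self.target:
            self.replicas.append(Replica(next(self._ids)))

    def route(self) -> int:
        """Load balancer step: pick one ready replica, round-robin."""
        ready = [r for r in self.replicas if r.ready]
        if not ready:
            raise RuntimeError("no ready replicas")
        replica = ready[self._next % len(ready)]
        self._next += 1
        return replica.replica_id


svc = ServiceProcessGroup(target=2)
svc.reconcile()                                      # launches replicas 1 and 2
print([svc.route() for _ in range(3)])               # [1, 2, 1]
svc.replicas[0].ready = False                        # replica 1 fails
svc.reconcile()                                      # replica 3 replaces it
print(sorted(r.replica_id for r in svc.replicas))    # [2, 3]
```

In this sketch the controller and the load balancer share state inside one object; on the controller VM there would be one such group per service.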

My Understanding

The controller is like a 👨‍💻 worker who is responsible for the actual deployment and operation of the replicas.

The load balancer is like a 🧠 "second brain": it gets information from the controller and tells the controller how to deal with the replicas.