SkyServe Usage

Autoscaling

SkyServe provides out-of-the-box autoscaling for your services, configured under replica_policy in the service YAML:

YAML
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 2
    max_replicas: 10
    target_qps_per_replica: 2.5
# ...

With this configuration, SkyServe will:

  • Initially launch 2 replicas of your service (min_replicas).
  • Scale up gradually if traffic is high, up to 10 replicas (max_replicas).
  • Scale down gradually if traffic is low, down to a minimum of 2 replicas (min_replicas).
  • Always keep the replica count in the range [min_replicas, max_replicas].

Whether traffic counts as "high" or "low" is determined as follows:

  1. Autoscaling is driven by the QPS (queries per second) of your service.

  2. SkyServe scales your service so that, ultimately, each replica receives approximately target_qps_per_replica queries per second. This value can be a float. With the configuration above:

    • If the QPS is higher than 2.5 per replica, traffic is considered "high", and SkyServe launches more replicas (but no more than 10).
    • If the QPS is lower than 2.5 per replica, traffic is considered "low", and SkyServe scales down the replicas (but to no fewer than 2).

Specifically, the current target number of replicas is calculated as:

Bash
current_target_replicas = ceil(current_qps / target_qps_per_replica)
final_target_replicas = min(max_replicas, max(min_replicas, current_target_replicas))
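
To make the formula concrete, here is a minimal Python sketch of the same calculation. The function name target_replicas is made up for illustration and is not part of SkyServe; the numbers come from the YAML at the top of this section.

```python
import math


def target_replicas(current_qps: float,
                    target_qps_per_replica: float,
                    min_replicas: int,
                    max_replicas: int) -> int:
    """Clamp the QPS-derived replica count to [min_replicas, max_replicas]."""
    if target_qps_per_replica <= 0:
        raise ValueError("target_qps_per_replica must be positive")
    # Same two steps as the formula: ceil, then clamp.
    current_target = math.ceil(current_qps / target_qps_per_replica)
    return min(max_replicas, max(min_replicas, current_target))


# min_replicas=2, max_replicas=10, target_qps_per_replica=2.5
for qps in (0, 4, 12, 40):
    print(qps, target_replicas(qps, 2.5, 2, 10))
```

With these settings, 12 QPS asks for ceil(12 / 2.5) = 5 replicas, while 0 QPS is clamped up to 2 and 40 QPS is clamped down to 10. Setting min_replicas to 0 lets the result drop to 0 when there is no traffic.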

Scaling Delay

  1. SkyServe does not scale up or down immediately. It upscales or downscales your service only after the QPS has stayed above or below the target for a sustained period.

  2. This avoids scaling up and down too aggressively in response to short-lived changes in traffic.

  3. Users can customize the scaling delay in xxx.yaml.
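
The idea of a scaling delay can be sketched roughly as follows. This is an illustration of the general technique only, not SkyServe's actual implementation: the class DelayedScaler, its field names, and the delay values used below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedScaler:
    """Hypothetical helper: apply a new target only after it has persisted."""
    current_replicas: int
    upscale_delay_s: float    # hypothetical parameter name
    downscale_delay_s: float  # hypothetical parameter name
    _pending_since: Optional[float] = None
    _pending_direction: int = 0  # +1 = up, -1 = down, 0 = nothing pending

    def observe(self, now_s: float, desired_replicas: int) -> int:
        """Record the desired count at time now_s; return the count to run."""
        direction = (desired_replicas > self.current_replicas) - (
            desired_replicas < self.current_replicas)
        if direction == 0:
            # Traffic matches the current size again: reset the timer.
            self._pending_since, self._pending_direction = None, 0
            return self.current_replicas
        if direction != self._pending_direction:
            # Pressure started or changed direction: start timing from now.
            self._pending_since, self._pending_direction = now_s, direction
        delay = (self.upscale_delay_s if direction > 0
                 else self.downscale_delay_s)
        if now_s - self._pending_since >= delay:
            # The condition held for the whole delay: apply the change.
            self.current_replicas = desired_replicas
            self._pending_since, self._pending_direction = None, 0
        return self.current_replicas
```

For example, with upscale_delay_s=300, a scaler at 2 replicas that sees a desired count of 5 at t=0 and t=100 stays at 2, and only moves to 5 once the same demand is still present at t=300. A short spike that disappears before the delay elapses never triggers a scale-up.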

Scale-to-Zero

If your service might go long periods with no traffic, consider setting min_replicas: 0 so that SkyServe can scale the service down to zero replicas while it is idle.

Updating a Service

  1. SkyServe supports updating a deployed service.
  2. During an update, the service will remain accessible with no downtime and its endpoint will remain the same.
  3. Rolling update is the default mode; you can also specify a blue-green update.

Rolling Update

Bash
sky serve update service-name new_service.yaml

Process:

  • An update is initiated, and traffic will continue to be redirected to existing (old) replicas.
  • New replicas (with new settings) are brought up in the background.
  • Whenever the total number of old and new replicas exceeds the expected number of replicas (based on autoscaler’s decision), extra old replicas will be scaled down.
  • Traffic will be redirected to both old and new replicas until all new replicas are ready.
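
The scale-down rule in the third bullet can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (one new replica becomes available per step, and the expected count stays fixed); the function name old_replicas_to_scale_down is hypothetical and not a SkyServe API.

```python
def old_replicas_to_scale_down(num_old: int, num_new: int,
                               expected_replicas: int) -> int:
    """Return how many old replicas are surplus under the rule above."""
    surplus = num_old + num_new - expected_replicas
    # Never remove more old replicas than exist, and never a negative number.
    return max(0, min(num_old, surplus))


# Toy walk-through: 3 old replicas, the autoscaler expects 3 in total.
num_old, num_new, expected = 3, 0, 3
while num_old > 0:
    num_new = min(expected, num_new + 1)  # assume one new replica comes up
    num_old -= old_replicas_to_scale_down(num_old, num_new, expected)
    print(f"old={num_old} new={num_new}")
```

Each time a new replica pushes the total above the expected count, one old replica is retired, so the old version drains as the new version fills in.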

I think this example vividly describes this process :)

Blue-Green Update

Bash
sky serve update --mode blue_green service-name new_service.yaml

Process:

  • An update is initiated, and traffic will continue to be redirected to existing (old) replicas.
  • New replicas (with new settings) are brought up in the background.
  • Traffic will be redirected to new replicas only when all new replicas are ready.
  • Old replicas are scaled down after all new replicas are ready.

During a blue-green update, traffic is served entirely by either the old version's replicas or the new version's replicas, never by a mix of the two.
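
The difference between the two update modes comes down to which replicas receive traffic at a given moment. The sketch below models just that routing decision, based on the bullets in the Rolling Update and Blue-Green Update sections. The function replicas_receiving_traffic and its parameters are hypothetical names for illustration; only the mode string blue_green appears in the command above, and "rolling" is used here simply as a label for the default behavior.

```python
from typing import List


def replicas_receiving_traffic(mode: str,
                               old_ready: List[str],
                               new_ready: List[str],
                               expected_new: int) -> List[str]:
    """Pick the replicas that should serve traffic during an update."""
    all_new_ready = len(new_ready) >= expected_new
    if all_new_ready:
        # In both modes the new version takes over once it is fully ready.
        return list(new_ready)
    if mode == "blue_green":
        # Never mix versions: stay on old replicas until the switch.
        return list(old_ready)
    if mode == "rolling":
        # Mix old and new replicas while the rollout is in progress.
        return list(old_ready) + list(new_ready)
    raise ValueError(f"unknown mode: {mode}")


old, new = ["old-1", "old-2"], ["new-1"]
print(replicas_receiving_traffic("rolling", old, new, expected_new=2))
print(replicas_receiving_traffic("blue_green", old, new, expected_new=2))
```

With one of two new replicas ready, the rolling case returns both old replicas plus the ready new one, while the blue-green case returns only the old replicas.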

Example here :)

Authorization

SkyServe provides robust authorization capabilities at the replica level, allowing you to control access to service endpoints with API keys.

I don't think it's commonly used in real development, so I'm skipping the details here.

If you want the full picture, please see this tutorial.