SkyServe Usage
Autoscaling
SkyServe provides out-of-the-box autoscaling for your services. For example, with `min_replicas: 2`, `max_replicas: 10`, and `target_qps_per_replica: 2.5` in the service YAML, SkyServe will:
- Initially launch 2 replicas of your service (`min_replicas`).
- Scale up gradually if traffic is high, up to 10 replicas (`max_replicas`).
- Scale down gradually if traffic is low, down to a minimum of 2 replicas (`min_replicas`).
- Keep the replica count within the range [`min_replicas`, `max_replicas`] at all times.
"High" and "low" traffic are defined as follows:
- Scaling is based on the QPS (queries per second) of your service.
- SkyServe will scale your service so that, ultimately, each replica receives approximately `target_qps_per_replica` queries per second. This value can be a float; for example, with a target of 2.5:
    - If the QPS is higher than 2.5 per replica, it counts as "high", and SkyServe will launch more replicas (but no more than 10 replicas).
    - If the QPS is lower than 2.5 per replica, it counts as "low", and SkyServe will scale down the replicas (but no fewer than 2 replicas).
Specifically, the current target number of replicas follows from dividing the service's total QPS by `target_qps_per_replica` and keeping the result within [`min_replicas`, `max_replicas`].
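The calculation can be sketched in a few lines of Python. This is only an illustration of the rule described above, not SkyServe's actual code: the function name `target_replicas` is hypothetical, and rounding up to a whole replica is an assumption.

```python
import math


def target_replicas(total_qps: float, target_qps_per_replica: float,
                    min_replicas: int, max_replicas: int) -> int:
    """Hypothetical helper: replicas needed so that each one receives
    roughly target_qps_per_replica queries per second."""
    if target_qps_per_replica <= 0:
        raise ValueError("target_qps_per_replica must be positive")
    # Assumption: round up, so that no replica ends up above the target.
    wanted = math.ceil(total_qps / target_qps_per_replica)
    # The result is always kept within [min_replicas, max_replicas].
    return max(min_replicas, min(max_replicas, wanted))


print(target_replicas(12.0, 2.5, 2, 10))   # 12 / 2.5 = 4.8, rounded up to 5
print(target_replicas(100.0, 2.5, 2, 10))  # capped at max_replicas: 10
print(target_replicas(0.5, 2.5, 2, 10))    # floored at min_replicas: 2
```

With `min_replicas=0`, the same function returns 0 when there is no traffic at all.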
Scaling Delay
- SkyServe will not scale up or down immediately. Instead, it upscales or downscales your service only if the QPS stays above or below the target QPS for a period of time.
- This avoids scaling up and down too aggressively.
- Users can customize the scaling delay in `xxx.yaml`.
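To make the idea of a scaling delay concrete, here is a minimal Python sketch. It assumes the replica count changes only after the desired count has differed from the current one, in the same direction, for a full delay period. The class `DelayedScaler`, its field names, and the delay values are all hypothetical and do not reflect SkyServe's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedScaler:
    """Hypothetical model: apply a scaling decision only after it has
    been pending in the same direction for the configured delay."""
    current: int
    upscale_delay: float    # seconds (illustrative value chosen by caller)
    downscale_delay: float  # seconds (illustrative value chosen by caller)
    _pending_since: Optional[float] = None
    _pending_direction: int = 0  # +1 = up, -1 = down, 0 = nothing pending

    def observe(self, desired: int, now: float) -> int:
        """Record the desired replica count at time `now` (seconds) and
        return the replica count that is actually in effect."""
        direction = (desired > self.current) - (desired < self.current)
        if direction == 0:
            # Traffic is back at the target: cancel any pending change.
            self._pending_since, self._pending_direction = None, 0
            return self.current
        if direction != self._pending_direction:
            # A new trend starts: begin timing it from now.
            self._pending_since, self._pending_direction = now, direction
        delay = self.upscale_delay if direction > 0 else self.downscale_delay
        if now - self._pending_since >= delay:
            self.current = desired
            self._pending_since, self._pending_direction = None, 0
        return self.current


scaler = DelayedScaler(current=2, upscale_delay=300, downscale_delay=1200)
print(scaler.observe(4, now=0))    # high traffic just started: still 2
print(scaler.observe(4, now=200))  # not sustained long enough: still 2
print(scaler.observe(4, now=300))  # sustained for 300 s: now 4
```

A short traffic spike that ends before the delay elapses never changes the replica count, which is the behavior the delay is meant to provide.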
Scale-to-Zero
If your service may go long periods with no traffic, consider setting `min_replicas: 0` so that it can scale down to zero replicas while idle.
Updating a Service
- SkyServe supports updating a deployed service.
- During an update, the service remains accessible with no downtime, and its endpoint stays the same.
- By default, a rolling update is applied; you can also specify a blue-green update.
Rolling Update
Process:
- An update is initiated, and traffic will continue to be redirected to existing (old) replicas.
- New replicas (with new settings) are brought up in the background.
- Whenever the total number of old and new replicas exceeds the expected number of replicas (based on the autoscaler's decision), the extra old replicas are scaled down.
- Traffic will be redirected to both old and new replicas until all new replicas are ready.
I think this example vividly describes this process :)
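As a rough complement, the rolling-update steps listed under "Process" can be simulated in Python. This is a simplified sketch, not SkyServe's logic: the function `rolling_update` is hypothetical, and it assumes new replicas become ready one at a time.

```python
from typing import List, Tuple


def rolling_update(old: int, target: int) -> List[Tuple[int, int]]:
    """Hypothetical simulation of a rolling update.

    Returns the (old_replicas, ready_new_replicas) pair after each step.
    `target` is the expected replica count from the autoscaler.
    """
    new = 0
    states = [(old, new)]
    while new < target:
        new += 1  # assumption: one new replica becomes ready per step
        # Scale down extra old replicas once old + new exceeds the target.
        extra = old + new - target
        if extra > 0:
            old -= min(old, extra)
        states.append((old, new))
    return states


for old_count, new_count in rolling_update(old=3, target=3):
    print(f"old={old_count} new={new_count}")
```

For three old replicas and a target of three, the states are (3, 0), (2, 1), (1, 2), (0, 3): both versions serve traffic in the middle steps, and the last old replica goes away once all new ones are ready.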
Blue-Green Update
- An update is initiated, and traffic will continue to be redirected to existing (old) replicas.
- New replicas (with new settings) are brought up in the background.
- Traffic will be redirected to new replicas only when all new replicas are ready.
- Old replicas are scaled down after all new replicas are ready.
During an update, traffic is served entirely by either old-version or new-version replicas, never by a mix of both.
Example here :)
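For comparison with the rolling case, the blue-green steps above can be sketched in Python as well. Again, this is only a simplified model under the same assumption that new replicas become ready one at a time; the function `blue_green_update` is hypothetical.

```python
from typing import List, Tuple


def blue_green_update(old: int, target: int) -> List[Tuple[int, int, str]]:
    """Hypothetical simulation of a blue-green update.

    Returns (old_replicas, ready_new_replicas, serving_version) per step.
    """
    states = []
    for ready_new in range(target + 1):
        all_ready = ready_new == target
        # Traffic switches only once every new replica is ready.
        serving = "new" if all_ready else "old"
        # Old replicas stay up until the switch, then are scaled down.
        remaining_old = 0 if all_ready else old
        states.append((remaining_old, ready_new, serving))
    return states


for old_count, new_count, serving in blue_green_update(old=2, target=2):
    print(f"old={old_count} new={new_count} serving={serving}")
```

Unlike the rolling case, the old replicas are kept at full count until the switch, so the service briefly runs both full sets side by side.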
Authorization
SkyServe provides authorization at the replica level, allowing you to control access to service endpoints with API keys.
I don't think it's commonly used in real development, so I skip the details here.
If you want the full picture, please see this tutorial.