K8S Implementation of SkyPilot

This post shows how to run SkyPilot on Kubernetes (k8s), using a local "cluster" on a laptop.

Why choose k8s

Frankly speaking, implementing SkyPilot on k8s is not the best choice.

You can choose GCP, AWS, Azure, and so on instead, since they offer powerful functionality together with the GPUs to match.

But! They are toooooooooooo expensive for 99% of us.

So I would like to use k8s instead, even though my local k8s cluster cannot provide GPUs :(

You can find the full getting-started guide here

Step Pipeline

To prepare a Kubernetes cluster to run SkyPilot, the cluster administrator must:

  1. Deploy a cluster running Kubernetes v1.20 or later.
  2. Set up GPU support.
  3. [Optional] Set up ports for exposing services.
  4. [Optional] Set up permissions: create a namespace for your users and/or create a service account with minimal permissions for SkyPilot.

After these steps, the administrator can share the kubeconfig file with users, who can then submit tasks to the cluster using SkyPilot.

Step 1 - Deploy a Kubernetes Cluster

You can refer to this part of the official website.

Once the setup is done, you can create your own local cluster:

Bash
sky local up # Create a 1-node Kubernetes cluster locally

reference

Under the hood, sky local up uses kind, a tool for creating a Kubernetes cluster on your local machine.
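
To see kind at work, you can check that the cluster exists once sky local up has finished. The snippet below is only a sketch: the helper name cluster_exists is made up for illustration, and the cluster name skypilot is an assumption inferred from the kind-skypilot context that kubectl shows on my machine.

```shell
# Sketch: check whether a kind cluster with the given name exists.
# Assumption: `sky local up` names its kind cluster "skypilot".
cluster_exists() {
  local name="$1"
  # `kind get clusters` prints one cluster name per line;
  # -F matches literally, -x requires a whole-line match.
  kind get clusters 2>/dev/null | grep -Fqx -- "$name"
}

# Usage:
#   cluster_exists skypilot && kubectl get nodes --context kind-skypilot
```

If the check succeeds, kubectl get nodes should list the node of the local cluster.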

Just for Fun

The kind cluster created by sky local up has a single node and, in this setup, no GPUs.

It is not recommended for use in a production environment.

So running SkyPilot on a local laptop "cluster" through k8s is not the best choice for real work.

Step 2 - Set up GPU Support

We skip this :(

The reason for skipping is that I use an Apple Silicon Mac, which does not support NVIDIA GPUs.

Step 3 - Verifying Setup

Check Status

Once the cluster is deployed and you have placed your kubeconfig at ~/.kube/config, verify your setup by running sky check:

Bash
sky check kubernetes

This should show Kubernetes: enabled. On a cluster without GPUs, such as mine, a hint about the missing nvidia.com/gpu resource is expected:

Bash
sky check kubernetes
Checking credentials to enable clouds for SkyPilot.
  Kubernetes: enabled
    Hint: Could not detect GPU resources (`nvidia.com/gpu`) in Kubernetes cluster. If this cluster contains GPUs, please ensure GPU drivers are installed on the node. Check if the GPUs are setup correctly by running `kubectl describe nodes` and looking for the nvidia.com/gpu resource. Please refer to the documentation on how to set up GPUs.

To enable a cloud, follow the hints above and rerun: sky check
If any problems remain, refer to detailed docs at: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html

🎉 Enabled clouds 🎉
   Kubernetes
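
The hint above suggests running kubectl describe nodes and looking for the nvidia.com/gpu resource. A minimal sketch of that check follows; the function name has_nvidia_gpu is hypothetical, and the check only looks for the resource name in the output, so it says nothing about how many GPUs are free.

```shell
# Sketch: succeed if any node description mentions the nvidia.com/gpu resource.
has_nvidia_gpu() {
  # -F treats the pattern as a literal string, so "." is not a wildcard.
  kubectl describe nodes 2>/dev/null | grep -Fq 'nvidia.com/gpu'
}

# Usage:
#   if has_nvidia_gpu; then echo "GPU resource found"; else echo "no GPU resource"; fi
```

On my Apple Silicon laptop this check fails, which matches the hint printed by sky check.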

Check GPU Status

You can also check the GPUs available on your nodes by running:

Bash
# example output on the official website
sky show-gpus --cloud kubernetes
Kubernetes GPUs
GPU   REQUESTABLE_QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
L4    1, 2, 4                   12          12
H100  1, 2, 4, 8                16          16

Kubernetes per node GPU availability
NODE_NAME                  GPU_NAME  TOTAL_GPUS  FREE_GPUS
my-cluster-0               L4        4           4
my-cluster-1               L4        4           4
my-cluster-2               L4        2           2
my-cluster-3               L4        2           2
my-cluster-4               H100      8           8
my-cluster-5               H100      8           8

My own output:

Bash
sky show-gpus --cloud kubernetes
No GPUs found in Kubernetes cluster. If your cluster contains GPUs, make sure nvidia.com/gpu resource is available on the nodes and the node labels for identifying GPUs (e.g., skypilot.co/accelerator) are setup correctly. To further debug, run: sky check

List SkyPilot resources

Bash
sky status --k8s # all users
sky status # only the current user

Summary

After all the setup steps, you can use SkyPilot to run your tasks on the Kubernetes cluster.

The pipeline is:

  1. Start Docker on the local machine (for me, that is docker-desktop).
  2. Start the 1-node Kubernetes cluster locally with sky local up.
  3. Check the cluster status with sky status.
  4. Run your tasks.
  5. After the tasks are completed, delete the cluster with sky local down.
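
The five steps above can be sketched as one shell function. This is only an illustration under a few assumptions: the function name local_session is made up, the task is whatever command you pass in, and Docker is expected to be running already, because the function only checks for it and does not start it.

```shell
# Sketch: run one task on a throwaway local cluster, following the steps above.
local_session() {
  local task_rc
  # Step 1: `docker info` fails when the Docker daemon is not reachable.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker is not running; start it first" >&2
    return 1
  fi
  sky local up || return 1   # Step 2: create the 1-node cluster
  sky status                 # Step 3: check the cluster status
  "$@"                       # Step 4: run the task command passed as arguments
  task_rc=$?
  sky local down             # Step 5: delete the cluster, even if the task failed
  return "$task_rc"
}

# Usage (task.yaml is a hypothetical task file):
#   local_session sky launch -y task.yaml
```

Returning the exit code of the task rather than that of sky local down keeps a failed task visible to the caller.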

Q & A

Some of the concepts can be confusing. For example, these are the kubectl contexts on my machine:

Bash
kubectl config get-contexts
CURRENT   NAME                                               CLUSTER                                            AUTHINFO                                           NAMESPACE
          arn:aws:eks:us-east-1:034901485801:cluster/MyEKS   arn:aws:eks:us-east-1:034901485801:cluster/MyEKS   arn:aws:eks:us-east-1:034901485801:cluster/MyEKS
          docker-desktop                                     docker-desktop                                     docker-desktop                                     default
          kind-kind                                          kind-kind                                          kind-kind
*         kind-skypilot                                      kind-skypilot                                      kind-skypilot
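
In the listing above, the asterisk marks the current context. If another context is current, you can switch back before using SkyPilot. The sketch below assumes that SkyPilot works against the current context and that the context is named kind-skypilot as on my machine; the function name use_skypilot_context is hypothetical.

```shell
# Sketch: make kind-skypilot the current kubectl context if it is not already.
use_skypilot_context() {
  local target="kind-skypilot"   # assumption: context name from the listing above
  local current
  current="$(kubectl config current-context 2>/dev/null)"
  if [ "$current" != "$target" ]; then
    kubectl config use-context "$target"
  fi
}

# Usage:
#   use_skypilot_context && sky check kubernetes
```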

There are two "helpers" related to SkyPilot here: kind and docker-desktop.

kind is a tool that "virtually makes" a local Kubernetes cluster, and docker-desktop is the local Docker runtime that the cluster's containers run on.

KIND

kind is a tool for running local Kubernetes clusters using Docker container “nodes”.

kind was primarily designed for testing Kubernetes itself, but may be used for local development or CI.

  1. SkyPilot works at the application level and runs on top of the Kubernetes cluster.
  2. The cluster's nodes run as containers, so the cluster depends on a container runtime, which here is docker-desktop.

Hence, you can understand the relationship between SkyPilot, kind, and docker-desktop:

Bash
skypilot <--- by kind --- cluster <------ docker-desktop

Therefore, if you close docker-desktop, the cluster stops, and SkyPilot can no longer run on it.

So my advice is: make sure docker-desktop is running before you use SkyPilot, ideally by setting it to start automatically.
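
When docker-desktop starts automatically at login, it can take a while before the daemon accepts commands. A small wait loop, sketched below, can guard any SkyPilot command. The function name wait_for_docker and the default of 30 attempts with a 2-second delay are arbitrary choices for illustration.

```shell
# Sketch: poll until the Docker daemon answers, or give up after N attempts.
wait_for_docker() {
  local attempts="${1:-30}"   # hypothetical default: 30 attempts
  local delay="${2:-2}"       # hypothetical default: 2 seconds between attempts
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # `docker info` succeeds only when the daemon is reachable.
    if docker info >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Docker did not become ready" >&2
  return 1
}

# Usage:
#   wait_for_docker && sky status
```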