(serve-in-production-kubernetes)=

# Deploy on Kubernetes

This section should help you:

- understand how to install and use the [KubeRay] operator.
- understand how to deploy a Ray Serve application using a [RayService].
- understand how to monitor and update your application.

The recommended way to deploy Ray Serve is on Kubernetes, providing the best of both worlds: the user experience and scalable compute of Ray Serve and operational benefits of Kubernetes.
This also allows you to integrate with existing applications that may be running on Kubernetes.
The recommended practice when running on Kubernetes is to use the [RayService] controller that's provided as part of [KubeRay]. The RayService controller automatically handles important production requirements such as health checking, status reporting, failure recovery, and upgrades.

A [RayService] CR encapsulates a multi-node Ray Cluster and a Serve application that runs on top of it into a single Kubernetes manifest.
Deploying, upgrading, and getting the status of the application can be done using standard `kubectl` commands.
This section walks through how to deploy, monitor, and upgrade the [`FruitStand` example](serve-in-production-example) on Kubernetes.

:::{warning}
Although it's actively developed and maintained, [KubeRay] is still considered alpha, or experimental, so some APIs may be subject to change.
:::

(serve-installing-kuberay-operator)=
## Installing the KubeRay operator

Follow the [KubeRay quickstart guide](kuberay-quickstart) to:
* Install `kubectl` and `Helm`
* Prepare a Kubernetes cluster
* Deploy a KubeRay operator

(serve-deploy-app-on-kuberay)=
## Deploying a Serve application

Once the KubeRay controller is running, you can manage your Ray Serve application by creating and updating a `RayService` custom resource (CR).
`RayService` custom resources consist of the following:
- a `KubeRay` `RayCluster` config defining the cluster that the Serve application runs on.
- a Ray Serve [config](serve-in-production-config-file) defining the Serve application to run on the cluster.

:::{tip}
You can use the `--kubernetes-format`/`-k` flag with `serve build` to print the Serve config in a format that can be copy-pasted directly into your [Kubernetes config](serve-in-production-kubernetes). You can paste this config into the `RayService` CR.
:::

When the `RayService` is created, the `KubeRay` controller first creates a Ray cluster using the provided configuration.
Then, once the cluster is running, it deploys the Serve application to the cluster using the [REST API](serve-in-production-deploying).
The controller also creates a Kubernetes Service that can be used to route traffic to the Serve application.

Let's see this in action by deploying the [`FruitStand` example](serve-in-production-example).
The Serve config for the example is embedded into [this example `RayService` CR](https://github.com/ray-project/kuberay/blob/release-0.5/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml).
To follow along, save this CR locally in a file named `ray_v1alpha1_rayservice.yaml`:

:::{note}
The example `RayService` uses very small resource requests because it's only for demonstration.
In production, you'll want to provide more resources to the cluster.
Learn more about how to configure KubeRay clusters [here](kuberay-config).
:::

```console
$ curl -o ray_v1alpha1_rayservice.yaml https://raw.githubusercontent.com/ray-project/kuberay/release-0.5/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml
```

To deploy the example, we simply `kubectl apply` the CR.
This creates the underlying Ray cluster, consisting of a head and worker node pod (see [Ray Clusters Key Concepts](../../cluster/key-concepts.rst) for more details on Ray clusters), as well as the service that can be used to query our application:

```console
$ kubectl apply -f ray_v1alpha1_rayservice.yaml

$ kubectl get rayservices
NAME                AGE
rayservice-sample   7s

$ kubectl get pods
NAME                                                      READY   STATUS    RESTARTS   AGE
ervice-sample-raycluster-454c4-worker-small-group-b6mmg   1/1     Running   0          XXs
kuberay-operator-7fbdbf8c89-4lrnr                         1/1     Running   0          XXs
rayservice-sample-raycluster-454c4-head-krk9d             1/1     Running   0          XXs

$ kubectl get services

rayservice-sample-head-svc                         ClusterIP   ...        8080/TCP,6379/TCP,8265/TCP,10001/TCP,8000/TCP,52365/TCP   XXs
rayservice-sample-raycluster-454c4-dashboard-svc   ClusterIP   ...        52365/TCP                                                 XXs
rayservice-sample-raycluster-454c4-head-svc        ClusterIP   ...        8000/TCP,52365/TCP,8080/TCP,6379/TCP,8265/TCP,10001/TCP   XXs
rayservice-sample-serve-svc                        ClusterIP   ...        8000/TCP                                                  XXs
```

Note that the `rayservice-sample-serve-svc` above is the one that can be used to send queries to the Serve application -- this will be used in the next section.

## Querying the application

Once the `RayService` is running, we can query it over HTTP using the service created by the KubeRay controller.
This service can be queried directly from inside the cluster, but to access it from your laptop you'll need to configure a [Kubernetes ingress](kuberay-networking) or use port forwarding as below:

```console
$ kubectl port-forward service/rayservice-sample-serve-svc 8000
$ curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
6
```

## Getting the status of the application

As the `RayService` is running, the `KubeRay` controller continually monitors it and writes relevant status updates to the CR.
You can view the status of the application using `kubectl describe`.
This includes the status of the cluster, events such as health check failures or restarts, and the application-level statuses reported by [`serve status`](serve-in-production-inspecting).

```console
$ kubectl get rayservices
NAME                AGE
rayservice-sample   7s

$ kubectl describe rayservice rayservice-sample
...
Status:
  Active Service Status:
    App Status:
      Last Update Time:  2022-08-16T20:52:41Z
      Status:            RUNNING
    Dashboard Status:
      Health Last Update Time:  2022-08-16T20:52:41Z
      Is Healthy:               true
      Last Update Time:         2022-08-16T20:52:41Z
    Ray Cluster Name:           rayservice-sample-raycluster-9ghjw
    Ray Cluster Status:
      Available Worker Replicas:  2
      Desired Worker Replicas:    1
      Endpoints:
        Client:             10001
        Dashboard:          8265
        Dashboard - Agent:  52365
        Gcs - Server:       6379
        Serve:              8000
      Last Update Time:     2022-08-16T20:51:14Z
      Max Worker Replicas:  5
      Min Worker Replicas:  1
      State:                ready
    Serve Deployment Statuses:
      Health Last Update Time:  2022-08-16T20:52:41Z
      Last Update Time:         2022-08-16T20:52:41Z
      Name:                     MangoStand
      Status:                   HEALTHY
      Health Last Update Time:  2022-08-16T20:52:41Z
      Last Update Time:         2022-08-16T20:52:41Z
      Name:                     OrangeStand
      Status:                   HEALTHY
      Health Last Update Time:  2022-08-16T20:52:41Z
      Last Update Time:         2022-08-16T20:52:41Z
      Name:                     PearStand
      Status:                   HEALTHY
      Health Last Update Time:  2022-08-16T20:52:41Z
      Last Update Time:         2022-08-16T20:52:41Z
      Name:                     FruitMarket
      Status:                   HEALTHY
      Health Last Update Time:  2022-08-16T20:52:41Z
      Last Update Time:         2022-08-16T20:52:41Z
      Name:                     DAGDriver
      Status:                   HEALTHY
  Pending Service Status:
    App Status:
    Dashboard Status:
    Ray Cluster Status:
  Service Status:  Running
Events:
  Type    Reason                       Age                     From                   Message
  ----    ------                       ----                    ----                   -------
  Normal  WaitForDashboard             5m44s (x2 over 5m44s)   rayservice-controller  Service "rayservice-sample-raycluster-9ghjw-dashboard-svc" not found
  Normal  WaitForServeDeploymentReady  4m37s (x17 over 5m42s)  rayservice-controller  Put "http://rayservice-sample-raycluster-9ghjw-dashboard-svc.default.svc.cluster.local:52365/api/serve/deployments/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal  WaitForServeDeploymentReady  4m35s (x6 over 5m38s)   rayservice-controller  Put "http://rayservice-sample-raycluster-9ghjw-dashboard-svc.default.svc.cluster.local:52365/api/serve/deployments/": dial tcp 10.121.3.243:52365: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Normal  Running                      44s (x129 over 94s)     rayservice-controller  The Serve applicaton is now running and healthy.
```

## Updating the application

To update the `RayService`, modify the manifest and apply it use `kubectl apply`.
There are two types of updates that can occur:
- *Application-level updates*: when only the Serve config options are changed, the update is applied _in-place_ on the same Ray cluster. This enables [lightweight updates](serve-in-production-lightweight-update) such as scaling a deployment up or down or modifying autoscaling parameters.
- *Cluster-level updates*: when the `RayCluster` config options are changed, such as updating the container image for the cluster, it may result in a cluster-level update. In this case, a new cluster is started, and the application is deployed to it. Once the new cluster is ready, the Kubernetes service is updated to point to the new cluster and the previous cluster is terminated. There should not be any downtime for the application, but note that this requires the Kubernetes cluster to be large enough to schedule both Ray clusters.

### Example: Serve config update

In the `FruitStand` example above, let's change the price of a mango in the Serve config to 4:

```console
  - name: MangoStand
    numReplicas: 1
    userConfig: |
      price: 4
```

Now to update the application we apply the modified manifest:

```console
$ kubectl apply -f ray_v1alpha1_rayservice.yaml

$ kubectl describe rayservice rayservice-sample
...
  serveDeploymentStatuses:
  - healthLastUpdateTime: "2022-07-18T21:51:37Z"
    lastUpdateTime: "2022-07-18T21:51:41Z"
    name: MangoStand
    status: UPDATING
...
```

If we query the application, we can see that we now get a different result reflecting the updated price:

```console
$ curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
8
```

### Updating the RayCluster config

The process of updating the RayCluster config is the same as updating the Serve config.
For example, we can update the number of worker nodes to 2 in the manifest:

```console
workerGroupSpecs:
  # the number of pods in the worker group.
  - replicas: 2
```

```console
$ kubectl apply -f ray_v1alpha1_rayservice.yaml

$ kubectl describe rayservice rayservice-sample
...
  pendingServiceStatus:
    appStatus: {}
    dashboardStatus:
      healthLastUpdateTime: "2022-07-18T21:54:53Z"
      lastUpdateTime: "2022-07-18T21:54:54Z"
    rayClusterName: rayservice-sample-raycluster-bshfr
    rayClusterStatus: {}
...
```

In the status, you can see that the `RayService` is preparing a pending cluster.
After the pending cluster is healthy, it becomes the active cluster and the previous cluster is terminated.

## Next Steps

Check out [the end-to-end fault tolerance guide](serve-e2e-ft) to learn more about Serve's failure conditions and how to guard against them.

[KubeRay]: https://ray-project.github.io/kuberay/
[RayService]: https://ray-project.github.io/kuberay/guidance/rayservice/