***********************
Deploying on Kubernetes
***********************

.. _ray-k8s-deploy:

Introduction
============

You can leverage your Kubernetes cluster as a substrate for execution of distributed Ray programs.
The Ray Autoscaler spins up and deletes Kubernetes pods according to resource demands of the Ray workload - each Ray node runs in its own Kubernetes pod.

Quick Guide
-----------

This document covers the following topics:

- :ref:`Overview of methods for launching a Ray Cluster on Kubernetes<k8s-overview>`
- :ref:`Managing clusters with the Ray Cluster Launcher<k8s-cluster-launcher>`
- :ref:`Managing clusters with the Ray Kubernetes Operator<k8s-operator>`
- :ref:`Interacting with a Ray Cluster via a Kubernetes Service<ray-k8s-interact>`
- :ref:`Comparison of the Ray Cluster Launcher and Ray Kubernetes Operator<k8s-comparison>`

You can find more information at the following links:

- :ref:`GPU usage with Kubernetes<k8s-gpus>`
- :ref:`Using Ray Tune on your Kubernetes cluster<tune-kubernetes>`
- :ref:`How to manually set up a non-autoscaling Ray cluster on Kubernetes<ray-k8s-static>`

.. _k8s-overview:

Ray on Kubernetes
=================

Ray supports two ways of launching an autoscaling Ray cluster on Kubernetes:

- Using the :ref:`Ray Cluster Launcher <k8s-cluster-launcher>`
- Using the :ref:`Ray Kubernetes Operator <k8s-operator>`

The Cluster Launcher and Ray Kubernetes Operator provide similar functionality; each serves as an `interface to the Ray autoscaler`.
Below is a brief overview of the two tools.

The Ray Cluster Launcher
------------------------

The :ref:`Ray Cluster Launcher <cluster-cloud>` is geared towards experimentation and development and can be used to launch Ray clusters on Kubernetes (among other backends).
It allows you to manage an autoscaling Ray Cluster from your local environment using the :ref:`Ray CLI <cluster-commands>`.
For example, you can use ``ray up`` to launch a Ray cluster on Kubernetes and ``ray exec`` to execute commands in the Ray head node's pod.
Note that using the Cluster Launcher requires Ray to be :ref:`installed locally <installation>`.

* Get started with the :ref:`Ray Cluster Launcher on Kubernetes<k8s-cluster-launcher>`.

The Ray Kubernetes Operator
---------------------------

The Ray Kubernetes Operator is a Kubernetes-native solution geared towards production use cases.
Rather than being handled locally, cluster launching and autoscaling are centralized in the Operator's pod.
The Operator follows the standard Kubernetes `pattern <https://kubernetes.io/docs/concepts/extend-kubernetes/operator/>`__ - it runs
a control loop which manages a `Kubernetes Custom Resource`_ specifying the desired state of your Ray cluster.
Using the Kubernetes Operator does not require a local installation of Ray - all interactions with your Ray cluster are mediated by Kubernetes.

* Get started with the :ref:`Ray Kubernetes Operator<k8s-operator>`.

Further reading
---------------

Read :ref:`here<k8s-comparison>` for more details on the comparison between the Operator and Cluster Launcher.
Note that it is also possible to manually deploy a :ref:`non-autoscaling Ray cluster <ray-k8s-static>` on Kubernetes.

.. note::

    The configuration ``yaml`` files used in this document are provided in the `Ray repository`_
    as examples to get you started. When deploying real applications, you will probably
    want to build and use your own container images, add more worker nodes to the
    cluster, and change the resource requests for the head and worker nodes. Refer to the provided ``yaml``
    files to be sure that you maintain important configuration options for Ray to
    function properly.

.. _`Ray repository`: https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/kubernetes

.. _k8s-cluster-launcher:

Managing Clusters with the Ray Cluster Launcher
===============================================

This section briefly explains how to use the Ray Cluster Launcher to launch a Ray cluster on your existing Kubernetes cluster.

First, install the Kubernetes API client (``pip install kubernetes``), then make sure your Kubernetes credentials are set up properly to access the cluster (if a command like ``kubectl get pods`` succeeds, you should be good to go).
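
If you prefer to check connectivity from Python, the following sanity test is a minimal sketch that simply mirrors ``kubectl get pods`` using the same ``kubernetes`` client library and kubeconfig credentials the Cluster Launcher relies on:

.. code-block:: python

    # Minimal connectivity check (sketch): lists a few pods using your local
    # kubeconfig, the same credentials `kubectl` uses.
    from kubernetes import client, config

    config.load_kube_config()  # Loads credentials from ~/.kube/config.
    core_api = client.CoreV1Api()
    pods = core_api.list_pod_for_all_namespaces(limit=5)
    print([pod.metadata.name for pod in pods.items])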

Once you have ``kubectl`` configured locally to access the remote cluster, you should be ready to launch your cluster. The provided `ray/python/ray/autoscaler/kubernetes/example-full.yaml <https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/kubernetes/example-full.yaml>`__ cluster config file will create a small cluster with one head node pod, configured to autoscale up to two worker node pods, with all pods requiring 1 CPU and 0.5GiB of memory.

Test that it works by running the following commands from your local machine:

.. _cluster-launcher-commands:

.. code-block:: bash

    # Create or update the cluster. When the command finishes, it will print
    # out the command that can be used to get a remote shell into the head node.
    $ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml

    # List the pods running in the cluster. You should only see one head node
    # until you start running an application, at which point worker nodes
    # should be started. Don't forget to include the Ray namespace in your
    # 'kubectl' commands ('ray' by default).
    $ kubectl -n ray get pods

    # Get a remote screen on the head node.
    $ ray attach ray/python/ray/autoscaler/kubernetes/example-full.yaml
    $ # Try running a Ray program with 'ray.init(address="auto")'.

    # View monitor logs.
    $ ray monitor ray/python/ray/autoscaler/kubernetes/example-full.yaml

    # Tear down the cluster.
    $ ray down ray/python/ray/autoscaler/kubernetes/example-full.yaml
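
Once attached to the head node, you can verify the cluster with a short program that connects via ``ray.init(address="auto")``. Here is a minimal sketch (a toy example, not one of the scripts shipped in the Ray repository):

.. code-block:: python

    # Toy verification script (sketch): run this inside the head node pod after
    # `ray attach`. It connects to the running cluster and executes trivial tasks.
    import ray

    ray.init(address="auto")

    @ray.remote
    def hello(i):
        return f"hello from task {i}"

    print(ray.get([hello.remote(i) for i in range(4)]))
    print("Cluster resources:", ray.cluster_resources())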

* Learn about :ref:`running Ray programs on Kubernetes <ray-k8s-run>`

.. _k8s-operator:

Managing clusters with the Ray Kubernetes Operator
==================================================

.. role:: bash(code)
   :language: bash

This section explains how to use the Ray Kubernetes Operator to launch a Ray cluster on your existing Kubernetes cluster.

The example commands in this document launch six Kubernetes pods, using a total of 6 CPU and 3.5Gi memory.
If you are experimenting using a test Kubernetes environment such as `minikube`_, make sure to provision sufficient resources, e.g.
:bash:`minikube start --cpus=6 --memory=\"4G\"`.
Alternatively, reduce resource usage by editing the ``yaml`` files referenced in this document; for example, reduce ``minWorkers``
in ``example_cluster.yaml`` and ``example_cluster2.yaml``.

.. note::

    1. The Ray Kubernetes Operator is still experimental. For the yaml files in the examples below, we recommend using the latest master version of Ray.
    2. The Ray Kubernetes Operator requires Kubernetes version at least ``v1.17.0``. Check Kubernetes version info with the command :bash:`kubectl version`.

Applying the RayCluster Custom Resource Definition
--------------------------------------------------

The Ray Kubernetes Operator works by managing a user-submitted `Kubernetes Custom Resource`_ (CR) called a ``RayCluster``.
A RayCluster custom resource describes the desired state of the Ray cluster.
To get started, we need to apply the `Kubernetes Custom Resource Definition`_ (CRD) defining a RayCluster.

.. code-block:: shell

    $ kubectl apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/cluster_crd.yaml
    customresourcedefinition.apiextensions.k8s.io/rayclusters.cluster.ray.io created

.. note::

    The file ``cluster_crd.yaml`` defining the CRD is not meant to be modified by the user. Rather, users :ref:`configure <operator-launch>` a RayCluster CR via a file like `ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml>`__.
    The Kubernetes API server then validates the user-submitted RayCluster resource against the CRD.

Picking a Kubernetes Namespace
------------------------------

The rest of the Kubernetes resources we will use are `namespaced`_.
You can use an existing namespace for your Ray clusters or create a new one if you have permissions.
For this example, we will create a namespace called ``ray``.

.. code-block:: shell

    $ kubectl create namespace ray
    namespace/ray created

Starting the Operator
---------------------

To launch the operator in our namespace, we execute the following command.

.. code-block:: shell

    $ kubectl -n ray apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/operator.yaml
    serviceaccount/ray-operator-serviceaccount created
    role.rbac.authorization.k8s.io/ray-operator-role created
    rolebinding.rbac.authorization.k8s.io/ray-operator-rolebinding created
    pod/ray-operator-pod created

The output shows that we've launched a pod named ``ray-operator-pod``. This is the pod that runs the operator process.
The ServiceAccount, Role, and RoleBinding we have created grant the operator pod the `permissions`_ it needs to manage Ray clusters.

.. _operator-launch:

Launching Ray Clusters
----------------------

Finally, to launch a Ray cluster, we create a RayCluster custom resource.

.. code-block:: shell

    $ kubectl -n ray apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml
    raycluster.cluster.ray.io/example-cluster created

The operator detects the RayCluster resource we've created and launches an autoscaling Ray cluster.
Our RayCluster configuration specifies ``minWorkers: 2`` in the second entry of ``spec.podTypes``, so we get a head node and two workers upon launch.

.. note::

    For more details about RayCluster resources, we recommend taking a look at the annotated example `example_cluster.yaml <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml>`__ applied in the last command.

.. code-block:: shell

    $ kubectl -n ray get pods
    NAME                               READY   STATUS    RESTARTS   AGE
    example-cluster-ray-head-hbxvv     1/1     Running   0          72s
    example-cluster-ray-worker-4hvv6   1/1     Running   0          64s
    example-cluster-ray-worker-78kp5   1/1     Running   0          64s
    ray-operator-pod                   1/1     Running   0          2m33s

We see four pods: the operator, the Ray head node, and two Ray worker nodes.

Let's launch another cluster in the same namespace, this one specifying ``minWorkers: 1``.

.. code-block:: shell

    $ kubectl -n ray apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster2.yaml

We confirm that both clusters are running in our namespace.

.. code-block:: shell

    $ kubectl -n ray get rayclusters
    NAME               STATUS    AGE
    example-cluster    Running   19s
    example-cluster2   Running   19s

    $ kubectl -n ray get pods
    NAME                                READY   STATUS    RESTARTS   AGE
    example-cluster-ray-head-th4wv      1/1     Running   0          10m
    example-cluster-ray-worker-q9pjn    1/1     Running   0          10m
    example-cluster-ray-worker-qltnp    1/1     Running   0          10m
    example-cluster2-ray-head-kj5mg     1/1     Running   0          10s
    example-cluster2-ray-worker-qsgnd   1/1     Running   0          1s
    ray-operator-pod                    1/1     Running   0          10m

Now we can :ref:`run Ray programs<ray-k8s-run>` on our Ray clusters.

.. _operator-logs:

Monitoring
----------

Autoscaling logs are written to the operator pod's ``stdout`` and can be accessed with :code:`kubectl logs`.
Each line of output is prefixed by the name of the cluster followed by a colon.
The following command gets the last hundred lines of autoscaling logs for our second cluster.

.. code-block:: shell

    $ kubectl -n ray logs ray-operator-pod | grep ^example-cluster2: | tail -n 100

The output should include monitoring updates that look like this:

.. code-block:: shell

    example-cluster2:2020-12-12 13:55:36,814 DEBUG autoscaler.py:693 -- Cluster status: 1 nodes
    example-cluster2: - MostDelayedHeartbeats: {'172.17.0.4': 0.04093289375305176, '172.17.0.5': 0.04084634780883789}
    example-cluster2: - NodeIdleSeconds: Min=36 Mean=38 Max=41
    example-cluster2: - ResourceUsage: 0.0/2.0 CPU, 0.0/1.0 Custom1, 0.0/1.0 is_spot, 0.0 GiB/0.58 GiB memory, 0.0 GiB/0.1 GiB object_store_memory
    example-cluster2: - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
    example-cluster2:Worker node types:
    example-cluster2: - worker-nodes: 1
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:148 -- Cluster resources: [{'object_store_memory': 1.0, 'node:172.17.0.4': 1.0, 'memory': 5.0, 'CPU': 1.0}, {'object_store_memory': 1.0, 'is_spot': 1.0, 'memory': 6.0, 'node:172.17.0.5': 1.0, 'Custom1': 1.0, 'CPU': 1.0}]
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:149 -- Node counts: defaultdict(<class 'int'>, {'head-node': 1, 'worker-nodes': 1})
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:159 -- Placement group demands: []
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:186 -- Resource demands: []
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:187 -- Unfulfilled demands: []
    example-cluster2:2020-12-12 13:55:36,891 INFO resource_demand_scheduler.py:209 -- Node requests: {}
    example-cluster2:2020-12-12 13:55:36,903 DEBUG autoscaler.py:654 -- example-cluster2-ray-worker-tdxdr is not being updated and passes config check (can_update=True).
    example-cluster2:2020-12-12 13:55:36,923 DEBUG autoscaler.py:654 -- example-cluster2-ray-worker-tdxdr is not being updated and passes config check (can_update=True).

Cleaning Up
-----------

We shut down a Ray cluster by deleting the associated RayCluster resource.
Either of the next two commands will delete our second cluster, ``example-cluster2``.

.. code-block:: shell

    $ kubectl -n ray delete raycluster example-cluster2
    # OR
    $ kubectl -n ray delete -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster2.yaml

The pods associated with ``example-cluster2`` enter the ``Terminating`` state. In a few moments, we check that these pods are gone:

.. code-block:: shell

    $ kubectl -n ray get pods
    NAME                               READY   STATUS    RESTARTS   AGE
    example-cluster-ray-head-th4wv     1/1     Running   0          57m
    example-cluster-ray-worker-q9pjn   1/1     Running   0          56m
    example-cluster-ray-worker-qltnp   1/1     Running   0          56m
    ray-operator-pod                   1/1     Running   0          57m

Only the operator pod and the first cluster, ``example-cluster``, remain.
To finish cleaning up, we delete the cluster ``example-cluster`` and then the operator's resources.

.. code-block:: shell

    $ kubectl -n ray delete raycluster example-cluster
    $ kubectl -n ray delete -f ray/python/ray/autoscaler/kubernetes/operator_configs/operator.yaml

If you like, you can also delete the RayCluster custom resource definition.
(Using the operator again will then require reapplying the CRD.)

.. code-block:: shell

    $ kubectl delete crd rayclusters.cluster.ray.io
    # OR
    $ kubectl delete -f ray/python/ray/autoscaler/kubernetes/operator_configs/cluster_crd.yaml

.. _ray-k8s-interact:

Interacting with a Ray Cluster
==============================

:ref:`Ray Client <ray-client>` allows you to connect to your Ray cluster on Kubernetes and execute Ray programs.
The Ray Client server runs on the Ray head node, by default on port 10001.

:ref:`Ray Dashboard <ray-dashboard>` gives visibility into the state of your cluster.
By default, the dashboard uses port 8265 on the Ray head node.

.. _k8s-service:

Configuring a head node service
-------------------------------

To use Ray Client and Ray Dashboard,
you can connect via a `Kubernetes Service`_ targeting the relevant ports on the head node:

.. _svc-example:

.. code-block:: yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: example-cluster-ray-head
    spec:
      # This selector must match the head node pod's selector.
      selector:
        component: example-cluster-ray-head
      ports:
        - name: client
          protocol: TCP
          port: 10001
          targetPort: 10001
        - name: dashboard
          protocol: TCP
          port: 8265
          targetPort: 8265

The head node pod's ``metadata`` should have a ``label`` matching the service's ``selector`` field:

.. code-block:: yaml

    apiVersion: v1
    kind: Pod
    metadata:
      # Automatically generates a name for the pod with this prefix.
      generateName: example-cluster-ray-head-
      # Must match the head node service selector above if a head node
      # service is required.
      labels:
        component: example-cluster-ray-head

- The Ray Kubernetes Operator automatically configures a default service exposing ports 10001 and 8265
  on the head node pod. The Operator also adds the relevant label to the head node pod's configuration.
  If this default service does not suit your use case, you can modify the service or create a new one,
  for example by using the tools ``kubectl edit``, ``kubectl create``, or ``kubectl apply``.
- The Ray Cluster Launcher does not automatically configure a service targeting the head node. A
  head node service can be specified in the cluster launching config's ``provider.services`` field. The example cluster launching
  config `example-full.yaml <https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/kubernetes/example-full.yaml>`__ includes
  the :ref:`above <svc-example>` service configuration as an example.

After launching a Ray cluster with either the Operator or Cluster Launcher, you can view the configured service:

.. code-block:: shell

    $ kubectl -n ray get services
    NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
    example-cluster-ray-head   ClusterIP   10.106.123.159   <none>        10001/TCP,8265/TCP   52s

.. _ray-k8s-run:

Running Ray Programs
--------------------

Given a running Ray cluster and a :ref:`Service <k8s-service>` exposing the Ray Client server's port on the head pod,
we can now run Ray programs on our cluster.
In the following examples, we assume that we have a running Ray cluster with one head node and
two worker nodes. This can be achieved in one of two ways:

- Using the :ref:`Operator <k8s-operator>` with the example resource `ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml>`__.
- Using the :ref:`Cluster Launcher <k8s-cluster-launcher>`. Modify the example file `ray/python/ray/autoscaler/kubernetes/example-full.yaml <https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/kubernetes/example-full.yaml>`__
  by setting the field ``available_node_types.worker_node.min_workers``
  to 2 and then run ``ray up`` with the modified config.

Using Ray Client to connect from within the Kubernetes cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can connect to your Ray cluster from another pod in the same Kubernetes cluster.
For example, you can submit a Ray application to run on the Kubernetes cluster as a `Kubernetes Job`_.
The Job runs a single pod that executes the Ray driver program to completion; when the driver exits, the pod terminates, but you can still access its logs.
The following command submits a Job which executes an `example Ray program`_.

.. code-block:: shell

    $ kubectl create -f ray/doc/kubernetes/job-example.yaml

The program executed by the Job waits for three Ray nodes to connect and then tests object transfer
between the nodes. Note that the program uses the environment variables
``EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_HOST`` and ``EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_PORT_CLIENT``
to access Ray Client. These `environment variables`_ are set by Kubernetes based on
the service we are using to expose the Ray head node.
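
The driver pattern is straightforward; below is a condensed sketch (an approximation for illustration, not the actual ``job_example.py``) of how a Job's driver can read these environment variables and connect through Ray Client:

.. code-block:: python

    # Condensed sketch of a Ray Client driver suitable for running as a Kubernetes Job.
    # The actual example is doc/kubernetes/example_scripts/job_example.py, which also
    # waits for all three Ray nodes to join before running its checks.
    import os

    import ray
    import ray.util

    # Kubernetes sets these variables based on the head node service.
    host = os.environ["EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_HOST"]
    port = os.environ["EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_PORT_CLIENT"]
    ray.util.connect(f"{host}:{port}")

    @ray.remote
    def echo(x):
        # Receives the value behind the object reference and sends it back.
        return x

    ref = ray.put("hello from the Ray cluster")
    print(ray.get(echo.remote(ref)))
    print("Success!")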

To view the output of the Job, first find the name of the pod that ran it,
then fetch its logs:

.. code-block:: shell

    $ kubectl -n ray get pods
    NAME                               READY   STATUS    RESTARTS   AGE
    example-cluster-ray-head-rpqfb     1/1     Running   0          11m
    example-cluster-ray-worker-4c7cn   1/1     Running   0          11m
    example-cluster-ray-worker-zvglb   1/1     Running   0          11m
    ray-test-job-8x2pm-77lb5           1/1     Running   0          8s

    # Fetch the logs. You should see repeated output for 10 iterations and then
    # 'Success!'
    $ kubectl -n ray logs ray-test-job-8x2pm-77lb5

To clean up the resources created by the Job after checking its output, run
the following:

.. code-block:: shell

    # List Jobs run in the Ray namespace.
    $ kubectl -n ray get jobs
    NAME                 COMPLETIONS   DURATION   AGE
    ray-test-job-kw5gn   1/1           10s        30s

    # Delete the finished Job.
    $ kubectl -n ray delete job ray-test-job-kw5gn

    # Verify that the Job's pod was cleaned up.
    $ kubectl -n ray get pods
    NAME                               READY   STATUS    RESTARTS   AGE
    example-cluster-ray-head-rpqfb     1/1     Running   0          11m
    example-cluster-ray-worker-4c7cn   1/1     Running   0          11m
    example-cluster-ray-worker-zvglb   1/1     Running   0          11m

.. _`environment variables`: https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables
.. _`example Ray program`: https://github.com/ray-project/ray/blob/master/doc/kubernetes/example_scripts/job_example.py

Using Ray Client to connect from outside the Kubernetes cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To connect to the Ray cluster from outside your Kubernetes cluster,
the head node Service needs to communicate with the outside world.
One way to achieve this is by port-forwarding. Run the following command locally:

.. code-block:: shell

    $ kubectl -n ray port-forward service/example-cluster-ray-head 10001:10001

Alternatively, you can find the head node pod and connect to it directly with
the following command:

.. code-block:: shell

    # Substitute the name of your Ray cluster if using a name other than "example-cluster".
    $ kubectl -n ray port-forward \
      $(kubectl -n ray get pods -l ray-cluster-name=example-cluster -l ray-node-type=head -o custom-columns=:metadata.name) 10001:10001

Then open a new shell and try out a sample program:

.. code-block:: shell

    $ python ray/doc/kubernetes/example_scripts/run_local_example.py

The program in this example uses ``ray.util.connect("127.0.0.1:10001")`` to connect to the Ray cluster.
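
With the port-forward running, a minimal local connection test (a short sketch, not the full ``run_local_example.py``) looks like this:

.. code-block:: python

    # Minimal local connection test (sketch). Requires the port-forward above to be
    # running and matching Python/Ray versions locally and on the cluster (see the note below).
    import ray
    import ray.util

    ray.util.connect("127.0.0.1:10001")

    @ray.remote
    def square(x):
        return x * x

    print(ray.get([square.remote(i) for i in range(5)]))  # [0, 1, 4, 9, 16]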

.. note::

    Connecting with Ray Client requires matching minor versions of Python (for example 3.7)
    on the server and client ends -- that is, on the Ray head node and in the environment where
    ``ray.util.connect`` is invoked. Note that the default ``rayproject/ray`` images use Python 3.7.
    Nightly builds are now available for Python 3.6 and 3.8 at the `Ray Docker Hub <https://hub.docker.com/r/rayproject/ray/tags?page=1&ordering=last_updated&name=nightly-py>`_.

    Connecting with Ray Client currently also requires matching Ray versions. In particular, to connect from a local machine to a cluster running the examples in this document, the :ref:`nightly <install-nightlies>` version of Ray must be installed locally.

Running the program on the head node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is also possible to execute a Ray program on the Ray head node.
(Replace the pod name with the name of your head pod
- you can find it by running ``kubectl -n ray get pods``.)

.. code-block:: shell

    # Copy the test script onto the head node.
    $ kubectl -n ray cp ray/doc/kubernetes/example_scripts/run_on_head.py example-cluster-ray-head-p9mfh:/home/ray

    # Run the example program on the head node.
    $ kubectl -n ray exec example-cluster-ray-head-p9mfh -- python /home/ray/run_on_head.py
    # You should see repeated output for 10 iterations and then 'Success!'

Alternatively, you can run tasks interactively on the cluster by connecting a remote
shell to one of the pods.

.. code-block:: shell

    # Get a remote shell to the head node.
    $ kubectl -n ray exec -it example-cluster-ray-head-5455bb66c9-7l6xj -- bash

    # Run the example program on the head node.
    root@ray-head-6f566446c-5rdmb:/# python /home/ray/run_on_head.py
    # You should see repeated output for 10 iterations and then 'Success!'

The program in this example uses ``ray.init(address="auto")`` to connect to the Ray cluster.

Accessing the Dashboard
-----------------------

The Ray Dashboard can be accessed locally using ``kubectl port-forward``.

.. code-block:: shell

    $ kubectl -n ray port-forward service/example-cluster-ray-head 8265:8265

After running the above command locally, the Dashboard will be accessible at ``http://localhost:8265``.

You can also monitor the state of the cluster with ``kubectl logs`` when using the :ref:`Operator <operator-logs>` or with ``ray monitor`` when using
the :ref:`Ray Cluster Launcher <cluster-launcher-commands>`.

.. warning::

    The Dashboard currently shows resource limits of the physical host each Ray node is running on,
    rather than the limits of the container the node is running in.
    This is a known bug tracked `here <https://github.com/ray-project/ray/issues/11172>`_.

.. _k8s-comparison:

Cluster Launcher vs Operator
============================

We compare the Ray Cluster Launcher and Ray Kubernetes Operator as methods of managing an autoscaling Ray cluster.

Comparison of use cases
-----------------------

- The Cluster Launcher is convenient for development and experimentation. Using the Cluster Launcher requires a local installation of Ray. The Ray CLI then provides a convenient interface for interacting with a Ray cluster.
- The Operator is geared towards production use cases. It does not require installing Ray locally - all interactions with your Ray cluster are mediated by Kubernetes.

Comparison of architectures
---------------------------

- With the Cluster Launcher, the user launches a Ray cluster from their local environment by invoking ``ray up``. This provisions a pod for the Ray head node, which then runs the `autoscaling process <https://github.com/ray-project/ray/blob/master/python/ray/monitor.py>`__.
- The `Operator <https://github.com/ray-project/ray/blob/master/python/ray/ray_operator/operator.py>`__ centralizes cluster launching and autoscaling in the `Operator pod <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/kubernetes/operator_configs/operator.yaml>`__.
  The user creates a `Kubernetes Custom Resource`_ describing the intended state of the Ray cluster.
  The Operator then detects the resource, launches a Ray cluster, and runs the autoscaling process in the operator pod.
  The Operator can manage multiple Ray clusters by running an autoscaling process for each Ray cluster.

Comparison of configuration options
-----------------------------------

The configuration options for the two methods are completely analogous - compare sample configurations for the `Cluster Launcher <https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/kubernetes/example-full.yaml>`__
and for the `Operator <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml>`__.
With a few exceptions, the fields of the RayCluster resource managed by the Operator are camelCase versions of the corresponding snake_case Cluster Launcher fields.
In fact, the Operator `internally <https://github.com/ray-project/ray/blob/master/python/ray/ray_operator/operator_utils.py>`__ converts
RayCluster resources to Cluster Launcher configs. (A toy illustration of the naming correspondence follows the list below.)

A summary of the configuration differences:

- The Cluster Launcher field ``available_node_types`` for specifying the types of pods available for autoscaling is renamed to ``podTypes`` in the Operator's RayCluster configuration.
- The Cluster Launcher field ``resources`` for specifying custom Ray resources provided by a node type is renamed to ``rayResources`` in the Operator's RayCluster configuration.
- The ``provider`` field in the Cluster Launcher config has no analogue in the Operator's RayCluster configuration. (The Operator fills this field internally.)
- The Ray start commands differ between the two tools:

  * When using the Cluster Launcher, ``head_ray_start_commands`` should include the argument ``--autoscaling-config=~/ray_bootstrap_config.yaml``; this is important for the configuration of the head node's autoscaler.
  * On the other hand, the Operator's ``headRayStartCommands`` should include a ``--no-monitor`` flag to prevent the autoscaling/monitoring process from running on the head node.
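
To make the camelCase/snake_case correspondence concrete, here is a toy illustration (not the Operator's actual conversion code, which lives in ``operator_utils.py``):

.. code-block:: python

    # Toy illustration of the camelCase <-> snake_case field correspondence.
    # The Operator's real conversion logic also handles the exceptions listed
    # above (e.g. podTypes vs. available_node_types).
    import re

    def camel_to_snake(name: str) -> str:
        return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

    print(camel_to_snake("headRayStartCommands"))  # head_ray_start_commands
    print(camel_to_snake("maxWorkers"))            # max_workers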

Questions or Issues?
--------------------

.. include:: /_help.rst

.. _`Kubernetes Job`: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
.. _`Kubernetes Service`: https://kubernetes.io/docs/concepts/services-networking/service/
.. _`Kubernetes Operator`: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
.. _`Kubernetes Custom Resource`: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
.. _`Kubernetes Custom Resource Definition`: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
.. _`annotation`: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#attaching-metadata-to-objects
.. _`permissions`: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
.. _`minikube`: https://minikube.sigs.k8s.io/docs/start/
.. _`namespaced`: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/