.. _ray-slurm-deploy:

Deploying on Slurm
==================

Slurm usage with Ray can be a little bit unintuitive.

* SLURM requires multiple copies of the same program to be submitted to the same cluster to do cluster programming. This model is particularly well-suited for MPI-based workloads.
* Ray, on the other hand, expects a head-worker architecture with a single point of entry. That is, you'll need to start a Ray head node, multiple Ray worker nodes, and run your Ray script on the head node.

This document aims to clarify how to run Ray on SLURM.
.. contents::
    :local:
Walkthrough using Ray with SLURM
--------------------------------

Many SLURM deployments require you to interact with SLURM via ``sbatch``, which executes a batch script on SLURM.

To run a Ray job with ``sbatch``, you will want to start a Ray cluster in the sbatch job with multiple ``srun`` commands (tasks), and then execute your Python script that uses Ray. Each task will run on a separate node and start or connect to a Ray runtime.
The walkthrough below will do the following:

1. Set the proper headers for the ``sbatch`` script.
2. Load the proper environment/modules.
3. Fetch a list of available computing nodes and their IP addresses.
4. Launch a head Ray process on one of the nodes (called the head node).
5. Launch Ray processes on the (n-1) remaining worker nodes and connect them to the head node by providing the head node's address.
6. After the underlying Ray cluster is ready, submit the user-specified task.

See :ref:`slurm-basic.sh <slurm-basic>` for an end-to-end example.
.. _ray-slurm-headers:

sbatch directives
~~~~~~~~~~~~~~~~~

In your sbatch script, you'll want to add `directives to provide context <https://slurm.schedmd.com/sbatch.html>`__ for your job to SLURM.

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=my-workload
You'll need to tell SLURM to allocate nodes specifically for Ray. Ray will then find and manage all resources on each node.

.. code-block:: bash

    ### Modify this according to your Ray workload.
    #SBATCH --nodes=4
    #SBATCH --exclusive
Important: To ensure that each Ray worker runtime runs on a separate node, set ``tasks-per-node=1``.

.. code-block:: bash

    #SBATCH --tasks-per-node=1

Since we've set ``tasks-per-node=1``, SLURM will schedule one task per node, which guarantees that each Ray worker runtime obtains the proper resources. In this example, we ask for at least 5 CPUs and 5 GB of memory per node.
.. code-block:: bash

    ### Modify this according to your Ray workload.
    #SBATCH --cpus-per-task=5
    #SBATCH --mem-per-cpu=1GB

    ### Similarly, you can also specify the number of GPUs per node.
    ### Modify this according to your Ray workload. Sometimes this
    ### should be 'gres' instead.
    #SBATCH --gpus-per-task=1
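On clusters that expose GPUs as a generic resource rather than through ``--gpus-per-task``, the request is typically written with ``--gres`` instead; the resource name below is a common convention, so check your cluster's documentation for the exact name:

.. code-block:: bash

    ### Alternative GPU request using the 'gres' syntax.
    #SBATCH --gres=gpu:1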
You can also add other optional flags to your sbatch directives.
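For example, a wall-clock limit, a partition, and a log file location are common additions; these are standard sbatch flags, and the values below are placeholders:

.. code-block:: bash

    ### Optional: set a time limit, a partition, and a log file.
    ### %x expands to the job name and %j to the job ID.
    #SBATCH --time=04:00:00
    #SBATCH --partition=gpu
    #SBATCH --output=%x_%j.log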
Loading your environment
~~~~~~~~~~~~~~~~~~~~~~~~

First, you'll often want to load modules or your own conda environment at the beginning of the script.
Note that this is an optional step, but it is often required for enabling the right set of dependencies.

.. code-block:: bash

    # Example: module load pytorch/v1.4.0-gpu
    # Example: conda activate my-env
    conda activate my-env
Obtain the head IP address
~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, we'll want to obtain a hostname and a node IP address for the head node. This way, when we start worker nodes, we'll be able to properly connect to the right head node.

.. literalinclude:: /cluster/examples/slurm-basic.sh
    :language: bash
    :start-after: __doc_head_address_start__
    :end-before: __doc_head_address_end__
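If you just want a picture of what this step involves, here is a minimal sketch of one common way to do it in an sbatch script. The variable names are illustrative, not required by Ray:

.. code-block:: bash

    # Expand the compact SLURM node list (e.g. "node[1-4]") into hostnames.
    nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
    nodes_array=($nodes)

    # Use the first allocated node as the Ray head node.
    head_node=${nodes_array[0]}

    # Ask the head node for its IP address.
    head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)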
Starting the Ray head node
~~~~~~~~~~~~~~~~~~~~~~~~~~

After detecting the head node hostname and head node IP, we'll want to create
a Ray head node runtime. We'll do this by launching ``srun`` as a background task
with a single task on a single node (recall that ``tasks-per-node=1``).

Below, you'll see that we explicitly specify the number of CPUs (``num-cpus``)
and number of GPUs (``num-gpus``) to Ray, as this prevents Ray from using
more resources than allocated. We also need to explicitly
indicate the ``node-ip-address`` for the Ray head runtime:

.. literalinclude:: /cluster/examples/slurm-basic.sh
    :language: bash
    :start-after: __doc_head_ray_start__
    :end-before: __doc_head_ray_end__
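A minimal sketch of this step, reusing the ``head_node`` and ``head_node_ip`` variables from the earlier sketch; the port value is Ray's common default, not a requirement:

.. code-block:: bash

    port=6379

    # Start the Ray head runtime on the head node as a background task.
    # SLURM_CPUS_PER_TASK / SLURM_GPUS_PER_TASK are set by the sbatch
    # directives above.
    srun --nodes=1 --ntasks=1 -w "$head_node" \
        ray start --head --node-ip-address="$head_node_ip" --port=$port \
        --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus "${SLURM_GPUS_PER_TASK}" --block &

    # Give the head node a moment to come up before starting workers.
    sleep 10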
By backgrounding the above ``srun`` task, we can proceed to start the Ray worker runtimes.
Starting the Ray worker nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below, we do the same thing for each worker node. Make sure the Ray head and Ray worker processes are not started on the same node.

.. literalinclude:: /cluster/examples/slurm-basic.sh
    :language: bash
    :start-after: __doc_worker_ray_start__
    :end-before: __doc_worker_ray_end__
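A sketch of the worker loop, reusing the variables from the sketches above; it starts at index 1, skipping index 0 (the head node), so that workers and the head never share a node:

.. code-block:: bash

    # Number of worker nodes: total nodes minus the head node.
    worker_num=$((SLURM_JOB_NUM_NODES - 1))

    for ((i = 1; i <= worker_num; i++)); do
        node_i=${nodes_array[$i]}
        # Start a Ray worker runtime and connect it to the head node.
        srun --nodes=1 --ntasks=1 -w "$node_i" \
            ray start --address "$head_node_ip:$port" \
            --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus "${SLURM_GPUS_PER_TASK}" --block &
        sleep 5
    done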
Submitting your script
~~~~~~~~~~~~~~~~~~~~~~

Finally, you can invoke your Python script:

.. literalinclude:: /cluster/examples/slurm-basic.sh
    :language: bash
    :start-after: __doc_script_start__
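One minimal pattern (the script name is a placeholder) is to export the cluster address, which ``ray.init()`` picks up from the ``RAY_ADDRESS`` environment variable, and then run the driver directly:

.. code-block:: bash

    # Point the driver at the already-running cluster.
    export RAY_ADDRESS="$head_node_ip:$port"

    # Run the Ray driver script; -u keeps output unbuffered for the log.
    python -u your_script.py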
Python-interface SLURM scripts
------------------------------

[Contributed by @pengzhenghao] Below, we provide a helper utility (:ref:`slurm-launch.py <slurm-launch>`) to auto-generate SLURM scripts and launch them.
``slurm-launch.py`` uses an underlying template (:ref:`slurm-template.sh <slurm-template>`) and fills out placeholders given user input.

Feel free to copy both files into your cluster for use, and to open PRs with contributions that improve these scripts!
Usage example
~~~~~~~~~~~~~

If you want to utilize a multi-node cluster in SLURM:

.. code-block:: bash

    python slurm-launch.py --exp-name test --command "python your_file.py" --num-nodes 3

If you want to specify the computing node(s), use the same node name(s) as in the output of the ``sinfo`` command:

.. code-block:: bash

    python slurm-launch.py --exp-name test --command "python your_file.py" --num-nodes 3 --node NODE_NAMES
There are other options you can use when calling ``python slurm-launch.py``:

* ``--exp-name``: The experiment name. Will generate ``{exp-name}_{date}-{time}.sh`` and ``{exp-name}_{date}-{time}.log``.
* ``--command``: The command you wish to run. For example: ``rllib train XXX`` or ``python XXX.py``.
* ``--num-gpus``: The number of GPUs you wish to use on each computing node. Default: 0.
* ``--node`` (``-w``): The specific nodes you wish to use, in the same form as the output of ``sinfo``. Nodes are automatically assigned if not specified.
* ``--num-nodes`` (``-n``): The number of nodes you wish to use. Default: 1.
* ``--partition`` (``-p``): The partition you wish to use. Default: ``""``, which uses the user's default partition.
* ``--load-env``: The command to set up your environment. For example: ``module load cuda/10.1``. Default: ``""``.

Note that :ref:`slurm-template.sh <slurm-template>` is compatible with both IPv4 and IPv6 addresses of the computing nodes.
Implementation
~~~~~~~~~~~~~~

Concretely, :ref:`slurm-launch.py <slurm-launch>` does the following things:

1. It automatically writes your requirements, e.g. the number of CPUs and GPUs per node, the number of nodes, and so on, to an sbatch script named ``{exp-name}_{date}-{time}.sh``. Your command (``--command``) to launch your own job is also written into the sbatch script.
2. Then it submits the sbatch script to the SLURM manager via a new process.
3. Finally, the Python process terminates itself and leaves a log file named ``{exp-name}_{date}-{time}.log`` to record the progress of your submitted command. Meanwhile, the Ray cluster and your job run on the SLURM cluster.
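After submission, you can monitor progress with standard SLURM tooling and the generated log file (the glob below is a placeholder matching the naming pattern above):

.. code-block:: bash

    # Check the job's status in the SLURM queue.
    squeue -u "$USER"

    # Follow the output of the submitted command.
    tail -f test_*.log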
Examples and templates
----------------------

Here are some community-contributed templates for using SLURM with Ray:

- `Ray sbatch submission scripts`_ used at `NERSC <https://www.nersc.gov/>`_, a US national lab.
- `YASPI`_ (yet another slurm python interface) by @albanie. The goal of YASPI is to provide an interface for submitting slurm jobs, thereby obviating the joys of sbatch files. It does so through recipes: collections of templates and rules for generating sbatch scripts. It supports job submissions for Ray.
- `Convenient python interface`_ to launch a Ray cluster and submit tasks, by @pengzhenghao.
.. _`Ray sbatch submission scripts`: https://github.com/NERSC/slurm-ray-cluster
.. _`YASPI`: https://github.com/albanie/yaspi
.. _`Convenient python interface`: https://github.com/pengzhenghao/use-ray-with-slurm