.. include:: /_includes/rllib/announcement.rst

.. include:: /_includes/rllib/we_are_hiring.rst

.. _rllib-index:

RLlib: Industry-Grade Reinforcement Learning
=============================================

.. image:: images/rllib-logo.png
    :align: center

**RLlib** is an open-source library for reinforcement learning (RL),
offering support for
production-level, highly distributed RL workloads while maintaining
unified and simple APIs for a large variety of industry applications.
Whether you would like to train your agents in a **multi-agent** setup,
purely from **offline** (historic) datasets, or using **externally
connected simulators**, RLlib offers a simple solution for each of your
decision-making needs.

If you either have your problem coded (in Python) as an
`RL environment <rllib-env.html#configuring-environments>`_
or own lots of pre-recorded, historic behavioral data to learn from, you will be
up and running in only a few days.

RLlib is already used in production by industry leaders in many different verticals,
such as
`climate control <https://www.anyscale.com/events/2021/06/23/applying-ray-and-rllib-to-real-life-industrial-use-cases>`_,
`industrial control <https://www.anyscale.com/events/2021/06/23/applying-ray-and-rllib-to-real-life-industrial-use-cases>`_,
`manufacturing and logistics <https://www.anyscale.com/events/2022/03/29/alphadow-leveraging-rays-ecosystem-to-train-and-deploy-an-rl-industrial>`_,
`finance <https://www.anyscale.com/events/2021/06/22/a-24x-speedup-for-reinforcement-learning-with-rllib-+-ray>`_,
`gaming <https://www.anyscale.com/events/2021/06/22/using-reinforcement-learning-to-optimize-iap-offer-recommendations-in-mobile-games>`_,
`automobile <https://www.anyscale.com/events/2021/06/23/using-rllib-in-an-enterprise-scale-reinforcement-learning-solution>`_,
`robotics <https://www.anyscale.com/events/2021/06/23/introducing-amazon-sagemaker-kubeflow-reinforcement-learning-pipelines-for>`_,
`boat design <https://www.youtube.com/watch?v=cLCK13ryTpw>`_,
and many others.

RLlib in 60 seconds
-------------------

.. figure:: images/rllib-index-header.svg

It only takes a few steps to get your first RLlib workload
up and running on your laptop.

RLlib does not automatically install a deep-learning framework, but supports
**TensorFlow** (both 1.x with static-graph and 2.x with eager mode) as well as
**PyTorch**.
Depending on your needs, make sure to install either TensorFlow or
PyTorch (or both, as shown below):

.. raw:: html

    <div class="termynal" data-termynal>
        <span data-ty="input">pip install "ray[rllib]" tensorflow torch</span>
    </div>

.. margin::

    For installation on computers running Apple Silicon (such as M1), please follow the instructions
    `here <https://docs.ray.io/en/latest/installation.html#m1-mac-apple-silicon-support>`_.

To be able to run our Atari examples, you should also run
``pip install "gym[atari]" "gym[accept-rom-license]" atari_py``.

This is all you need to start coding against RLlib.
Here is an example of running a PPO Algorithm on the
`Taxi domain <https://www.gymlibrary.dev/environments/toy_text/taxi/>`_.
We first create a `config` for the algorithm, which sets the right environment and
defines all training parameters we want.
Next, we `build` the algorithm and `train` it for a total of `5` iterations.
A training iteration includes parallel sample collection by the environment workers,
as well as loss calculation on the collected batch and a model update.
As a last step, we `evaluate` the trained Algorithm:

.. literalinclude:: doc_code/rllib_in_60s.py
    :language: python
    :start-after: __rllib-in-60s-begin__
    :end-before: __rllib-in-60s-end__

Note that you can use any Farama-Foundation Gymnasium environment as `env`.
In `rollouts` you can, for instance, specify the number of parallel workers that collect samples from the environment.
The `framework` config lets you choose between "tf2", "tf" and "torch" for execution.
You can also tweak RLlib's default `model` config and set up a separate config for `evaluation`.
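
To make these options concrete, here is a minimal sketch of such a configuration.
It may differ in details from the example script included above; the worker and
iteration counts are illustrative choices:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Configure PPO on the Taxi domain: set the env, the parallel rollout workers,
    # the deep-learning framework, and a separate evaluation setup.
    config = (
        PPOConfig()
        .environment("Taxi-v3")           # any registered Gymnasium env ID works here
        .rollouts(num_rollout_workers=2)  # number of parallel sample-collection workers
        .framework("torch")               # or "tf2" / "tf"
        .evaluation(evaluation_num_workers=1)
    )

    # Build the algorithm, train it for a few iterations, then evaluate it.
    algo = config.build()
    for _ in range(5):
        print(algo.train())
    print(algo.evaluate())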

If you want to learn more about the RLlib training API,
`you can read about it here <rllib-training.html#using-the-python-api>`_.
Also, see `here for a simple example on how to write an action inference loop after training. <https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training.py>`_
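
Such an inference loop can look roughly like the following sketch. It assumes the
``algo`` object built above and a Gymnasium Taxi env; the linked example additionally
covers restoring the algorithm from a checkpoint:

.. code-block:: python

    import gymnasium as gym

    # Roll out one episode with the trained policy.
    env = gym.make("Taxi-v3")
    obs, info = env.reset()
    terminated = truncated = False
    total_reward = 0.0
    while not (terminated or truncated):
        # Query the trained algorithm for a single action, given the current observation.
        action = algo.compute_single_action(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
    print(f"Episode return: {total_reward}")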

If you want to get a quick preview of which **algorithms** and **environments** RLlib supports,
click on the dropdowns below:

.. dropdown:: **RLlib Algorithms**
    :animate: fade-in-slide-down

    * High-throughput architectures

      - |pytorch| |tensorflow| :ref:`Distributed Prioritized Experience Replay (Ape-X) <apex>`
      - |pytorch| |tensorflow| :ref:`Importance Weighted Actor-Learner Architecture (IMPALA) <impala>`
      - |pytorch| |tensorflow| :ref:`Asynchronous Proximal Policy Optimization (APPO) <appo>`
      - |pytorch| :ref:`Decentralized Distributed Proximal Policy Optimization (DD-PPO) <ddppo>`

    * Gradient-based

      - |pytorch| |tensorflow| :ref:`Advantage Actor-Critic (A2C, A3C) <a3c>`
      - |pytorch| |tensorflow| :ref:`Deep Deterministic Policy Gradients (DDPG, TD3) <ddpg>`
      - |pytorch| |tensorflow| :ref:`Deep Q Networks (DQN, Rainbow, Parametric DQN) <dqn>`
      - |pytorch| |tensorflow| :ref:`Policy Gradients <pg>`
      - |pytorch| |tensorflow| :ref:`Proximal Policy Optimization (PPO) <ppo>`
      - |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) <sac>`
      - |pytorch| :ref:`Slate Q-Learning (SlateQ) <slateq>`

    * Derivative-free

      - |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) <ars>`
      - |pytorch| |tensorflow| :ref:`Evolution Strategies <es>`

    * Model-based / Meta-learning / Offline

      - |pytorch| :ref:`Single-Player AlphaZero (AlphaZero) <alphazero>`
      - |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) <maml>`
      - |pytorch| :ref:`Model-Based Meta-Policy-Optimization (MBMPO) <mbmpo>`
      - |pytorch| :ref:`Dreamer (DREAMER) <dreamer>`
      - |pytorch| :ref:`Conservative Q-Learning (CQL) <cql>`

    * Multi-agent

      - |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) <qmix>`
      - |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (MADDPG) <maddpg>`

    * Offline

      - |pytorch| |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) <marwil>`

    * Contextual bandits

      - |pytorch| :ref:`Linear Upper Confidence Bound (LinUCB) <lin-ucb>`
      - |pytorch| :ref:`Linear Thompson Sampling (LinTS) <lints>`

    * Exploration-based plug-ins (can be combined with any algo)

      - |pytorch| :ref:`Curiosity (ICM: Intrinsic Curiosity Module) <curiosity>`

.. dropdown:: **RLlib Environments**
    :animate: fade-in-slide-down

    * `RLlib Environments Overview <rllib-env.html>`__
    * `Farama-Foundation gymnasium <rllib-env.html#gymnasium>`__
    * `Vectorized <rllib-env.html#vectorized>`__
    * `Multi-Agent and Hierarchical <rllib-env.html#multi-agent-and-hierarchical>`__
    * `External Agents and Applications <rllib-env.html#external-agents-and-applications>`__

      - `External Application Clients <rllib-env.html#external-application-clients>`__

    * `Advanced Integrations <rllib-env.html#advanced-integrations>`__

Feature Overview
----------------

.. grid:: 1 2 3 3
    :gutter: 1
    :class-container: container pb-4

    .. grid-item-card::

        **RLlib Key Concepts**
        ^^^
        Learn more about the core concepts of RLlib, such as environments, algorithms and
        policies.
        +++
        .. button-ref:: rllib-core-concepts
            :color: primary
            :outline:
            :expand:

            Key Concepts

    .. grid-item-card::

        **RLlib Algorithms**
        ^^^
        Check out the many available RL algorithms of RLlib for model-free and model-based
        RL, on-policy and off-policy training, multi-agent RL, and more.
        +++
        .. button-ref:: rllib-algorithms-doc
            :color: primary
            :outline:
            :expand:

            Algorithms

    .. grid-item-card::

        **RLlib Environments**
        ^^^
        Get started with environments supported by RLlib, such as Farama Foundation's Gymnasium, PettingZoo,
        and many custom formats for vectorized and multi-agent environments.
        +++
        .. button-ref:: rllib-environments-doc
            :color: primary
            :outline:
            :expand:

            Environments

The following is a summary of RLlib's most striking features.
Click on the images below to see an example script for each of the listed features:

.. include:: feature_overview.rst

Customizing RLlib
-----------------

RLlib provides simple APIs to customize all aspects of your training and experimental workflows.
For example, you may code your own `environments <rllib-env.html#configuring-environments>`__
in Python using Farama-Foundation's gymnasium or DeepMind's OpenSpiel, provide custom
`TensorFlow/Keras- <rllib-models.html#tensorflow-models>`__ or
`Torch models <rllib-models.html#torch-models>`_, write your own
`policy- and loss definitions <rllib-concepts.html#policies>`__, or define
custom `exploratory behavior <rllib-training.html#exploration-api>`_.

By mapping one or more agents in your environments to (one or more) policies, multi-agent
RL (MARL) becomes an easy-to-use low-level primitive for our users, as sketched below.
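
For example, multi-agent training with a custom agent-to-policy mapping can be set up
roughly as follows. This is a sketch that uses the ``MultiAgentCartPole`` example
environment shipped with RLlib; the policy IDs and agent count are illustrative choices:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

    config = (
        PPOConfig()
        # Two agents (IDs 0 and 1) act in the same environment.
        .environment(MultiAgentCartPole, env_config={"num_agents": 2})
        .multi_agent(
            # Train one separate policy per agent ...
            policies={"policy_0", "policy_1"},
            # ... and map each agent ID to "its" policy.
            policy_mapping_fn=lambda agent_id, *args, **kwargs: f"policy_{agent_id}",
        )
    )
    algo = config.build()
    print(algo.train())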

.. figure:: images/rllib-stack.svg
    :align: left
    :width: 650

    **RLlib's API stack:** Built on top of Ray, RLlib offers off-the-shelf, highly distributed
    algorithms, policies, loss functions, and default models (including the option to
    auto-wrap a neural network with an LSTM or an attention net). Furthermore, our library
    comes with a built-in Server/Client setup, allowing you to connect
    hundreds of external simulators (clients) via the network to an RLlib server process,
    which provides learning functionality and serves action queries. User customizations
    are realized by sub-classing the existing abstractions and, by overriding certain
    methods in those sub-classes, defining custom behavior.
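
The client side of such a Server/Client setup can be sketched roughly as follows.
It assumes an RLlib policy server is already listening at ``http://localhost:9900``;
a CartPole env stands in for the external simulator, and a matching server script can be
found among the client/server examples in the RLlib repository:

.. code-block:: python

    import gymnasium as gym
    from ray.rllib.env.policy_client import PolicyClient

    env = gym.make("CartPole-v1")  # stand-in for your externally running simulator
    client = PolicyClient("http://localhost:9900", inference_mode="remote")

    obs, info = env.reset()
    episode_id = client.start_episode(training_enabled=True)
    while True:
        # Ask the server for an action ...
        action = client.get_action(episode_id, obs)
        # ... step the simulator and report the observed reward back for training.
        obs, reward, terminated, truncated, info = env.step(action)
        client.log_returns(episode_id, reward)
        if terminated or truncated:
            client.end_episode(episode_id, obs)
            obs, info = env.reset()
            episode_id = client.start_episode(training_enabled=True)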

.. |tensorflow| image:: images/tensorflow.png
    :class: inline-figure
    :width: 16

.. |pytorch| image:: images/pytorch.png
    :class: inline-figure
    :width: 16