RLlib: Industry-Grade Reinforcement Learning with TF and Torch
==============================================================

**RLlib** is an open-source library for reinforcement learning (RL), offering support for
production-level, highly distributed RL workloads, while maintaining
unified and simple APIs for a large variety of industry applications.

Whether you would like to train your agents in multi-agent setups,
purely from offline (historic) datasets, or using externally
connected simulators, RLlib offers simple solutions for your decision-making needs.

You **don't need** to be an **RL expert** to use RLlib, nor do you need to learn Ray or any
of its other libraries! If you either have your problem coded (in Python) as an
`RL environment <https://medium.com/distributed-computing-with-ray/anatomy-of-a-custom-environment-for-rllib-327157f269e5>`_
or own lots of pre-recorded, historic behavioral data to learn from, you will be
up and running in only a few days.

RLlib is already used in production by industry leaders in many different verticals, such as
`climate control <https://www.anyscale.com/events/2021/06/23/applying-ray-and-rllib-to-real-life-industrial-use-cases>`_,
`manufacturing and logistics <https://www.anyscale.com/events/2021/06/22/offline-rl-with-rllib>`_,
`finance <https://www.anyscale.com/events/2021/06/22/a-24x-speedup-for-reinforcement-learning-with-rllib-+-ray>`_,
`gaming <https://www.anyscale.com/events/2021/06/22/using-reinforcement-learning-to-optimize-iap-offer-recommendations-in-mobile-games>`_,
`automobile <https://www.anyscale.com/events/2021/06/23/using-rllib-in-an-enterprise-scale-reinforcement-learning-solution>`_,
`robotics <https://www.anyscale.com/events/2021/06/23/introducing-amazon-sagemaker-kubeflow-reinforcement-learning-pipelines-for>`_,
`boat design <https://www.youtube.com/watch?v=cLCK13ryTpw>`_,
and many others.

Installation and Setup
----------------------

Install RLlib and run your first experiment on your laptop in seconds:

**TensorFlow:**

.. code-block:: bash

    $ conda create -n rllib python=3.8
    $ conda activate rllib
    $ pip install "ray[rllib]" tensorflow "gym[atari]" "gym[accept-rom-license]" atari_py
    $ # Run a test job:
    $ rllib train --run APPO --env CartPole-v0

**PyTorch:**

.. code-block:: bash

    $ conda create -n rllib python=3.8
    $ conda activate rllib
    $ pip install "ray[rllib]" torch "gym[atari]" "gym[accept-rom-license]" atari_py
    $ # Run a test job:
    $ rllib train --run APPO --env CartPole-v0 --torch

Quick First Experiment
----------------------

.. code-block:: python

    import gym
    from ray.rllib.agents.ppo import PPOTrainer


    # Define your problem using Python and OpenAI's gym API:
    class ParrotEnv(gym.Env):
        """Environment in which an agent must learn to repeat the seen observations.

        Observations are float numbers indicating the to-be-repeated values,
        e.g. -1.0, 5.1, or 3.2.

        The action space is always the same as the observation space.

        Rewards are r=-abs(observation - action), for all steps.
        """

        def __init__(self, config):
            # Make the space (for actions and observations) configurable.
            self.action_space = config.get(
                "parrot_shriek_range", gym.spaces.Box(-1.0, 1.0, shape=(1, )))
            # Since actions should repeat observations, their spaces must be the
            # same.
            self.observation_space = self.action_space
            self.cur_obs = None
            self.episode_len = 0

        def reset(self):
            """Resets the episode and returns the initial observation of the new one.
            """
            # Reset the episode len.
            self.episode_len = 0
            # Sample a random number from our observation space.
            self.cur_obs = self.observation_space.sample()
            # Return initial observation.
            return self.cur_obs

        def step(self, action):
            """Takes a single step in the episode given `action`.

            Returns:
                New observation, reward, done-flag, info-dict (empty).
            """
            # Set `done` flag after 10 steps.
            self.episode_len += 1
            done = self.episode_len >= 10
            # r = -abs(obs - action)
            reward = -sum(abs(self.cur_obs - action))
            # Set a new observation (random sample).
            self.cur_obs = self.observation_space.sample()
            return self.cur_obs, reward, done, {}


    # Create an RLlib Trainer instance to learn how to act in the above
    # environment.
    trainer = PPOTrainer(
        config={
            # Env class to use (here: our gym.Env sub-class from above).
            "env": ParrotEnv,
            # Config dict to be passed to our custom env's constructor.
            "env_config": {
                "parrot_shriek_range": gym.spaces.Box(-5.0, 5.0, (1, ))
            },
            # Parallelize environment rollouts.
            "num_workers": 3,
        })

    # Train for n iterations and report results (mean episode rewards).
    # Since we have to guess 10 times and the optimal reward is 0.0
    # (exact match between observation and action value),
    # we can expect to reach an optimal episode reward of 0.0.
    for i in range(5):
        results = trainer.train()
        print(f"Iter: {i}; avg. reward={results['episode_reward_mean']}")
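
As a rough sanity check on the reward values reported during training, the sketch below
computes episode returns for two hand-written policies using only the Python standard library.
It re-implements the reward rule ``r = -abs(obs - action)`` for scalar observations and does
not use RLlib or gym. The helper names ``episode_return``, ``perfect_parrot``, and
``always_zero`` are hypothetical and exist only for this illustration.

.. code-block:: python

    import random


    def episode_return(policy, low=-5.0, high=5.0, steps=10, seed=0):
        """Sums r = -abs(obs - action) over one episode of `steps` steps."""
        rng = random.Random(seed)
        total = 0.0
        for _ in range(steps):
            obs = rng.uniform(low, high)
            action = policy(obs)
            total += -abs(obs - action)
        return total


    def perfect_parrot(obs):
        # Repeats the observation exactly, so every step's reward is zero.
        return obs


    def always_zero(obs):
        # Ignores the observation; the reward is -abs(obs) at every step.
        return 0.0


    best = episode_return(perfect_parrot)
    baseline = episode_return(always_zero)
    print(f"perfect parrot: {best}; always zero: {baseline}")

The perfect parrot reaches the optimal return of 0.0, while a policy that ignores its
observation stays clearly negative. The mean episode reward should approach 0.0 as training progresses.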

After training, you may want to perform action computations (inference) in your environment.
Below is a minimal example of how to do this. Also
`check out our more detailed examples here <https://github.com/ray-project/ray/tree/master/rllib/examples/inference_and_serving>`_
(in particular for `normal models <https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training.py>`_,
`LSTMs <https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training_with_lstm.py>`_,
and `attention nets <https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training_with_attention.py>`_).

.. code-block:: python

    # Perform inference (action computations) based on given env observations.
    # Note that we are using a slightly simpler env here (-3.0 to 3.0, instead
    # of -5.0 to 5.0!), however, this should still work as the agent has
    # (hopefully) learned to "just always repeat the observation!".
    env = ParrotEnv({"parrot_shriek_range": gym.spaces.Box(-3.0, 3.0, (1, ))})
    # Get the initial observation (some value between -3.0 and 3.0).
    obs = env.reset()
    done = False
    total_reward = 0.0
    # Play one episode.
    while not done:
        # Compute a single action, given the current observation
        # from the environment.
        action = trainer.compute_single_action(obs)
        # Apply the computed action in the environment.
        obs, reward, done, info = env.step(action)
        # Sum up rewards for reporting purposes.
        total_reward += reward
    # Report results.
    print(f"Shrieked for 1 episode; total-reward={total_reward}")

For a more detailed `"60 second" example, head to our main documentation <https://docs.ray.io/en/master/rllib/index.html>`_.

Highlighted Features
--------------------

The following is a summary of RLlib's most striking features (for an in-depth overview,
check out our `documentation <http://docs.ray.io/en/master/rllib/index.html>`_):

The most **popular deep-learning frameworks**: `PyTorch <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_torch_policy.py>`_ and `TensorFlow
(tf1.x/2.x static-graph/eager/traced) <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_tf_policy.py>`_.

**Highly distributed learning**: Our RLlib algorithms (such as our "PPO" or "IMPALA")
allow you to set the ``num_workers`` config parameter, such that your workloads can run
on 100s of CPUs/nodes, thus parallelizing and speeding up learning.

**Vectorized (batched) and remote (parallel) environments**: RLlib auto-vectorizes
your ``gym.Envs`` via the ``num_envs_per_worker`` config. Environment workers can
then batch and thus significantly speed up the action-computing forward pass.
On top of that, RLlib offers the ``remote_worker_envs`` config to create
`single environments (within a vectorized one) as Ray actors <https://github.com/ray-project/ray/blob/master/rllib/examples/remote_base_env_with_custom_api.py>`_,
thus parallelizing even the env stepping process.
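
The scaling settings named above can be combined in a single config dict. The sketch below is
plain Python and does not call RLlib. The specific values and the helper ``total_env_copies``
are illustrative assumptions, and it assumes that each rollout worker holds
``num_envs_per_worker`` environment copies.

.. code-block:: python

    # Illustrative values only; tune them to your own cluster.
    config = {
        "num_workers": 3,             # parallel rollout workers
        "num_envs_per_worker": 4,     # env copies batched inside each worker
        "remote_worker_envs": False,  # True: step each env copy as its own Ray actor
    }


    def total_env_copies(cfg):
        """Number of env copies stepped across all rollout workers."""
        return cfg["num_workers"] * cfg["num_envs_per_worker"]


    print(total_env_copies(config))

These keys can be merged into the ``config`` dict passed to ``PPOTrainer`` in the
Quick First Experiment section.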

| **Multi-agent RL** (MARL): Convert your (custom) ``gym.Envs`` into a multi-agent one
  via a few simple steps and start training your agents in any of the following fashions:
| 1) Cooperative with `shared <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic.py>`_ or
  `separate <https://github.com/ray-project/ray/blob/master/rllib/examples/two_step_game.py>`_
  policies and/or value functions.
| 2) Adversarial scenarios using `self-play <https://github.com/ray-project/ray/blob/master/rllib/examples/self_play_with_open_spiel.py>`_
  and `league-based training <https://github.com/ray-project/ray/blob/master/rllib/examples/self_play_league_based_with_open_spiel.py>`_.
| 3) `Independent learning <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py>`_
  of neutral/co-existing agents.
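
To make the multi-agent conversion more concrete, here is a minimal sketch of a two-agent
variant of the parrot task, in which observations, actions, rewards, and done flags are dicts
keyed by agent ID. It is plain Python and deliberately does not subclass anything. In RLlib you
would typically derive from ``ray.rllib.env.multi_agent_env.MultiAgentEnv``, which uses this
dict convention, including the special ``"__all__"`` done key. The class name ``TwoParrotsEnv``
and the agent IDs are hypothetical.

.. code-block:: python

    import random


    class TwoParrotsEnv:
        """Two agents, each rewarded for repeating its own observation."""

        def __init__(self, low=-1.0, high=1.0, episode_len=10, seed=0):
            self.agent_ids = ["parrot_0", "parrot_1"]
            self.low, self.high = low, high
            self.max_len = episode_len
            self.rng = random.Random(seed)
            self.cur_obs = {}
            self.steps = 0

        def _sample(self):
            # One scalar observation per agent.
            return {
                aid: self.rng.uniform(self.low, self.high)
                for aid in self.agent_ids
            }

        def reset(self):
            self.steps = 0
            self.cur_obs = self._sample()
            return self.cur_obs

        def step(self, action_dict):
            self.steps += 1
            # Per-agent reward: r = -abs(obs - action).
            rewards = {
                aid: -abs(self.cur_obs[aid] - action_dict[aid])
                for aid in self.agent_ids
            }
            done = self.steps >= self.max_len
            dones = {aid: done for aid in self.agent_ids}
            dones["__all__"] = done  # signals the end of the whole episode
            self.cur_obs = self._sample()
            return self.cur_obs, rewards, dones, {}


    env = TwoParrotsEnv()
    obs = env.reset()
    # Both agents parrot their observation back, so each reward is zero.
    obs, rewards, dones, infos = env.step(dict(obs))
    print(rewards, dones)

Because every agent has its own entry in each dict, the same environment can be trained with
one shared policy or with a separate policy per agent.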

**External simulators**: Don't have your simulation running as a ``gym.Env`` in Python?
No problem! RLlib supports an external environment API and comes with a pluggable,
off-the-shelf
`client <https://github.com/ray-project/ray/blob/master/rllib/examples/serving/cartpole_client.py>`_/
`server <https://github.com/ray-project/ray/blob/master/rllib/examples/serving/cartpole_server.py>`_
setup that allows you to run 100s of independent simulators on the "outside"
(e.g. a Windows cloud) connecting to a central RLlib Policy-Server that learns
and serves actions. Alternatively, actions can be computed on the client side
to save on network traffic.

**Offline RL and imitation learning/behavior cloning**: You don't have a simulator
for your particular problem, but do have tons of historic data recorded by a legacy (maybe
non-RL/ML) system? This branch of reinforcement learning is for you!
RLlib comes with several `offline RL <https://github.com/ray-project/ray/blob/master/rllib/examples/offline_rl.py>`_
algorithms (*CQL*, *MARWIL*, and *DQfD*), allowing you to either purely
`behavior-clone <https://github.com/ray-project/ray/blob/master/rllib/agents/marwil/tests/test_bc.py>`_
your existing system or learn how to further improve over it.
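
The idea behind behavior cloning can be illustrated without any RL machinery: fit a policy to
recorded (observation, action) pairs by supervised learning. The sketch below fits a
one-parameter linear policy ``action = w * obs`` by least squares to synthetic data from a
hypothetical legacy controller that simply parrots its input. It is not how RLlib's offline
algorithms are implemented. The data and the helper ``fit_linear_policy`` are made up for
illustration.

.. code-block:: python

    import random


    def fit_linear_policy(samples):
        """Closed-form least-squares fit of w in action = w * obs."""
        num = sum(obs * act for obs, act in samples)
        den = sum(obs * obs for obs, _ in samples)
        return num / den


    # Synthetic "historic" data: the legacy system repeated its input.
    rng = random.Random(0)
    recorded = [(o, o) for o in (rng.uniform(-5.0, 5.0) for _ in range(100))]

    w = fit_linear_policy(recorded)
    print(f"cloned policy: action = {w:.3f} * obs")

The fitted weight recovers the legacy behavior from data alone, with no simulator involved.
Offline RL algorithms go one step further and also use the recorded rewards to improve on that behavior.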

In-Depth Documentation
----------------------

For an in-depth overview of RLlib and everything it has to offer, including
hands-on tutorials of important industry use cases and workflows, head over to
our `documentation pages <https://docs.ray.io/en/master/rllib/index.html>`_.

Cite our Paper
--------------

If you've found RLlib useful for your research, please cite our `paper <https://arxiv.org/abs/1712.09381>`_ as follows:

.. code-block::

    @inproceedings{liang2018rllib,
        Author = {Eric Liang and
                  Richard Liaw and
                  Robert Nishihara and
                  Philipp Moritz and
                  Roy Fox and
                  Ken Goldberg and
                  Joseph E. Gonzalez and
                  Michael I. Jordan and
                  Ion Stoica},
        Title = {{RLlib}: Abstractions for Distributed Reinforcement Learning},
        Booktitle = {International Conference on Machine Learning ({ICML})},
        Year = {2018}
    }