rllib-toc.rst 9.0 KB

RLlib Table of Contents
=======================

Training APIs
-------------
* `Command-line <rllib-training.html>`__

  - `Evaluating Trained Policies <rllib-training.html#evaluating-trained-policies>`__

* `Configuration <rllib-training.html#configuration>`__

  - `Specifying Parameters <rllib-training.html#specifying-parameters>`__
  - `Specifying Resources <rllib-training.html#specifying-resources>`__
  - `Common Parameters <rllib-training.html#common-parameters>`__
  - `Scaling Guide <rllib-training.html#scaling-guide>`__
  - `Tuned Examples <rllib-training.html#tuned-examples>`__

* `Basic Python API <rllib-training.html#basic-python-api>`__

  - `Computing Actions <rllib-training.html#computing-actions>`__
  - `Accessing Policy State <rllib-training.html#accessing-policy-state>`__
  - `Accessing Model State <rllib-training.html#accessing-model-state>`__

* `Advanced Python APIs <rllib-training.html#advanced-python-apis>`__

  - `Custom Training Workflows <rllib-training.html#custom-training-workflows>`__
  - `Global Coordination <rllib-training.html#global-coordination>`__
  - `Callbacks and Custom Metrics <rllib-training.html#callbacks-and-custom-metrics>`__
  - `Customizing Exploration Behavior <rllib-training.html#customizing-exploration-behavior>`__
  - `Customized Evaluation During Training <rllib-training.html#customized-evaluation-during-training>`__
  - `Rewriting Trajectories <rllib-training.html#rewriting-trajectories>`__
  - `Curriculum Learning <rllib-training.html#curriculum-learning>`__

* `Debugging <rllib-training.html#debugging>`__

  - `Gym Monitor <rllib-training.html#gym-monitor>`__
  - `Eager Mode <rllib-training.html#eager-mode>`__
  - `Episode Traces <rllib-training.html#episode-traces>`__
  - `Log Verbosity <rllib-training.html#log-verbosity>`__
  - `Stack Traces <rllib-training.html#stack-traces>`__

* `External Application API <rllib-training.html#external-application-api>`__

Environments
------------
* `RLlib Environments Overview <rllib-env.html>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized <rllib-env.html#vectorized>`__
* `Multi-Agent and Hierarchical <rllib-env.html#multi-agent-and-hierarchical>`__
* `External Agents and Applications <rllib-env.html#external-agents-and-applications>`__

  - `External Application Clients <rllib-env.html#external-application-clients>`__

* `Advanced Integrations <rllib-env.html#advanced-integrations>`__

Models, Preprocessors, and Action Distributions
-----------------------------------------------
* `RLlib Models, Preprocessors, and Action Distributions Overview <rllib-models.html>`__
* `TensorFlow Models <rllib-models.html#tensorflow-models>`__
* `PyTorch Models <rllib-models.html#pytorch-models>`__
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__
* `Custom Action Distributions <rllib-models.html#custom-action-distributions>`__
* `Supervised Model Losses <rllib-models.html#supervised-model-losses>`__
* `Self-Supervised Model Losses <rllib-models.html#self-supervised-model-losses>`__
* `Variable-length / Complex Observation Spaces <rllib-models.html#variable-length-complex-observation-spaces>`__
* `Variable-length / Parametric Action Spaces <rllib-models.html#variable-length-parametric-action-spaces>`__
* `Autoregressive Action Distributions <rllib-models.html#autoregressive-action-distributions>`__

Algorithms
----------
* High-throughput architectures

  - |pytorch| |tensorflow| :ref:`Distributed Prioritized Experience Replay (Ape-X) <apex>`
  - |pytorch| |tensorflow| :ref:`Importance Weighted Actor-Learner Architecture (IMPALA) <impala>`
  - |pytorch| |tensorflow| :ref:`Asynchronous Proximal Policy Optimization (APPO) <appo>`
  - |pytorch| :ref:`Decentralized Distributed Proximal Policy Optimization (DD-PPO) <ddppo>`

* Gradient-based

  - |pytorch| |tensorflow| :ref:`Advantage Actor-Critic (A2C, A3C) <a3c>`
  - |pytorch| |tensorflow| :ref:`Deep Deterministic Policy Gradients (DDPG, TD3) <ddpg>`
  - |pytorch| |tensorflow| :ref:`Deep Q Networks (DQN, Rainbow, Parametric DQN) <dqn>`
  - |pytorch| |tensorflow| :ref:`Policy Gradients <pg>`
  - |pytorch| |tensorflow| :ref:`Proximal Policy Optimization (PPO) <ppo>`
  - |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) <sac>`
  - |pytorch| :ref:`Slate Q-Learning (SlateQ) <slateq>`

* Derivative-free

  - |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) <ars>`
  - |pytorch| |tensorflow| :ref:`Evolution Strategies <es>`

* Model-based / Meta-learning / Offline

  - |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) <alphazero>`
  - |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) <maml>`
  - |pytorch| :ref:`Model-Based Meta-Policy-Optimization (MBMPO) <mbmpo>`
  - |pytorch| :ref:`Dreamer (DREAMER) <dreamer>`
  - |pytorch| :ref:`Conservative Q-Learning (CQL) <cql>`

* Multi-agent

  - |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) <qmix>`
  - |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) <maddpg>`

* Offline

  - |pytorch| |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) <marwil>`

* Contextual bandits

  - |pytorch| :ref:`Linear Upper Confidence Bound (contrib/LinUCB) <linucb>`
  - |pytorch| :ref:`Linear Thompson Sampling (contrib/LinTS) <lints>`

* Exploration-based plug-ins (can be combined with any algo)

  - |pytorch| :ref:`Curiosity (ICM: Intrinsic Curiosity Module) <curiosity>`

Sample Collection
-----------------
* `The SampleCollector Class is Used to Store and Retrieve Temporary Data <rllib-sample-collection.html#the-samplecollector-class-is-used-to-store-and-retrieve-temporary-data>`__
* `Trajectory View API <rllib-sample-collection.html#trajectory-view-api>`__

Offline Datasets
----------------
* `Working with Offline Datasets <rllib-offline.html>`__
* `Input Pipeline for Supervised Losses <rllib-offline.html#input-pipeline-for-supervised-losses>`__
* `Input API <rllib-offline.html#input-api>`__
* `Output API <rllib-offline.html#output-api>`__

Concepts and Custom Algorithms
------------------------------
* `Policies <rllib-concepts.html>`__

  - `Policies in Multi-Agent <rllib-concepts.html#policies-in-multi-agent>`__
  - `Building Policies in TensorFlow <rllib-concepts.html#building-policies-in-tensorflow>`__
  - `Building Policies in TensorFlow Eager <rllib-concepts.html#building-policies-in-tensorflow-eager>`__
  - `Building Policies in PyTorch <rllib-concepts.html#building-policies-in-pytorch>`__
  - `Extending Existing Policies <rllib-concepts.html#extending-existing-policies>`__

* `Policy Evaluation <rllib-concepts.html#policy-evaluation>`__
* `Execution Plans <rllib-concepts.html#execution-plans>`__
* `Trainers <rllib-concepts.html#trainers>`__

Examples
--------
* `Tuned Examples <rllib-examples.html#tuned-examples>`__
* `Training Workflows <rllib-examples.html#training-workflows>`__
* `Custom Envs and Models <rllib-examples.html#custom-envs-and-models>`__
* `Serving and Offline <rllib-examples.html#serving-and-offline>`__
* `Multi-Agent and Hierarchical <rllib-examples.html#multi-agent-and-hierarchical>`__
* `Community Examples <rllib-examples.html#community-examples>`__

Development
-----------
* `Development Install <rllib-dev.html#development-install>`__
* `API Stability <rllib-dev.html#api-stability>`__
* `Features <rllib-dev.html#feature-development>`__
* `Benchmarks <rllib-dev.html#benchmarks>`__
* `Contributing Algorithms <rllib-dev.html#contributing-algorithms>`__

Package Reference
-----------------
* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.execution <rllib-package-ref.html#module-ray.rllib.execution>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__

Troubleshooting
---------------
If you encounter errors like
``blas_thread_init: pthread_create: Resource temporarily unavailable`` when using many workers,
try setting ``OMP_NUM_THREADS=1``. Similarly, check configured system limits with
``ulimit -a`` for other resource limit errors.

For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump
the stack traces of all Ray workers on the current node, ``ray timeline`` to dump
a timeline visualization of tasks to a file, and ``ray memory`` to list all object
references in the cluster.
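The remedies above can be sketched as shell commands; the ``ray`` subcommands are shown as comments here because they require a running cluster:

```shell
# Cap BLAS/OpenMP at one thread per process before launching many workers,
# avoiding the pthread_create "Resource temporarily unavailable" failure.
export OMP_NUM_THREADS=1

# Inspect the system limits (max user processes, open files, ...) that
# commonly cause other resource limit errors.
ulimit -a

# With a live Ray cluster, the following debugging commands apply
# (not executed here, since they need running workers):
#   ray stack      # stack traces of all Ray workers on this node
#   ray timeline   # dump a task timeline visualization to a file
#   ray memory     # list all object references in the cluster
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```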

TensorFlow 2.0
~~~~~~~~~~~~~~

RLlib currently runs in ``tf.compat.v1`` mode. This means eager execution is disabled by default, and RLlib imports TF with ``import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()``. Eager execution can be enabled manually by calling ``tf.enable_eager_execution()`` or by setting the ``"eager": True`` trainer config.
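As a minimal sketch, eager mode can be requested through the trainer config dict; only the ``"eager"`` key is the point here, and the ``"env"`` and ``"num_workers"`` values are purely illustrative:

```python
# Sketch of a trainer config enabling eager execution for easier debugging.
# "env" and "num_workers" are illustrative; any common parameters combine freely.
config = {
    "env": "CartPole-v0",   # any registered Gym environment id
    "eager": True,          # run TF policies eagerly instead of graph mode
    "num_workers": 2,
}
print(config["eager"])
```

Eager mode makes it possible to step through the policy's forward pass with ordinary Python debugging tools, at some cost in throughput compared to graph mode.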

.. |tensorflow| image:: tensorflow.png
    :class: inline-figure
    :width: 16

.. |pytorch| image:: pytorch.png
    :class: inline-figure
    :width: 16