rllib-toc.rst 9.0 KB

RLlib Table of Contents
=======================

Training APIs
-------------
* `Command-line <rllib-training.html>`__

  - `Evaluating Trained Policies <rllib-training.html#evaluating-trained-policies>`__

* `Configuration <rllib-training.html#configuration>`__

  - `Specifying Parameters <rllib-training.html#specifying-parameters>`__
  - `Specifying Resources <rllib-training.html#specifying-resources>`__
  - `Common Parameters <rllib-training.html#common-parameters>`__
  - `Scaling Guide <rllib-training.html#scaling-guide>`__
  - `Tuned Examples <rllib-training.html#tuned-examples>`__

* `Basic Python API <rllib-training.html#basic-python-api>`__

  - `Computing Actions <rllib-training.html#computing-actions>`__
  - `Accessing Policy State <rllib-training.html#accessing-policy-state>`__
  - `Accessing Model State <rllib-training.html#accessing-model-state>`__

* `Advanced Python APIs <rllib-training.html#advanced-python-apis>`__

  - `Custom Training Workflows <rllib-training.html#custom-training-workflows>`__
  - `Global Coordination <rllib-training.html#global-coordination>`__
  - `Callbacks and Custom Metrics <rllib-training.html#callbacks-and-custom-metrics>`__
  - `Customizing Exploration Behavior <rllib-training.html#customizing-exploration-behavior>`__
  - `Customized Evaluation During Training <rllib-training.html#customized-evaluation-during-training>`__
  - `Rewriting Trajectories <rllib-training.html#rewriting-trajectories>`__
  - `Curriculum Learning <rllib-training.html#curriculum-learning>`__

* `Debugging <rllib-training.html#debugging>`__

  - `Gym Monitor <rllib-training.html#gym-monitor>`__
  - `Eager Mode <rllib-training.html#eager-mode>`__
  - `Episode Traces <rllib-training.html#episode-traces>`__
  - `Log Verbosity <rllib-training.html#log-verbosity>`__
  - `Stack Traces <rllib-training.html#stack-traces>`__

* `External Application API <rllib-training.html#external-application-api>`__

Environments
------------
* `RLlib Environments Overview <rllib-env.html>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized <rllib-env.html#vectorized>`__
* `Multi-Agent and Hierarchical <rllib-env.html#multi-agent-and-hierarchical>`__
* `External Agents and Applications <rllib-env.html#external-agents-and-applications>`__

  - `External Application Clients <rllib-env.html#external-application-clients>`__

* `Advanced Integrations <rllib-env.html#advanced-integrations>`__

Models, Preprocessors, and Action Distributions
-----------------------------------------------
* `RLlib Models, Preprocessors, and Action Distributions Overview <rllib-models.html>`__
* `TensorFlow Models <rllib-models.html#tensorflow-models>`__
* `PyTorch Models <rllib-models.html#pytorch-models>`__
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__
* `Custom Action Distributions <rllib-models.html#custom-action-distributions>`__
* `Supervised Model Losses <rllib-models.html#supervised-model-losses>`__
* `Self-Supervised Model Losses <rllib-models.html#self-supervised-model-losses>`__
* `Variable-length / Complex Observation Spaces <rllib-models.html#variable-length-complex-observation-spaces>`__
* `Variable-length / Parametric Action Spaces <rllib-models.html#variable-length-parametric-action-spaces>`__
* `Autoregressive Action Distributions <rllib-models.html#autoregressive-action-distributions>`__

Algorithms
----------
* High-throughput architectures

  - |pytorch| |tensorflow| :ref:`Distributed Prioritized Experience Replay (Ape-X) <apex>`
  - |pytorch| |tensorflow| :ref:`Importance Weighted Actor-Learner Architecture (IMPALA) <impala>`
  - |pytorch| |tensorflow| :ref:`Asynchronous Proximal Policy Optimization (APPO) <appo>`
  - |pytorch| :ref:`Decentralized Distributed Proximal Policy Optimization (DD-PPO) <ddppo>`

* Gradient-based

  - |pytorch| |tensorflow| :ref:`Advantage Actor-Critic (A2C, A3C) <a3c>`
  - |pytorch| |tensorflow| :ref:`Deep Deterministic Policy Gradients (DDPG, TD3) <ddpg>`
  - |pytorch| |tensorflow| :ref:`Deep Q Networks (DQN, Rainbow, Parametric DQN) <dqn>`
  - |pytorch| |tensorflow| :ref:`Policy Gradients <pg>`
  - |pytorch| |tensorflow| :ref:`Proximal Policy Optimization (PPO) <ppo>`
  - |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) <sac>`
  - |pytorch| :ref:`Slate Q-Learning (SlateQ) <slateq>`

* Derivative-free

  - |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) <ars>`
  - |pytorch| |tensorflow| :ref:`Evolution Strategies <es>`

* Model-based / Meta-learning / Offline

  - |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) <alphazero>`
  - |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) <maml>`
  - |pytorch| :ref:`Model-Based Meta-Policy-Optimization (MBMPO) <mbmpo>`
  - |pytorch| :ref:`Dreamer (DREAMER) <dreamer>`
  - |pytorch| :ref:`Conservative Q-Learning (CQL) <cql>`

* Multi-agent

  - |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) <qmix>`
  - |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) <maddpg>`

* Offline

  - |pytorch| |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) <marwil>`

* Contextual bandits

  - |pytorch| :ref:`Linear Upper Confidence Bound (contrib/LinUCB) <linucb>`
  - |pytorch| :ref:`Linear Thompson Sampling (contrib/LinTS) <lints>`

* Exploration-based plug-ins (can be combined with any algo)

  - |pytorch| :ref:`Curiosity (ICM: Intrinsic Curiosity Module) <curiosity>`

Sample Collection
-----------------
* `The SampleCollector Class is Used to Store and Retrieve Temporary Data <rllib-sample-collection.html#the-samplecollector-class-is-used-to-store-and-retrieve-temporary-data>`__
* `Trajectory View API <rllib-sample-collection.html#trajectory-view-api>`__

Offline Datasets
----------------
* `Working with Offline Datasets <rllib-offline.html>`__
* `Input Pipeline for Supervised Losses <rllib-offline.html#input-pipeline-for-supervised-losses>`__
* `Input API <rllib-offline.html#input-api>`__
* `Output API <rllib-offline.html#output-api>`__

Concepts and Custom Algorithms
------------------------------
* `Policies <rllib-concepts.html>`__

  - `Policies in Multi-Agent <rllib-concepts.html#policies-in-multi-agent>`__
  - `Building Policies in TensorFlow <rllib-concepts.html#building-policies-in-tensorflow>`__
  - `Building Policies in TensorFlow Eager <rllib-concepts.html#building-policies-in-tensorflow-eager>`__
  - `Building Policies in PyTorch <rllib-concepts.html#building-policies-in-pytorch>`__
  - `Extending Existing Policies <rllib-concepts.html#extending-existing-policies>`__

* `Policy Evaluation <rllib-concepts.html#policy-evaluation>`__
* `Execution Plans <rllib-concepts.html#execution-plans>`__
* `Trainers <rllib-concepts.html#trainers>`__

Examples
--------
* `Tuned Examples <rllib-examples.html#tuned-examples>`__
* `Training Workflows <rllib-examples.html#training-workflows>`__
* `Custom Envs and Models <rllib-examples.html#custom-envs-and-models>`__
* `Serving and Offline <rllib-examples.html#serving-and-offline>`__
* `Multi-Agent and Hierarchical <rllib-examples.html#multi-agent-and-hierarchical>`__
* `Community Examples <rllib-examples.html#community-examples>`__

Development
-----------
* `Development Install <rllib-dev.html#development-install>`__
* `API Stability <rllib-dev.html#api-stability>`__
* `Features <rllib-dev.html#feature-development>`__
* `Benchmarks <rllib-dev.html#benchmarks>`__
* `Contributing Algorithms <rllib-dev.html#contributing-algorithms>`__

Package Reference
-----------------
* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.execution <rllib-package-ref.html#module-ray.rllib.execution>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__

Troubleshooting
---------------
If you encounter errors like
``blas_thread_init: pthread_create: Resource temporarily unavailable`` when using many workers,
try setting ``OMP_NUM_THREADS=1``. Similarly, check configured system limits with
``ulimit -a`` for other resource limit errors.

For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump
the stack traces of all Ray workers on the current node, ``ray timeline`` to dump
a timeline visualization of tasks to a file, and ``ray memory`` to list all object
references in the cluster.
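The remedies above can be sketched as shell commands; the ``ray`` subcommands are shown as comments here because they require a running cluster:

```shell
# Cap BLAS/OpenMP at one thread per process before launching many workers,
# avoiding the pthread_create "Resource temporarily unavailable" failure.
export OMP_NUM_THREADS=1

# Inspect the system limits (max user processes, open files, ...) that
# commonly cause other resource limit errors.
ulimit -a

# With a live Ray cluster, the following debugging commands apply
# (not executed here, since they need running workers):
#   ray stack      # stack traces of all Ray workers on this node
#   ray timeline   # dump a task timeline visualization to a file
#   ray memory     # list all object references in the cluster
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```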

TensorFlow 2.0
~~~~~~~~~~~~~~

RLlib currently runs in ``tf.compat.v1`` mode. This means eager execution is disabled by default, and RLlib imports TF with ``import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()``. Eager execution can be enabled manually by calling ``tf.enable_eager_execution()`` or by setting the ``"eager": True`` trainer config.
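As a minimal sketch, eager mode can be requested through the trainer config dict; only the ``"eager"`` key is the point here, and the ``"env"`` and ``"num_workers"`` values are purely illustrative:

```python
# Sketch of a trainer config enabling eager execution for easier debugging.
# "env" and "num_workers" are illustrative; any common parameters combine freely.
config = {
    "env": "CartPole-v0",   # any registered Gym environment id
    "eager": True,          # run TF policies eagerly instead of graph mode
    "num_workers": 2,
}
print(config["eager"])
```

Eager mode makes it possible to step through the policy's forward pass with ordinary Python debugging tools, at some cost in throughput compared to graph mode.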

.. |tensorflow| image:: tensorflow.png
    :class: inline-figure
    :width: 16

.. |pytorch| image:: pytorch.png
    :class: inline-figure
    :width: 16