RLlib Table of Contents
=======================

Training APIs
-------------
* `Command-line `__

  - `Evaluating Trained Policies `__

* `Configuration `__

  - `Specifying Parameters `__
  - `Specifying Resources `__
  - `Common Parameters `__
  - `Scaling Guide `__
  - `Tuned Examples `__

* `Basic Python API `__

  - `Computing Actions `__
  - `Accessing Policy State `__
  - `Accessing Model State `__

* `Advanced Python APIs `__

  - `Custom Training Workflows `__
  - `Global Coordination `__
  - `Callbacks and Custom Metrics `__
  - `Customizing Exploration Behavior `__
  - `Customized Evaluation During Training `__
  - `Rewriting Trajectories `__
  - `Curriculum Learning `__

* `Debugging `__

  - `Gym Monitor `__
  - `Eager Mode `__
  - `Episode Traces `__
  - `Log Verbosity `__
  - `Stack Traces `__

* `External Application API `__

Environments
------------
* `RLlib Environments Overview `__
* `OpenAI Gym `__
* `Vectorized `__
* `Multi-Agent and Hierarchical `__
* `External Agents and Applications `__

  - `External Application Clients `__

* `Advanced Integrations `__

Models, Preprocessors, and Action Distributions
-----------------------------------------------
* `RLlib Models, Preprocessors, and Action Distributions Overview `__
* `TensorFlow Models `__
* `PyTorch Models `__
* `Custom Preprocessors `__
* `Custom Action Distributions `__
* `Supervised Model Losses `__
* `Self-Supervised Model Losses `__
* `Variable-length / Complex Observation Spaces `__
* `Variable-length / Parametric Action Spaces `__
* `Autoregressive Action Distributions `__

Algorithms
----------
* High-throughput architectures

  - |pytorch| |tensorflow| :ref:`Distributed Prioritized Experience Replay (Ape-X) `
  - |pytorch| |tensorflow| :ref:`Importance Weighted Actor-Learner Architecture (IMPALA) `
  - |pytorch| |tensorflow| :ref:`Asynchronous Proximal Policy Optimization (APPO) `
  - |pytorch| :ref:`Decentralized Distributed Proximal Policy Optimization (DD-PPO) `

* Gradient-based

  - |pytorch| |tensorflow| :ref:`Advantage Actor-Critic (A2C, A3C) `
  - |pytorch| |tensorflow| :ref:`Deep Deterministic Policy Gradients (DDPG, TD3) `
  - |pytorch| |tensorflow| :ref:`Deep Q Networks (DQN, Rainbow, Parametric DQN) `
  - |pytorch| |tensorflow| :ref:`Policy Gradients `
  - |pytorch| |tensorflow| :ref:`Proximal Policy Optimization (PPO) `
  - |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) `
  - |pytorch| :ref:`Slate Q-Learning (SlateQ) `

* Derivative-free

  - |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) `
  - |pytorch| |tensorflow| :ref:`Evolution Strategies `

* Model-based / Meta-learning / Offline

  - |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) `
  - |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) `
  - |pytorch| :ref:`Model-Based Meta-Policy-Optimization (MBMPO) `
  - |pytorch| :ref:`Dreamer (DREAMER) `
  - |pytorch| :ref:`Conservative Q-Learning (CQL) `

* Multi-agent

  - |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) `
  - |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) `

* Offline

  - |pytorch| |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) `

* Contextual bandits

  - |pytorch| :ref:`Linear Upper Confidence Bound (contrib/LinUCB) `
  - |pytorch| :ref:`Linear Thompson Sampling (contrib/LinTS) `

* Exploration-based plug-ins (can be combined with any algo)

  - |pytorch| :ref:`Curiosity (ICM: Intrinsic Curiosity Module) `

Sample Collection
-----------------
* `The SampleCollector Class is Used to Store and Retrieve Temporary Data `__
* `Trajectory View API `__

Offline Datasets
----------------
* `Working with Offline Datasets `__
* `Input Pipeline for Supervised Losses `__
* `Input API `__
* `Output API `__

Concepts and Custom Algorithms
------------------------------
* `Policies `__

  - `Policies in Multi-Agent `__
  - `Building Policies in TensorFlow `__
  - `Building Policies in TensorFlow Eager `__
  - `Building Policies in PyTorch `__
  - `Extending Existing Policies `__

* `Policy Evaluation `__
* `Execution Plans `__
* `Trainers `__

Examples
--------
* `Tuned Examples `__
* `Training Workflows `__
* `Custom Envs and Models `__
* `Serving and Offline `__
* `Multi-Agent and Hierarchical `__
* `Community Examples `__

Development
-----------
* `Development Install `__
* `API Stability `__
* `Features `__
* `Benchmarks `__
* `Contributing Algorithms `__

Package Reference
-----------------
* `ray.rllib.agents `__
* `ray.rllib.env `__
* `ray.rllib.evaluation `__
* `ray.rllib.execution `__
* `ray.rllib.models `__
* `ray.rllib.utils `__

Troubleshooting
---------------

If you encounter errors like ``blas_thread_init: pthread_create: Resource temporarily unavailable`` when using many workers, try setting ``OMP_NUM_THREADS=1`` (a short example sketch follows at the end of this page). Similarly, check the configured system limits with ``ulimit -a`` for other resource-limit errors.

For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump the stack traces of all Ray workers on the current node, ``ray timeline`` to dump a timeline visualization of tasks to a file, and ``ray memory`` to list all object references in the cluster.

TensorFlow 2.0
~~~~~~~~~~~~~~

RLlib currently runs in ``tf.compat.v1`` mode. This means eager execution is disabled by default, and RLlib imports TF with ``import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()``. Eager execution can be enabled manually by calling ``tf.enable_eager_execution()`` or by setting the ``"eager": True`` trainer config (see the sketch at the end of this page).

.. |tensorflow| image:: tensorflow.png
    :class: inline-figure
    :width: 16

.. |pytorch| image:: pytorch.png
    :class: inline-figure
    :width: 16
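
As a quick illustration of the thread-limit tip in the Troubleshooting section above, the following is a minimal sketch that sets ``OMP_NUM_THREADS`` from Python before Ray starts; exporting the variable in your shell before launching the script works just as well. The ``num_cpus`` value here is only a placeholder.

.. code-block:: python

    import os

    # Limit BLAS/OpenMP threads per process before Ray and NumPy are imported.
    # This mirrors the OMP_NUM_THREADS=1 suggestion in the Troubleshooting section.
    os.environ["OMP_NUM_THREADS"] = "1"

    import ray

    ray.init(num_cpus=8)  # placeholder resource setting for this sketch

    # If workers hang or run slowly, the Ray CLI tools mentioned above can be
    # invoked from a shell on the same node:
    #   ray stack     # dump stack traces of all Ray workers on this node
    #   ray timeline  # dump a timeline visualization of tasks to a file
    #   ray memory    # list all object references in the cluster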
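
Similarly, here is a minimal sketch of the ``"eager": True`` trainer config described under TensorFlow 2.0 above. The choice of PPO and ``CartPole-v0`` is arbitrary, and newer RLlib releases may expose this option under a different config key.

.. code-block:: python

    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()

    # Run the TF policy eagerly instead of in graph mode via the config flag
    # mentioned above. PPO and CartPole-v0 are placeholder choices.
    trainer = PPOTrainer(
        env="CartPole-v0",
        config={
            "eager": True,
            "num_workers": 0,
        },
    )
    print(trainer.train())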