.. _evaluation-reference-docs:

Sampling the Environment or offline data
========================================

Data ingest via either environment rollouts or other data-generating methods
(e.g. reading from offline files) is done in RLlib by :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` instances,
which sit inside a :py:class:`~ray.rllib.evaluation.worker_set.WorkerSet`
(together with other parallel ``RolloutWorkers``) in the RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
(under the ``self.workers`` property):

.. https://docs.google.com/drawings/d/1OewMLAu6KZNon7zpDfZnTh9qiT6m-3M9wnkqWkQQMRc/edit

.. figure:: ../images/rollout_worker_class_overview.svg
    :width: 600
    :align: left

    **A typical RLlib WorkerSet setup inside an RLlib Algorithm:** Each :py:class:`~ray.rllib.evaluation.worker_set.WorkerSet` contains
    exactly one local :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` object and N remote
    :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` objects (ray actors).
    The workers contain a policy map (with one or more policies), and - in case a simulator
    (env) is available - a vectorized :py:class:`~ray.rllib.env.base_env.BaseEnv`
    (containing M sub-environments) and a :py:class:`~ray.rllib.evaluation.sampler.SamplerInput` (either synchronous or asynchronous), which controls
    the environment data collection loop.

In both the online case (an environment is available) and the offline case (no environment),
the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` uses the :py:meth:`~ray.rllib.evaluation.rollout_worker.RolloutWorker.sample` method to
get :py:class:`~ray.rllib.policy.sample_batch.SampleBatch` objects for training.
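To make this local-plus-remote sampling pattern concrete, here is a schematic, pure-Python sketch. ``FakeRolloutWorker`` and ``FakeWorkerSet`` are hypothetical stand-ins invented for illustration (they are not RLlib's actual classes), and plain dicts of lists stand in for ``SampleBatch`` objects:

```python
# Schematic sketch (NOT the actual RLlib classes) of how an Algorithm's
# WorkerSet fans out `sample()` calls and concatenates the results.

class FakeRolloutWorker:
    """Stand-in for a RolloutWorker: sample() returns one batch."""
    def __init__(self, worker_id, rollout_fragment_length=4):
        self.worker_id = worker_id
        self.rollout_fragment_length = rollout_fragment_length

    def sample(self):
        # A real worker would step its (vectorized) env or read offline data.
        return {"obs": [self.worker_id] * self.rollout_fragment_length}


class FakeWorkerSet:
    """Stand-in for a WorkerSet: one local worker plus N "remote" workers."""
    def __init__(self, num_workers):
        self._local = FakeRolloutWorker(worker_id=0)
        self._remotes = [FakeRolloutWorker(i + 1) for i in range(num_workers)]

    def local_worker(self):
        return self._local

    def foreach_worker(self, func):
        # The real WorkerSet would issue parallel Ray remote calls here.
        return [func(w) for w in self._remotes]


workers = FakeWorkerSet(num_workers=2)
batches = workers.foreach_worker(lambda w: w.sample())
# Concatenate per-worker batches into one training batch:
train_batch = {"obs": [x for b in batches for x in b["obs"]]}
print(len(train_batch["obs"]))  # 2 workers x fragment length 4 = 8
```

The real ``WorkerSet`` additionally handles actor fault tolerance and weight synchronization, but the fan-out/concatenate shape of the data path is the same.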
.. _rolloutworker-reference-docs:

RolloutWorker API
-----------------

.. currentmodule:: ray.rllib.evaluation.rollout_worker

Constructor
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    RolloutWorker

Multi agent
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.add_policy
    ~RolloutWorker.remove_policy
    ~RolloutWorker.get_policy
    ~RolloutWorker.set_is_policy_to_train
    ~RolloutWorker.set_policy_mapping_fn
    ~RolloutWorker.for_policy
    ~RolloutWorker.foreach_policy
    ~RolloutWorker.foreach_policy_to_train
Setter and getter methods
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.get_filters
    ~RolloutWorker.get_global_vars
    ~RolloutWorker.set_global_vars
    ~RolloutWorker.get_host
    ~RolloutWorker.get_metrics
    ~RolloutWorker.get_node_ip
    ~RolloutWorker.get_weights
    ~RolloutWorker.set_weights
    ~RolloutWorker.get_state
    ~RolloutWorker.set_state

Threading
~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.lock
    ~RolloutWorker.unlock
Sampling API
~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.sample
    ~RolloutWorker.sample_with_count
    ~RolloutWorker.sample_and_learn

Training API
~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.learn_on_batch
    ~RolloutWorker.setup_torch_data_parallel
    ~RolloutWorker.compute_gradients
    ~RolloutWorker.apply_gradients
Environment API
~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.foreach_env
    ~RolloutWorker.foreach_env_with_context

Miscellaneous
~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.stop
    ~RolloutWorker.apply
    ~RolloutWorker.sync_filters
    ~RolloutWorker.find_free_port
    ~RolloutWorker.creation_args
    ~RolloutWorker.assert_healthy
.. _workerset-reference-docs:

WorkerSet API
-------------

.. currentmodule:: ray.rllib.evaluation.worker_set

Constructor
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    WorkerSet
    WorkerSet.stop
    WorkerSet.reset

Worker Orchestration
~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~WorkerSet.add_workers
    ~WorkerSet.foreach_worker
    ~WorkerSet.foreach_worker_with_id
    ~WorkerSet.foreach_worker_async
    ~WorkerSet.fetch_ready_async_reqs
    ~WorkerSet.num_in_flight_async_reqs
    ~WorkerSet.local_worker
    ~WorkerSet.remote_workers
    ~WorkerSet.num_healthy_remote_workers
    ~WorkerSet.num_healthy_workers
    ~WorkerSet.num_remote_worker_restarts
    ~WorkerSet.probe_unhealthy_workers

Pass-through methods
~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~WorkerSet.add_policy
    ~WorkerSet.foreach_env
    ~WorkerSet.foreach_env_with_context
    ~WorkerSet.foreach_policy
    ~WorkerSet.foreach_policy_to_train
    ~WorkerSet.sync_weights
Sampler API
-----------

:py:class:`~ray.rllib.offline.input_reader.InputReader` instances are used to collect and return experiences from the envs.
For more details on ``InputReader`` subclasses used for offline RL (e.g. reading files of
pre-recorded data), see the :ref:`offline RL API reference here <offline-reference-docs>`.
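The core of the contract is small: an input reader only needs a ``next()`` method that returns the next batch of experiences. The following is a minimal, hypothetical stand-in written for illustration (it is not ``ray.rllib.offline.input_reader.InputReader`` itself, and plain dicts stand in for ``SampleBatch`` objects):

```python
# Minimal sketch of the InputReader contract: repeatedly calling next()
# yields batches of experiences, here by cycling through a fixed list.

class ListInputReader:
    """Hypothetical reader that cycles through pre-recorded batches."""
    def __init__(self, batches):
        self._batches = batches
        self._index = 0

    def next(self):
        # Wrap around so the data source never runs dry, the way a
        # file-backed reader might loop over its input files.
        batch = self._batches[self._index % len(self._batches)]
        self._index += 1
        return batch


reader = ListInputReader([{"obs": [1, 2]}, {"obs": [3, 4]}])
print(reader.next()["obs"])  # [1, 2]
print(reader.next()["obs"])  # [3, 4]
print(reader.next()["obs"])  # wraps around: [1, 2]
```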
Input Reader API
~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.offline.input_reader

.. autosummary::
    :toctree: doc/

    InputReader
    InputReader.next

Input Sampler API
~~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.evaluation.sampler

.. autosummary::
    :toctree: doc/

    SamplerInput
    SamplerInput.get_data
    SamplerInput.get_extra_batches
    SamplerInput.get_metrics

Synchronous Sampler API
~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.evaluation.sampler

.. autosummary::
    :toctree: doc/

    SyncSampler

Asynchronous Sampler API
~~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.evaluation.sampler

.. autosummary::
    :toctree: doc/

    AsyncSampler
.. _offline-reference-docs:

Offline Sampler API
~~~~~~~~~~~~~~~~~~~

The InputReader API is used by an individual :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`
to produce batches of experiences either from a simulator or from an
offline source (e.g. a file).
Here are some example extensions of the InputReader API:
JSON reader API
+++++++++++++++

.. currentmodule:: ray.rllib.offline.json_reader

.. autosummary::
    :toctree: doc/

    JsonReader
    JsonReader.read_all_files
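Conceptually, a JSON-lines experience reader treats each line of a file as one JSON-encoded batch of experiences. The sketch below illustrates that idea only; it is not the actual ``JsonReader`` implementation, and an in-memory ``StringIO`` stands in for a real file:

```python
# Illustrative sketch of reading JSON-lines experience data: one
# JSON-encoded batch per line, parsed into dicts of columns.
import io
import json

jsonl = io.StringIO(
    '{"obs": [0.1, 0.2], "actions": [1, 0], "rewards": [1.0, 0.0]}\n'
    '{"obs": [0.3], "actions": [1], "rewards": [0.5]}\n'
)

# Parse every non-empty line into one batch dict.
batches = [json.loads(line) for line in jsonl if line.strip()]
total_reward = sum(sum(b["rewards"]) for b in batches)
print(len(batches), total_reward)  # 2 batches, total reward 1.5
```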
Mixed input reader
++++++++++++++++++

.. currentmodule:: ray.rllib.offline.mixed_input

.. autosummary::
    :toctree: doc/

    MixedInput

D4RL reader
+++++++++++

.. currentmodule:: ray.rllib.offline.d4rl_reader

.. autosummary::
    :toctree: doc/

    D4RLReader

IOContext
~~~~~~~~~

.. currentmodule:: ray.rllib.offline.io_context

.. autosummary::
    :toctree: doc/

    IOContext
    IOContext.default_sampler_input
Policy Map API
--------------

.. currentmodule:: ray.rllib.policy.policy_map

.. autosummary::
    :toctree: doc/

    PolicyMap
    PolicyMap.items
    PolicyMap.keys
    PolicyMap.values
Sample batch API
----------------

.. currentmodule:: ray.rllib.policy.sample_batch

.. autosummary::
    :toctree: doc/

    SampleBatch
    SampleBatch.set_get_interceptor
    SampleBatch.is_training
    SampleBatch.set_training
    SampleBatch.as_multi_agent
    SampleBatch.get
    SampleBatch.to_device
    SampleBatch.right_zero_pad
    SampleBatch.slice
    SampleBatch.split_by_episode
    SampleBatch.shuffle
    SampleBatch.columns
    SampleBatch.rows
    SampleBatch.copy
    SampleBatch.is_single_trajectory
    SampleBatch.is_terminated_or_truncated
    SampleBatch.env_steps
    SampleBatch.agent_steps
MultiAgent batch API
--------------------

.. currentmodule:: ray.rllib.policy.sample_batch

.. autosummary::
    :toctree: doc/

    MultiAgentBatch
    MultiAgentBatch.env_steps
    MultiAgentBatch.agent_steps
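The ``env_steps`` vs. ``agent_steps`` distinction matters in multi-agent settings: one env step can contain observations and actions for several agents at once. The toy example below uses plain dicts (not the actual ``MultiAgentBatch`` class) purely to illustrate that counting difference:

```python
# Toy illustration of env_steps vs agent_steps in a multi-agent batch.
# Two agents act at every env step, each mapped to its own policy.
policy_batches = {
    "policy_a": {"obs": [1, 2, 3]},  # agent mapped to policy_a: 3 rows
    "policy_b": {"obs": [4, 5, 6]},  # agent mapped to policy_b: 3 rows
}
env_steps = 3  # the underlying env was stepped 3 times

# agent_steps sums the rows across all per-policy batches:
agent_steps = sum(len(b["obs"]) for b in policy_batches.values())
print(env_steps, agent_steps)  # 3 env steps, 6 agent steps
```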