.. _evaluation-reference-docs:

Sampling the Environment or offline data
========================================

Data ingest via either environment rollouts or other data-generating methods
(e.g. reading from offline files) is done in RLlib by :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` instances,
which sit inside a :py:class:`~ray.rllib.evaluation.worker_set.WorkerSet`
(together with other parallel ``RolloutWorkers``) in the RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
(under the ``self.workers`` property):

.. https://docs.google.com/drawings/d/1OewMLAu6KZNon7zpDfZnTh9qiT6m-3M9wnkqWkQQMRc/edit

.. figure:: ../images/rollout_worker_class_overview.svg
    :width: 600
    :align: left

    **A typical RLlib WorkerSet setup inside an RLlib Algorithm:** Each :py:class:`~ray.rllib.evaluation.worker_set.WorkerSet` contains
    exactly one local :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` object and N remote
    :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` objects (ray actors).
    The workers contain a policy map (with one or more policies), and - in case a simulator
    (env) is available - a vectorized :py:class:`~ray.rllib.env.base_env.BaseEnv`
    (containing M sub-environments) and a :py:class:`~ray.rllib.evaluation.sampler.SamplerInput` (either synchronous or asynchronous), which controls
    the environment data collection loop.

In both the online case (an environment is available) and the offline case (no environment),
the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` uses the :py:meth:`~ray.rllib.evaluation.rollout_worker.RolloutWorker.sample` method to
get :py:class:`~ray.rllib.policy.sample_batch.SampleBatch` objects for training.
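To make this local-plus-remote sampling pattern concrete, here is a schematic, pure-Python sketch. ``FakeRolloutWorker`` and ``FakeWorkerSet`` are hypothetical stand-ins invented for illustration (they are not RLlib's actual classes), and plain dicts of lists stand in for ``SampleBatch`` objects:

```python
# Schematic sketch (NOT the actual RLlib classes) of how an Algorithm's
# WorkerSet fans out `sample()` calls and concatenates the results.

class FakeRolloutWorker:
    """Stand-in for a RolloutWorker: sample() returns one batch."""
    def __init__(self, worker_id, rollout_fragment_length=4):
        self.worker_id = worker_id
        self.rollout_fragment_length = rollout_fragment_length

    def sample(self):
        # A real worker would step its (vectorized) env or read offline data.
        return {"obs": [self.worker_id] * self.rollout_fragment_length}


class FakeWorkerSet:
    """Stand-in for a WorkerSet: one local worker plus N "remote" workers."""
    def __init__(self, num_workers):
        self._local = FakeRolloutWorker(worker_id=0)
        self._remotes = [FakeRolloutWorker(i + 1) for i in range(num_workers)]

    def local_worker(self):
        return self._local

    def foreach_worker(self, func):
        # The real WorkerSet would issue parallel Ray remote calls here.
        return [func(w) for w in self._remotes]


workers = FakeWorkerSet(num_workers=2)
batches = workers.foreach_worker(lambda w: w.sample())
# Concatenate per-worker batches into one training batch:
train_batch = {"obs": [x for b in batches for x in b["obs"]]}
print(len(train_batch["obs"]))  # 2 workers x fragment length 4 = 8
```

The real ``WorkerSet`` additionally handles actor fault tolerance and weight synchronization, but the fan-out/concatenate shape of the data path is the same.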
.. _rolloutworker-reference-docs:

RolloutWorker API
-----------------

.. currentmodule:: ray.rllib.evaluation.rollout_worker

Constructor
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    RolloutWorker

Multi agent
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.add_policy
    ~RolloutWorker.remove_policy
    ~RolloutWorker.get_policy
    ~RolloutWorker.set_is_policy_to_train
    ~RolloutWorker.set_policy_mapping_fn
    ~RolloutWorker.for_policy
    ~RolloutWorker.foreach_policy
    ~RolloutWorker.foreach_policy_to_train
Setter and getter methods
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.get_filters
    ~RolloutWorker.get_global_vars
    ~RolloutWorker.set_global_vars
    ~RolloutWorker.get_host
    ~RolloutWorker.get_metrics
    ~RolloutWorker.get_node_ip
    ~RolloutWorker.get_weights
    ~RolloutWorker.set_weights
    ~RolloutWorker.get_state
    ~RolloutWorker.set_state

Threading
~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.lock
    ~RolloutWorker.unlock
Sampling API
~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.sample
    ~RolloutWorker.sample_with_count
    ~RolloutWorker.sample_and_learn

Training API
~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.learn_on_batch
    ~RolloutWorker.setup_torch_data_parallel
    ~RolloutWorker.compute_gradients
    ~RolloutWorker.apply_gradients
Environment API
~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.foreach_env
    ~RolloutWorker.foreach_env_with_context

Miscellaneous
~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~RolloutWorker.stop
    ~RolloutWorker.apply
    ~RolloutWorker.sync_filters
    ~RolloutWorker.find_free_port
    ~RolloutWorker.creation_args
    ~RolloutWorker.assert_healthy
.. _workerset-reference-docs:

WorkerSet API
-------------

.. currentmodule:: ray.rllib.evaluation.worker_set

Constructor
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    WorkerSet
    WorkerSet.stop
    WorkerSet.reset

Worker Orchestration
~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~WorkerSet.add_workers
    ~WorkerSet.foreach_worker
    ~WorkerSet.foreach_worker_with_id
    ~WorkerSet.foreach_worker_async
    ~WorkerSet.fetch_ready_async_reqs
    ~WorkerSet.num_in_flight_async_reqs
    ~WorkerSet.local_worker
    ~WorkerSet.remote_workers
    ~WorkerSet.num_healthy_remote_workers
    ~WorkerSet.num_healthy_workers
    ~WorkerSet.num_remote_worker_restarts
    ~WorkerSet.probe_unhealthy_workers

Pass-through methods
~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~WorkerSet.add_policy
    ~WorkerSet.foreach_env
    ~WorkerSet.foreach_env_with_context
    ~WorkerSet.foreach_policy
    ~WorkerSet.foreach_policy_to_train
    ~WorkerSet.sync_weights
Sampler API
-----------

:py:class:`~ray.rllib.offline.input_reader.InputReader` instances are used to collect and return experiences from the envs.
For more details on ``InputReader`` subclasses used for offline RL (e.g. reading files of
pre-recorded data), see the :ref:`offline RL API reference here <offline-reference-docs>`.
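The core of the contract is small: an input reader only needs a ``next()`` method that returns the next batch of experiences. The following is a minimal, hypothetical stand-in written for illustration (it is not ``ray.rllib.offline.input_reader.InputReader`` itself, and plain dicts stand in for ``SampleBatch`` objects):

```python
# Minimal sketch of the InputReader contract: repeatedly calling next()
# yields batches of experiences, here by cycling through a fixed list.

class ListInputReader:
    """Hypothetical reader that cycles through pre-recorded batches."""
    def __init__(self, batches):
        self._batches = batches
        self._index = 0

    def next(self):
        # Wrap around so the data source never runs dry, the way a
        # file-backed reader might loop over its input files.
        batch = self._batches[self._index % len(self._batches)]
        self._index += 1
        return batch


reader = ListInputReader([{"obs": [1, 2]}, {"obs": [3, 4]}])
print(reader.next()["obs"])  # [1, 2]
print(reader.next()["obs"])  # [3, 4]
print(reader.next()["obs"])  # wraps around: [1, 2]
```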
Input Reader API
~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.offline.input_reader

.. autosummary::
    :toctree: doc/

    InputReader
    InputReader.next

Input Sampler API
~~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.evaluation.sampler

.. autosummary::
    :toctree: doc/

    SamplerInput
    SamplerInput.get_data
    SamplerInput.get_extra_batches
    SamplerInput.get_metrics

Synchronous Sampler API
~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.evaluation.sampler

.. autosummary::
    :toctree: doc/

    SyncSampler

Asynchronous Sampler API
~~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: ray.rllib.evaluation.sampler

.. autosummary::
    :toctree: doc/

    AsyncSampler
.. _offline-reference-docs:

Offline Sampler API
~~~~~~~~~~~~~~~~~~~

The InputReader API is used by an individual :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`
to produce batches of experiences either from a simulator or from an
offline source (e.g. a file).
Here are some example extensions of the InputReader API:
JSON reader API
+++++++++++++++

.. currentmodule:: ray.rllib.offline.json_reader

.. autosummary::
    :toctree: doc/

    JsonReader
    JsonReader.read_all_files
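Conceptually, a JSON-lines experience reader treats each line of a file as one JSON-encoded batch of experiences. The sketch below illustrates that idea only; it is not the actual ``JsonReader`` implementation, and an in-memory ``StringIO`` stands in for a real file:

```python
# Illustrative sketch of reading JSON-lines experience data: one
# JSON-encoded batch per line, parsed into dicts of columns.
import io
import json

jsonl = io.StringIO(
    '{"obs": [0.1, 0.2], "actions": [1, 0], "rewards": [1.0, 0.0]}\n'
    '{"obs": [0.3], "actions": [1], "rewards": [0.5]}\n'
)

# Parse every non-empty line into one batch dict.
batches = [json.loads(line) for line in jsonl if line.strip()]
total_reward = sum(sum(b["rewards"]) for b in batches)
print(len(batches), total_reward)  # 2 batches, total reward 1.5
```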
Mixed input reader
++++++++++++++++++

.. currentmodule:: ray.rllib.offline.mixed_input

.. autosummary::
    :toctree: doc/

    MixedInput

D4RL reader
+++++++++++

.. currentmodule:: ray.rllib.offline.d4rl_reader

.. autosummary::
    :toctree: doc/

    D4RLReader

IOContext
~~~~~~~~~

.. currentmodule:: ray.rllib.offline.io_context

.. autosummary::
    :toctree: doc/

    IOContext
    IOContext.default_sampler_input
Policy Map API
--------------

.. currentmodule:: ray.rllib.policy.policy_map

.. autosummary::
    :toctree: doc/

    PolicyMap
    PolicyMap.items
    PolicyMap.keys
    PolicyMap.values
Sample batch API
----------------

.. currentmodule:: ray.rllib.policy.sample_batch

.. autosummary::
    :toctree: doc/

    SampleBatch
    SampleBatch.set_get_interceptor
    SampleBatch.is_training
    SampleBatch.set_training
    SampleBatch.as_multi_agent
    SampleBatch.get
    SampleBatch.to_device
    SampleBatch.right_zero_pad
    SampleBatch.slice
    SampleBatch.split_by_episode
    SampleBatch.shuffle
    SampleBatch.columns
    SampleBatch.rows
    SampleBatch.copy
    SampleBatch.is_single_trajectory
    SampleBatch.is_terminated_or_truncated
    SampleBatch.env_steps
    SampleBatch.agent_steps
MultiAgent batch API
--------------------

.. currentmodule:: ray.rllib.policy.sample_batch

.. autosummary::
    :toctree: doc/

    MultiAgentBatch
    MultiAgentBatch.env_steps
    MultiAgentBatch.agent_steps
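The ``env_steps`` vs. ``agent_steps`` distinction matters in multi-agent settings: one env step can contain observations and actions for several agents at once. The toy example below uses plain dicts (not the actual ``MultiAgentBatch`` class) purely to illustrate that counting difference:

```python
# Toy illustration of env_steps vs agent_steps in a multi-agent batch.
# Two agents act at every env step, each mapped to its own policy.
policy_batches = {
    "policy_a": {"obs": [1, 2, 3]},  # agent mapped to policy_a: 3 rows
    "policy_b": {"obs": [4, 5, 6]},  # agent mapped to policy_b: 3 rows
}
env_steps = 3  # the underlying env was stepped 3 times

# agent_steps sums the rows across all per-policy batches:
agent_steps = sum(len(b["obs"]) for b in policy_batches.values())
print(env_steps, agent_steps)  # 3 env steps, 6 agent steps
```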