.. include:: /_includes/rllib/announcement.rst

.. include:: /_includes/rllib/we_are_hiring.rst

.. _rllib-getting-started:

Getting Started with RLlib
==========================

At a high level, RLlib provides you with an ``Algorithm`` class which
holds a policy for environment interaction.
Through the algorithm's interface, you can train the policy, compute actions, or store
your algorithms.
In multi-agent training, the algorithm manages the querying
and optimization of multiple policies at once.

.. image:: images/rllib-api.svg

In this guide, we will first walk you through running your first experiments with
the RLlib CLI, and then discuss our Python API in more detail.
Using the RLlib CLI
-------------------

The quickest way to run your first RLlib algorithm is to use the command line interface.
You can train DQN with the following commands:

.. raw:: html

    <div class="termynal" data-termynal>
        <span data-ty="input">pip install "ray[rllib]" tensorflow</span>
        <span data-ty="input">rllib train --algo DQN --env CartPole-v1 --stop '{"training_iteration": 30}'</span>
    </div>

.. margin::

    The ``rllib train`` command (same as the ``train.py`` script in the repo)
    has a number of options you can show by running ``rllib train --help``.

Note that you can choose any supported RLlib algorithm (``--algo``) and environment (``--env``).
RLlib supports any Farama-Foundation Gymnasium environment, as well as a number of other environments
(see :ref:`rllib-environments-doc`).
It also supports a large number of algorithms (see :ref:`rllib-algorithms-doc`) to
choose from.

Running the above will return one of the `checkpoints` that get generated during training (after 30 training iterations),
as well as a command that you can use to evaluate the trained algorithm.
You can evaluate the trained algorithm with the following command (assuming the checkpoint path is called ``checkpoint``):

.. raw:: html

    <div class="termynal" data-termynal>
        <span data-ty="input">rllib evaluate checkpoint --algo DQN --env CartPole-v1</span>
    </div>

.. note::

    By default, the results will be logged to a subdirectory of ``~/ray_results``.
    This subdirectory will contain a file ``params.json`` which contains the
    hyper-parameters, a file ``result.json`` which contains a training summary
    for each episode, and a TensorBoard file that can be used to visualize
    training progress with TensorBoard by running

    .. code-block:: bash

        tensorboard --logdir=~/ray_results

For more advanced evaluation functionality, refer to `Customized Evaluation During Training <#customized-evaluation-during-training>`__.

.. note::

    Each algorithm has specific hyperparameters that can be set with ``--config``,
    see the `algorithms documentation <rllib-algorithms.html>`__ for more information.
    For instance, you can train the A2C algorithm on 8 workers by specifying
    ``num_workers: 8`` in a JSON string passed to ``--config``:

    .. code-block:: bash

        rllib train --env=PongDeterministic-v4 --run=A2C --config '{"num_workers": 8}'
Running Tuned Examples
----------------------

Some good hyperparameters and settings are available in
`the RLlib repository <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples>`__
(some of them are tuned to run on GPUs).

.. margin::

    If you find better settings or tune an algorithm on a different domain,
    consider submitting a Pull Request!

You can run these with the ``rllib train file`` command as follows:

.. raw:: html

    <div class="termynal" data-termynal>
        <span data-ty="input">rllib train file /path/to/tuned/example.yaml</span>
    </div>

Note that this works with any local YAML file in the correct format, or with remote URLs
pointing to such files.
If you want to learn more about the RLlib CLI, please check out
the :ref:`RLlib CLI user guide <rllib-cli-doc>`.
.. _rllib-training-api:

Using the Python API
--------------------

The Python API provides the needed flexibility for applying RLlib to new problems.
For instance, you will need to use this API if you wish to use
`custom environments, preprocessors, or models <rllib-models.html>`__ with RLlib.

Here is an example of the basic usage.
We first create a `PPOConfig` and add properties to it, like the `environment` we want
to use, or the `resources` we want to leverage for training.
After we `build` the `algo` from its configuration, we can `train` it for a number of
episodes (here `10`) and `save` the resulting policy periodically
(here every `5` episodes).

.. literalinclude:: ./doc_code/getting_started.py
    :language: python
    :start-after: rllib-first-config-begin
    :end-before: rllib-first-config-end
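The included snippet follows the pattern sketched below (a minimal sketch, assuming the
``CartPole-v1`` environment and arbitrary worker counts; the exact code lives in
``doc_code/getting_started.py``):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Minimal sketch: configure PPO on CartPole, then train and checkpoint it.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .rollouts(num_rollout_workers=2)
    )
    algo = config.build()

    for i in range(10):
        result = algo.train()
        print(f"Iteration {i}: episode_reward_mean={result['episode_reward_mean']}")
        # Save a checkpoint every 5 iterations.
        if (i + 1) % 5 == 0:
            checkpoint_dir = algo.save()
            print(f"Checkpoint saved to {checkpoint_dir}")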
All RLlib algorithms are compatible with the :ref:`Tune API <tune-api-ref>`.
This enables them to be easily used in experiments with :ref:`Ray Tune <tune-main>`.
For example, the following code performs a simple hyper-parameter sweep of PPO.

.. literalinclude:: ./doc_code/getting_started.py
    :dedent: 4
    :language: python
    :start-after: rllib-tune-config-begin
    :end-before: rllib-tune-config-end
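A sweep of this kind boils down to putting a search space (here a learning-rate grid) into
the algorithm's config and handing it to a ``Tuner`` (a minimal sketch with arbitrary values
and stopping criteria):

.. code-block:: python

    from ray import air, tune
    from ray.rllib.algorithms.ppo import PPOConfig

    # Minimal sketch: sweep over three learning rates with Ray Tune and
    # stop each trial once it reaches a mean episode reward of 150.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .training(lr=tune.grid_search([0.01, 0.001, 0.0001]))
    )

    tuner = tune.Tuner(
        "PPO",
        param_space=config,
        run_config=air.RunConfig(stop={"episode_reward_mean": 150}),
    )
    results = tuner.fit()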
Tune will schedule the trials to run in parallel on your Ray cluster:

::

    == Status ==
    Using FIFO scheduling algorithm.
    Resources requested: 4/4 CPUs, 0/0 GPUs
    Result logdir: ~/ray_results/my_experiment
    PENDING trials:
     - PPO_CartPole-v1_2_lr=0.0001:	PENDING
    RUNNING trials:
     - PPO_CartPole-v1_0_lr=0.01:	RUNNING [pid=21940], 16 s, 4013 ts, 22 rew
     - PPO_CartPole-v1_1_lr=0.001:	RUNNING [pid=21942], 27 s, 8111 ts, 54.7 rew

``Tuner.fit()`` returns a ``ResultGrid`` object that allows further analysis
of the training results and retrieving the checkpoint(s) of the trained agent.

.. literalinclude:: ./doc_code/getting_started.py
    :dedent: 4
    :language: python
    :start-after: rllib-tuner-begin
    :end-before: rllib-tuner-end
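Working with the returned ``ResultGrid`` typically looks something like this (a minimal
sketch, assuming ``results = tuner.fit()`` from a sweep like the one above):

.. code-block:: python

    # Minimal sketch: pick the best trial by mean episode reward and
    # grab its last checkpoint for later restoring or evaluation.
    best_result = results.get_best_result(
        metric="episode_reward_mean", mode="max"
    )
    best_checkpoint = best_result.checkpoint
    print(best_checkpoint)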
.. margin::

    You can find your checkpoint's version by
    looking into the ``rllib_checkpoint.json`` file inside your checkpoint directory.

Loading and restoring a trained algorithm from a checkpoint is simple.
Let's assume you have a local checkpoint directory called ``checkpoint_path``.
To load newer RLlib checkpoints (version >= 1.0), use the following code:

.. code-block:: python

    from ray.rllib.algorithms.algorithm import Algorithm
    algo = Algorithm.from_checkpoint(checkpoint_path)

For older RLlib checkpoint versions (version < 1.0), you can
restore an algorithm via:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPO
    algo = PPO(config=config, env=env_class)
    algo.restore(checkpoint_path)
Computing Actions
~~~~~~~~~~~~~~~~~

The simplest way to programmatically compute actions from a trained agent is to
use ``Algorithm.compute_single_action()``.
This method preprocesses and filters the observation before passing it to the agent
policy.
Here is a simple example of testing a trained agent for one episode:

.. literalinclude:: ./doc_code/getting_started.py
    :language: python
    :start-after: rllib-compute-action-begin
    :end-before: rllib-compute-action-end
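The inference loop in the included snippet roughly follows this pattern (a minimal sketch,
assuming a trained ``algo`` from the examples above and the ``CartPole-v1`` environment):

.. code-block:: python

    import gymnasium as gym

    # Minimal sketch: roll out one episode with the trained algorithm.
    env = gym.make("CartPole-v1")
    obs, info = env.reset()
    terminated = truncated = False
    total_reward = 0.0

    while not terminated and not truncated:
        # Compute a single action for the current observation.
        action = algo.compute_single_action(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward

    print(f"Episode reward: {total_reward}")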
For more advanced usage on computing actions and other functionality,
you can consult the :ref:`RLlib Algorithm API documentation <rllib-algorithm-api>`.

Accessing Policy State
~~~~~~~~~~~~~~~~~~~~~~

It is common to need to access an algorithm's internal state, for instance to set
or get model weights.

In RLlib, algorithm state is replicated across multiple *rollout workers* (Ray actors)
in the cluster.
However, you can easily get and update this state between calls to ``train()``
via ``Algorithm.workers.foreach_worker()``
or ``Algorithm.workers.foreach_worker_with_index()``.
These functions take a lambda function that is applied with the worker as an argument,
and they return the per-worker values as a list.

You can also access just the "master" copy of the algorithm state through
``Algorithm.get_policy()`` or ``Algorithm.workers.local_worker()``,
but note that updates here may not be immediately reflected in
your rollout workers (if you have configured ``num_rollout_workers > 0``).

Here's a quick example of how to access the state of a model:

.. literalinclude:: ./doc_code/getting_started.py
    :language: python
    :start-after: rllib-get-state-begin
    :end-before: rllib-get-state-end
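For example, syncing weights from the local ("master") copy out to all rollout workers could
look roughly like this (a minimal sketch, assuming a built ``algo``):

.. code-block:: python

    # Minimal sketch: read the weights of the local (master) policy ...
    local_weights = algo.get_policy().get_weights()

    # ... and push them to the policy on every rollout worker.
    algo.workers.foreach_worker(
        lambda worker: worker.get_policy().set_weights(local_weights)
    )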
Accessing Model State
~~~~~~~~~~~~~~~~~~~~~

Similar to accessing policy state, you may want to get a reference to the
underlying neural network model being trained. For example, you may want to
pre-train it separately, or otherwise update its weights outside of RLlib.
This can be done by accessing the ``model`` of the policy.

.. margin::

    To run these examples, you need to install a few extra dependencies, namely
    `pip install "gym[atari]" "gym[accept-rom-license]" atari_py`.

Below you find three explicit examples showing how to access the model state of
an algorithm.

.. dropdown:: **Example: Preprocessing observations for feeding into a model**

    Then for the code:

    .. literalinclude:: doc_code/training.py
        :language: python
        :start-after: __preprocessing_observations_start__
        :end-before: __preprocessing_observations_end__

.. dropdown:: **Example: Querying a policy's action distribution**

    .. literalinclude:: doc_code/training.py
        :language: python
        :start-after: __query_action_dist_start__
        :end-before: __query_action_dist_end__

.. dropdown:: **Example: Getting Q values from a DQN model**

    .. literalinclude:: doc_code/training.py
        :language: python
        :start-after: __get_q_values_dqn_start__
        :end-before: __get_q_values_dqn_end__

This is especially useful when used with
`custom model classes <rllib-models.html>`__.
.. _rllib-algo-configuration:

Configuring RLlib Algorithms
----------------------------

You can configure RLlib algorithms in a modular fashion by working with so-called
`AlgorithmConfig` objects.
In essence, you first create a `config = AlgorithmConfig()` object and then call methods
on it to set the desired configuration options.
Each RLlib algorithm has its own config class that inherits from `AlgorithmConfig`.
For instance, to create a `PPO` algorithm, you start with a `PPOConfig` object, to work
with a `DQN` algorithm, you start with a `DQNConfig` object, etc.
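For example, building a `PPO` algorithm from a chained config could look roughly like this
(a minimal sketch; the environment and hyperparameter values shown are arbitrary):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Minimal sketch: chain config methods, then build the algorithm.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .framework("torch")
        .rollouts(num_rollout_workers=2)
        .training(gamma=0.99, lr=0.0003, train_batch_size=4000)
    )
    algo = config.build()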
.. note::

    Each algorithm has its specific settings, but most configuration options are shared.
    We discuss the common options below, and refer to
    :ref:`the RLlib algorithms guide <rllib-algorithms-doc>` for algorithm-specific
    properties.
    Algorithms differ mostly in their `training` settings.

Below you find the basic signature of the `AlgorithmConfig` class, as well as some
advanced usage examples:

.. autoclass:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig
    :noindex:

As RLlib algorithms are fairly complex, they come with many configuration options.
To make things easier, the common properties of algorithms are naturally grouped into
the following categories:

- :ref:`training options <rllib-config-train>`,
- :ref:`environment options <rllib-config-env>`,
- :ref:`deep learning framework options <rllib-config-framework>`,
- :ref:`rollout worker options <rllib-config-rollouts>`,
- :ref:`evaluation options <rllib-config-evaluation>`,
- :ref:`exploration options <rllib-config-exploration>`,
- :ref:`options for training with offline data <rllib-config-offline_data>`,
- :ref:`options for training multiple agents <rllib-config-multi_agent>`,
- :ref:`reporting options <rllib-config-reporting>`,
- :ref:`options for saving and restoring checkpoints <rllib-config-checkpointing>`,
- :ref:`debugging options <rllib-config-debugging>`,
- :ref:`options for adding callbacks to algorithms <rllib-config-callbacks>`,
- :ref:`resource options <rllib-config-resources>`,
- :ref:`and options for experimental features <rllib-config-experimental>`.

Let's discuss each category one by one, starting with training options.
.. _rllib-config-train:

Specifying Training Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. margin::

    For instance, a `DQNConfig` takes a `double_q` `training` argument to specify whether
    to use a double-Q DQN, whereas in a `PPOConfig` this does not make sense.

For individual algorithms, this is probably the most relevant configuration group,
as this is where all the algorithm-specific options go.
But the base configuration for `training` of an `AlgorithmConfig` is actually quite small:

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training
    :noindex:
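To pick up the margin example above, algorithm-specific training settings are passed through
the same ``training()`` method on the algorithm's own config class (a minimal sketch; the
values are arbitrary):

.. code-block:: python

    from ray.rllib.algorithms.dqn import DQNConfig

    # Minimal sketch: DQN-specific settings (like double_q) go through
    # DQNConfig.training(), alongside the shared options such as lr.
    config = DQNConfig().training(double_q=True, lr=0.0005)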
.. _rllib-config-env:

Specifying Environments
~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.environment
    :noindex:

.. _rllib-config-framework:

Specifying Framework Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.framework
    :noindex:

.. _rllib-config-rollouts:

Specifying Rollout Workers
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.rollouts
    :noindex:

.. _rllib-config-evaluation:

Specifying Evaluation Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.evaluation
    :noindex:

.. _rllib-config-exploration:

Specifying Exploration Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.exploration
    :noindex:

.. _rllib-config-offline_data:

Specifying Offline Data Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.offline_data
    :noindex:

.. _rllib-config-multi_agent:

Specifying Multi-Agent Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.multi_agent
    :noindex:

.. _rllib-config-reporting:

Specifying Reporting Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.reporting
    :noindex:

.. _rllib-config-checkpointing:

Specifying Checkpointing Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.checkpointing
    :noindex:

.. _rllib-config-debugging:

Specifying Debugging Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.debugging
    :noindex:

.. _rllib-config-callbacks:

Specifying Callback Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.callbacks
    :noindex:

.. _rllib-config-resources:

Specifying Resources
~~~~~~~~~~~~~~~~~~~~
You can control the degree of parallelism used by setting the ``num_workers``
hyperparameter for most algorithms. The Algorithm constructs that many
"remote worker" instances (`see RolloutWorker class <https://github.com/ray-project/ray/blob/master/rllib/evaluation/rollout_worker.py>`__),
which are created as ``ray.remote`` actors, plus exactly one "local worker", a ``RolloutWorker`` object that is not a
ray actor but lives directly inside the Algorithm.
For most algorithms, learning updates are performed on the local worker and sample collection from
one or more environments is performed by the remote workers (in parallel).
For example, setting ``num_workers=0`` will only create the local worker, in which case both
sample collection and training will be done by the local worker.
On the other hand, setting ``num_workers=5`` will create the local worker (responsible for training updates)
and 5 remote workers (responsible for sample collection).

Since learning is most of the time done on the local worker, it may help to provide one or more GPUs
to that worker via the ``num_gpus`` setting.
Similarly, the resource allocation to remote workers can be controlled via ``num_cpus_per_worker``, ``num_gpus_per_worker``, and ``custom_resources_per_worker``.

The number of GPUs can be a fractional quantity (e.g. 0.5), allocating only a fraction
of a GPU. For example, with DQN you can pack five algorithms onto one GPU by setting
``num_gpus: 0.2``. Check out `this fractional GPU example <https://github.com/ray-project/ray/blob/master/rllib/examples/fractional_gpus.py>`__,
which also demonstrates how environments (running on the remote workers) that
require a GPU can benefit from the ``num_gpus_per_worker`` setting.

For synchronous algorithms like PPO and A2C, the driver and workers can make use of
the same GPU. To do this for ``n`` GPUs:

.. code-block:: python

    gpu_count = n
    num_gpus = 0.0001  # Driver GPU
    num_gpus_per_worker = (gpu_count - num_gpus) / num_workers

.. Original image: https://docs.google.com/drawings/d/14QINFvx3grVyJyjAnjggOCEVN-Iq6pYVJ3jA2S6j8z0/edit?usp=sharing

.. image:: images/rllib-config.svg

If you specify ``num_gpus`` and your machine does not have the required number of GPUs
available, a RuntimeError will be thrown by the respective worker. On the other hand,
if you set ``num_gpus=0``, your policies will be built solely on the CPU, even if
GPUs are available on the machine.

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.resources
    :noindex:
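Expressed through the config API, the worker/GPU split described above could look roughly
like this (a minimal sketch; adjust the numbers to your cluster):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Minimal sketch: 5 remote rollout workers for sampling, one GPU for the
    # local (learner) worker, and one CPU per remote worker.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .rollouts(num_rollout_workers=5)
        .resources(num_gpus=1, num_cpus_per_worker=1)
    )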
.. _rllib-config-experimental:

Specifying Experimental Features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig.experimental
    :noindex:

.. _rllib-scaling-guide:

RLlib Scaling Guide
-------------------

Here are some rules of thumb for scaling training with RLlib.

1. If the environment is slow and cannot be replicated (e.g., since it requires interaction with physical systems), then you should use a sample-efficient off-policy algorithm such as :ref:`DQN <dqn>` or :ref:`SAC <sac>`. These algorithms default to ``num_workers: 0`` for single-process operation. Make sure to set ``num_gpus: 1`` if you want to use a GPU. Consider also batch RL training with the `offline data <rllib-offline.html>`__ API.

2. If the environment is fast and the model is small (most models for RL are), use time-efficient algorithms such as :ref:`PPO <ppo>`, :ref:`IMPALA <impala>`, or :ref:`APEX <apex>`. These can be scaled by increasing ``num_workers`` to add rollout workers. It may also make sense to enable `vectorization <rllib-env.html#vectorized>`__ for inference. Make sure to set ``num_gpus: 1`` if you want to use a GPU. If the learner becomes a bottleneck, multiple GPUs can be used for learning by setting ``num_gpus > 1``.

3. If the model is compute intensive (e.g., a large deep residual network) and inference is the bottleneck, consider allocating GPUs to workers by setting ``num_gpus_per_worker: 1``. If you only have a single GPU, consider ``num_workers: 0`` to use the learner GPU for inference. For efficient use of GPU time, use a small number of GPU workers and a large number of `envs per worker <rllib-env.html#vectorized>`__.

4. Finally, if both model and environment are compute intensive, then enable `remote worker envs <rllib-env.html#vectorized>`__ with `async batching <rllib-env.html#vectorized>`__ by setting ``remote_worker_envs: True`` and optionally ``remote_env_batch_wait_ms``. This batches inference on GPUs in the rollout workers while letting envs run asynchronously in separate actors, similar to the `SEED <https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html>`__ architecture. The number of workers and number of envs per worker should be tuned to maximize GPU utilization. If your env requires GPUs to function, or if multi-node SGD is needed, then also consider :ref:`DD-PPO <ddppo>`.
In case you are using lots of workers (``num_workers >> 10``) and you observe worker failures for whatever reasons, which normally interrupt your RLlib training runs, consider using
the config settings ``ignore_worker_failures=True``, ``recreate_failed_workers=True``, or ``restart_failed_sub_environments=True``:

``ignore_worker_failures``: When set to True, your Algorithm will not crash due to a single worker error but continue for as long as there is at least one functional worker remaining.

``recreate_failed_workers``: When set to True, your Algorithm will attempt to replace/recreate any failed worker(s) with newly created one(s). This way, your number of workers will never decrease, even if some of them fail from time to time.

``restart_failed_sub_environments``: When set to True and there is a failure in one of the vectorized sub-environments in one of your workers, the worker will try to recreate only the failed sub-environment and re-integrate the newly created one into your vectorized env stack on that worker.

Note that only one of ``ignore_worker_failures`` or ``recreate_failed_workers`` may be set to True (they are mutually exclusive settings). However,
you can combine each of these with the ``restart_failed_sub_environments=True`` setting.

Using these options will make your training runs much more stable and more robust against occasional OOM or other similar "once in a while" errors on your workers
themselves or inside your environments.
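As a rough sketch, these flags are plain config keys, so you can set them alongside the rest
of your config, for example in the dict you pass to Tune (key names as listed above; the
worker count and environment are arbitrary):

.. code-block:: python

    from ray import tune

    # Minimal sketch: a PPO run with many workers that tolerates occasional
    # worker and sub-environment failures.
    tune.Tuner(
        "PPO",
        param_space={
            "env": "CartPole-v1",
            "num_workers": 20,
            "recreate_failed_workers": True,
            "restart_failed_sub_environments": True,
        },
    ).fit()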
Debugging RLlib Experiments
---------------------------

Gym Monitor
~~~~~~~~~~~

The ``"monitor": true`` config can be used to save Gym episode videos to the result dir. For example:

.. code-block:: bash

    rllib train --env=PongDeterministic-v4 \
        --run=A2C --config '{"num_workers": 2, "monitor": true}'

    # videos will be saved in the ~/ray_results/<experiment> dir, for example
    openaigym.video.0.31401.video000000.meta.json
    openaigym.video.0.31401.video000000.mp4
    openaigym.video.0.31403.video000000.meta.json
    openaigym.video.0.31403.video000000.mp4
Eager Mode
~~~~~~~~~~

Policies built with ``build_tf_policy`` (most of the reference algorithms are)
can be run in eager mode by setting the
``"framework": "tf2"`` / ``"eager_tracing": true`` config options or using
``rllib train --config '{"framework": "tf2"}' [--trace]``.
This will tell RLlib to execute the model forward pass, action distribution,
loss, and stats functions in eager mode.

Eager mode makes debugging much easier, since you can now use line-by-line
debugging with breakpoints or Python ``print()`` to inspect
intermediate tensor values.
However, eager can be slower than graph mode unless tracing is enabled.
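From the Python API, the equivalent could look roughly like this (a minimal sketch using
the config object's ``framework()`` method; the PPO/CartPole choice is arbitrary):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Minimal sketch: run the TF policy eagerly; enable tracing to recover speed.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .framework("tf2", eager_tracing=True)
    )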
Using PyTorch
~~~~~~~~~~~~~

Algorithms that have an implemented TorchPolicy will allow you to run
`rllib train` using the command line ``--framework=torch`` flag.
Algorithms that do not have a torch version yet will complain with an error in
this case.

Episode Traces
~~~~~~~~~~~~~~

You can use the `data output API <rllib-offline.html>`__ to save episode traces
for debugging. For example, the following command will run PPO while saving episode
traces to ``/tmp/debug``.

.. code-block:: bash

    rllib train --run=PPO --env=CartPole-v1 \
        --config='{"output": "/tmp/debug", "output_compress_columns": []}'

    # episode traces will be saved in /tmp/debug, for example
    output-2019-02-23_12-02-03_worker-2_0.json
    output-2019-02-23_12-02-04_worker-1_0.json
Log Verbosity
~~~~~~~~~~~~~

You can control the log level via the ``"log_level"`` flag. Valid values are "DEBUG",
"INFO", "WARN" (default), and "ERROR". This can be used to increase or decrease the
verbosity of internal logging. You can also use the ``-v`` and ``-vv`` flags.
For example, the following two commands are about equivalent:

.. code-block:: bash

    rllib train --env=PongDeterministic-v4 \
        --run=A2C --config '{"num_workers": 2, "log_level": "DEBUG"}'

    rllib train --env=PongDeterministic-v4 \
        --run=A2C --config '{"num_workers": 2}' -vv

The default log level is ``WARN``. We strongly recommend using at least ``INFO``
level logging for development.
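In the Python API, the same setting is available on any algorithm's config object (a minimal
sketch using the ``debugging()`` method; the PPO/CartPole choice is arbitrary):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Minimal sketch: raise RLlib's internal log level to INFO for development.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .debugging(log_level="INFO")
    )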
Stack Traces
~~~~~~~~~~~~

You can use the ``ray stack`` command to dump the stack traces of all the
Python workers on a single node. This can be useful for debugging unexpected
hangs or performance issues.

Next Steps
----------

- To check how your application is doing, you can use the :ref:`Ray dashboard <observability-getting-started>`.