.. _algorithm-reference-docs:

Algorithms
==========

The :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` class is the highest-level API in RLlib.
It is responsible for the **WHEN** and **WHAT** of RL algorithms: for example, **WHEN** to collect
samples and **WHEN** to perform a neural network update. The **HOW** is delegated to components
such as :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`.
It is the main entry point for RLlib users to interact with RLlib's algorithms.
It allows you to train and evaluate policies, save an experiment's progress, and restore from
a previously saved experiment when continuing an RL run.
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` is a sub-class
of :py:class:`~ray.tune.trainable.Trainable`
and thus fully supports distributed hyperparameter tuning for RL.

.. https://docs.google.com/drawings/d/1J0nfBMZ8cBff34e-nSPJZMM1jKOuUL11zFJm6CmWtJU/edit

.. figure:: ../images/trainer_class_overview.svg
    :align: left

    **A typical RLlib Algorithm object:** Algorithms are normally comprised of
    N :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` instances that are
    orchestrated via a :py:class:`~ray.rllib.evaluation.worker_set.WorkerSet` object.
    Each worker owns its own set of :py:class:`~ray.rllib.policy.policy.Policy` objects and their NN models, plus a :py:class:`~ray.rllib.env.base_env.BaseEnv` instance.

.. _algo-config-api:

Algorithm Configuration API
----------------------------

The :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` class is
the primary way of configuring and building an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`.
In practice, you don't use ``AlgorithmConfig`` directly. Instead, you use one of its algorithm-specific
implementations, such as :py:class:`~ray.rllib.algorithms.ppo.ppo.PPOConfig`, each of which comes
with its own set of arguments to its ``.training()`` method.

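
The configuration pattern described above can be illustrated with a minimal, self-contained sketch. The classes ``ToyAlgorithmConfig``, ``ToyPPOConfig``, and ``ToyAlgorithm`` below are hypothetical stand-ins written for this illustration; they are not RLlib code and do not reproduce RLlib's actual signatures. The sketch only assumes what the text states: a base config with chainable configuration methods plus ``build``, ``freeze``, ``copy``, and ``validate``, and an algorithm-specific subclass that adds its own ``.training()`` arguments.

```python
class ToyAlgorithm:
    """Hypothetical stand-in for the object that build() returns."""

    def __init__(self, config):
        self.config = config


class ToyAlgorithmConfig:
    """Hypothetical stand-in for a base config class (not RLlib code)."""

    def __init__(self):
        self.env = None
        self.lr = 1e-3
        self._frozen = False

    def __setattr__(self, name, value):
        # Once frozen, reject any further attribute changes.
        if getattr(self, "_frozen", False):
            raise AttributeError(f"Config is frozen; cannot set {name!r}.")
        super().__setattr__(name, value)

    def environment(self, env):
        self.env = env
        return self  # Returning self is what enables method chaining.

    def training(self, *, lr=None):
        if lr is not None:
            self.lr = lr
        return self

    def validate(self):
        if self.env is None:
            raise ValueError("No environment configured.")
        if self.lr <= 0:
            raise ValueError("Learning rate must be positive.")

    def copy(self):
        # Shallow copy of all settings; the copy starts out unfrozen.
        new = type(self)()
        new.__dict__.update(
            {k: v for k, v in vars(self).items() if k != "_frozen"}
        )
        return new

    def freeze(self):
        self._frozen = True

    def build(self):
        # Validate, then hand a frozen copy to the algorithm.
        self.validate()
        frozen = self.copy()
        frozen.freeze()
        return ToyAlgorithm(frozen)


class ToyPPOConfig(ToyAlgorithmConfig):
    """Algorithm-specific config that adds its own training() arguments."""

    def __init__(self):
        super().__init__()
        self.clip_param = 0.3

    def training(self, *, clip_param=None, **kwargs):
        super().training(**kwargs)  # Handle the generic arguments first.
        if clip_param is not None:
            self.clip_param = clip_param
        return self


# Usage: chain the configuration calls, then build the algorithm.
config = (
    ToyPPOConfig()
    .environment("CartPole-v1")
    .training(lr=5e-4, clip_param=0.2)
)
algo = config.build()
```

In this sketch, ``build()`` passes a frozen copy to the algorithm, so later changes to ``config`` do not affect an algorithm that was already built. Whether RLlib makes the same choice is not stated here; treat it as a design assumption of the example.
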
.. currentmodule:: ray.rllib.algorithms.algorithm_config

Constructor
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~AlgorithmConfig

Public methods
~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~AlgorithmConfig.build
    ~AlgorithmConfig.freeze
    ~AlgorithmConfig.copy
    ~AlgorithmConfig.validate

Configuration methods
~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~AlgorithmConfig.callbacks
    ~AlgorithmConfig.debugging
    ~AlgorithmConfig.environment
    ~AlgorithmConfig.evaluation
    ~AlgorithmConfig.experimental
    ~AlgorithmConfig.fault_tolerance
    ~AlgorithmConfig.framework
    ~AlgorithmConfig.multi_agent
    ~AlgorithmConfig.offline_data
    ~AlgorithmConfig.python_environment
    ~AlgorithmConfig.reporting
    ~AlgorithmConfig.resources
    ~AlgorithmConfig.rl_module
    ~AlgorithmConfig.rollouts
    ~AlgorithmConfig.training

Getter methods
~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~AlgorithmConfig.get_default_learner_class
    ~AlgorithmConfig.get_default_rl_module_spec
    ~AlgorithmConfig.get_evaluation_config_object
    ~AlgorithmConfig.get_marl_module_spec
    ~AlgorithmConfig.get_multi_agent_setup
    ~AlgorithmConfig.get_rollout_fragment_length

Miscellaneous methods
~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~AlgorithmConfig.validate_train_batch_size_vs_rollout_fragment_length

Building Custom Algorithm Classes
---------------------------------

.. warning::
    As of Ray >= 1.9, using the ``build_trainer()`` utility
    function to create custom Algorithm sub-classes is no longer recommended.
    Instead, follow the guidelines here for sub-classing directly from
    :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`.

To create a custom Algorithm, sub-class the
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` class
and override one or more of its methods, in particular:

* :py:meth:`~ray.rllib.algorithms.algorithm.Algorithm.setup`
* :py:meth:`~ray.rllib.algorithms.algorithm.Algorithm.get_default_config`
* :py:meth:`~ray.rllib.algorithms.algorithm.Algorithm.get_default_policy_class`
* :py:meth:`~ray.rllib.algorithms.algorithm.Algorithm.training_step`

`See here for an example of how to override Algorithm <https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo.py>`_.

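
The override pattern listed above can be sketched without RLlib installed. The base class ``ToyAlgorithm`` below is a hypothetical stand-in, not RLlib's ``Algorithm``: it only mimics the idea that ``train()`` drives one iteration and calls ``training_step()``, and that ``setup()`` and ``get_default_config()`` are hooks for sub-classes. The dict-based config, the ``_sample`` helper, and the running-mean "update" are assumptions made purely for illustration, and ``get_default_policy_class`` is left out to keep the sketch short.

```python
import random


class ToyAlgorithm:
    """Hypothetical stand-in base class (not RLlib's Algorithm)."""

    def __init__(self, config=None):
        # Merge user overrides into the class defaults.
        self.config = {**self.get_default_config(), **(config or {})}
        self.iteration = 0
        self.setup(self.config)

    @classmethod
    def get_default_config(cls):
        return {"seed": 0}

    def setup(self, config):
        """Hook: build whatever state the algorithm needs."""

    def training_step(self):
        """Hook: define WHEN to sample and WHEN to update."""
        raise NotImplementedError

    def train(self):
        # One training iteration: run the step and add bookkeeping.
        results = self.training_step()
        self.iteration += 1
        return {**results, "training_iteration": self.iteration}


class RunningMeanAlgorithm(ToyAlgorithm):
    """Toy sub-class that estimates the mean of random samples."""

    @classmethod
    def get_default_config(cls):
        # Extend the parent's defaults with algorithm-specific settings.
        return {**super().get_default_config(), "samples_per_step": 4}

    def setup(self, config):
        self.rng = random.Random(config["seed"])
        self.estimate = 0.0
        self.num_samples = 0

    def _sample(self, n):
        # The HOW of sampling; a real algorithm would delegate this
        # to worker components instead of drawing random numbers.
        return [self.rng.random() for _ in range(n)]

    def training_step(self):
        # WHEN and WHAT: first collect a batch, then update the estimate.
        batch = self._sample(self.config["samples_per_step"])
        for x in batch:
            self.num_samples += 1
            self.estimate += (x - self.estimate) / self.num_samples
        return {"estimate": self.estimate, "num_samples": self.num_samples}


# Usage: override a default, then run two training iterations.
algo = RunningMeanAlgorithm({"samples_per_step": 8})
first = algo.train()
second = algo.train()
```

The point of the sketch is the division of labor: ``training_step()`` decides the order of sampling and updating, while the details of how samples are produced live in a separate helper.
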
.. _rllib-algorithm-api:

Algorithm API
-------------

.. currentmodule:: ray.rllib.algorithms.algorithm

Constructor
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~Algorithm

Inference and Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~Algorithm.compute_actions
    ~Algorithm.compute_single_action
    ~Algorithm.evaluate

Saving and Restoring
~~~~~~~~~~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~Algorithm.from_checkpoint
    ~Algorithm.from_state
    ~Algorithm.get_weights
    ~Algorithm.set_weights
    ~Algorithm.export_model
    ~Algorithm.export_policy_checkpoint
    ~Algorithm.export_policy_model
    ~Algorithm.import_policy_model_from_h5
    ~Algorithm.restore
    ~Algorithm.restore_from_object
    ~Algorithm.restore_workers
    ~Algorithm.save
    ~Algorithm.save_checkpoint
    ~Algorithm.save_to_object

Training
~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~Algorithm.train
    ~Algorithm.training_step

Multi Agent
~~~~~~~~~~~

.. autosummary::
    :toctree: doc/

    ~Algorithm.add_policy
    ~Algorithm.remove_policy
  131. :toctree: doc/
  132. ~Algorithm.add_policy
  133. ~Algorithm.remove_policy