
.. _raysgd-tune:

RaySGD Hyperparameter Tuning
============================

RaySGD integrates with :ref:`Ray Tune <tune-60-seconds>` to easily run distributed hyperparameter tuning experiments with your RaySGD Trainer.

PyTorch
-------

.. tip:: If you want to leverage multi-node data parallel training with PyTorch while using Ray Tune *without* using RaySGD, check out the :ref:`Tune PyTorch user guide <tune-pytorch-cifar>` and Tune's lightweight :ref:`distributed pytorch integrations <tune-ddp-doc>`.

``TorchTrainer`` naturally integrates with Tune via the ``BaseTorchTrainable`` interface. Without changing any arguments, you can call ``TorchTrainer.as_trainable(...)`` to create a Tune-compatible class.
Then, you can simply pass the returned Trainable class to ``tune.run``. The ``config`` used for each ``Trainable`` in Tune is automatically passed down to the ``TorchTrainer``.
Therefore, each trial will have its own ``TorchTrainable`` that holds an instance of the ``TorchTrainer`` with its own unique hyperparameter configuration.
See the documentation (:ref:`BaseTorchTrainable-doc`) for more info.

.. literalinclude:: ../../../python/ray/util/sgd/torch/examples/tune_example.py
   :language: python
   :start-after: __torch_tune_example__
   :end-before: __end_torch_tune_example__

By default, the training step for the returned ``Trainable`` will run one epoch of training and one epoch of validation, and will report the combined result dictionaries to Tune.

By combining RaySGD with Tune, each individual trial will be run in a distributed fashion with ``num_workers`` workers, and multiple trials can also run in parallel.

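The overall pattern looks roughly like the sketch below. This is not the included example verbatim: ``MyTrainingOperator`` is a placeholder for your own ``TrainingOperator`` subclass, the exact keyword arguments depend on your Ray version, and the ``val_loss`` metric assumes the default validation metrics reported by your operator.

.. code-block:: python

    import ray
    from ray import tune
    from ray.util.sgd import TorchTrainer

    ray.init()

    # Create a Tune-compatible Trainable from the usual TorchTrainer arguments.
    # Each trial internally starts ``num_workers`` distributed workers.
    TorchTrainable = TorchTrainer.as_trainable(
        training_operator_cls=MyTrainingOperator,  # placeholder for your TrainingOperator subclass
        num_workers=2,
    )

    # Each sampled ``config`` is passed down to the trial's TorchTrainer.
    analysis = tune.run(
        TorchTrainable,
        config={"lr": tune.grid_search([1e-3, 1e-2])},
        num_samples=1,
    )

    print(analysis.get_best_config(metric="val_loss", mode="min"))
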
Custom Training Step
~~~~~~~~~~~~~~~~~~~~

Sometimes it is necessary to provide a custom training step, for example if you want to run more than one epoch of training for each Tune iteration, or if you need to manually update the scheduler after validation. Custom training steps can easily be provided by passing an ``override_tune_step`` function to ``TorchTrainer.as_trainable(...)``.

.. literalinclude:: ../../../python/ray/util/sgd/torch/examples/tune_example.py
   :language: python
   :start-after: __torch_tune_manual_lr_example__
   :end-before: __end_torch_tune_manual_lr_example__

Your custom step function should take in two arguments: an instance of the ``TorchTrainer`` and an ``info`` dict containing other potentially necessary information.

The ``info`` dict contains the following values:

.. code-block:: python

    # The current Tune iteration.
    # This may be different from the number of epochs trained if each Tune step does more than one epoch of training.
    iteration

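For illustration, a custom step might look like the following sketch. It assumes one ``trainer.train()``/``trainer.validate()`` round per Tune iteration with a manual scheduler update; ``MyTrainingOperator``, the ``val_loss`` metric name, and ``scheduler_step_freq="manual"`` are assumptions that depend on how you set up your operator and trainer.

.. code-block:: python

    def custom_step(trainer, info):
        # ``trainer`` is the TorchTrainer for this trial;
        # ``info["iteration"]`` is the current Tune iteration.
        train_stats = trainer.train()
        val_stats = trainer.validate()

        # Manually step the scheduler on the validation metric
        # (assumes the trainer was created with scheduler_step_freq="manual").
        trainer.update_scheduler(metric=val_stats["val_loss"])

        # Report a single combined result dict to Tune.
        return {**train_stats, **val_stats}

    TorchTrainable = TorchTrainer.as_trainable(
        override_tune_step=custom_step,
        training_operator_cls=MyTrainingOperator,  # placeholder
        num_workers=2,
        scheduler_step_freq="manual",
    )
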
If you would like any other information to be available in the ``info`` dict, please file a feature request on `GitHub Issues <https://github.com/ray-project/ray/issues>`_!

You can see the `Tune example script <https://github.com/ray-project/ray/blob/master/python/ray/util/sgd/torch/examples/tune_example.py>`_ for an end-to-end example.