.. _raysgd-tune:

RaySGD Hyperparameter Tuning
============================

RaySGD integrates with :ref:`Ray Tune <tune-60-seconds>` to easily run distributed hyperparameter tuning experiments with your RaySGD Trainer.
PyTorch
-------

.. tip:: If you want to leverage multi-node data parallel training with PyTorch while using Ray Tune *without* using RaySGD, check out the :ref:`Tune PyTorch user guide <tune-pytorch-cifar>` and Tune's lightweight :ref:`distributed PyTorch integrations <tune-ddp-doc>`.
``TorchTrainer`` naturally integrates with Tune via the ``BaseTorchTrainable`` interface. Without changing any arguments, you can call ``TorchTrainer.as_trainable(...)`` to create a Tune-compatible class, and then simply pass the returned Trainable class to ``tune.run``. The ``config`` used for each ``Trainable`` in Tune is automatically passed down to the ``TorchTrainer``. As a result, each trial runs with its own ``TorchTrainable``, which holds an instance of the ``TorchTrainer`` with its own unique hyperparameter configuration. See the documentation (:ref:`BaseTorchTrainable-doc`) for more info.
.. literalinclude:: ../../../python/ray/util/sgd/torch/examples/tune_example.py
   :language: python
   :start-after: __torch_tune_example__
   :end-before: __end_torch_tune_example__
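As a condensed sketch of this pattern (the linked example script is the authoritative version; ``MyTrainingOperator`` and the hyperparameter values below are placeholders):

.. code-block:: python

    import ray
    from ray import tune
    from ray.util.sgd import TorchTrainer

    ray.init()

    # Create a Tune-compatible Trainable class from the usual
    # TorchTrainer arguments. MyTrainingOperator stands in for your
    # own TrainingOperator subclass.
    TorchTrainable = TorchTrainer.as_trainable(
        training_operator_cls=MyTrainingOperator,
        num_workers=2,
        use_gpu=False,
        config={"batch_size": 64},
    )

    # Each trial gets its own distributed TorchTrainer; the per-trial
    # Tune config below is passed down to it automatically.
    analysis = tune.run(
        TorchTrainable,
        config={"lr": tune.grid_search([1e-4, 1e-3])},
        stop={"training_iteration": 2},
    )

    # Assumes validation reports the default "val_loss" metric.
    print(analysis.get_best_config(metric="val_loss", mode="min"))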
By default, the training step for the returned ``Trainable`` runs one epoch of training and one epoch of validation, and reports the combined result dictionaries to Tune.

By combining RaySGD with Tune, each individual trial runs in a distributed fashion across ``num_workers`` workers, and multiple trials can run in parallel as well.
Custom Training Step
~~~~~~~~~~~~~~~~~~~~
Sometimes it is necessary to provide a custom training step, for example if you want to run more than one epoch of training per Tune iteration, or if you need to manually update the scheduler after validation. Custom training steps can easily be provided by passing an ``override_tune_step`` function to ``TorchTrainer.as_trainable(...)``.
.. literalinclude:: ../../../python/ray/util/sgd/torch/examples/tune_example.py
   :language: python
   :start-after: __torch_tune_manual_lr_example__
   :end-before: __end_torch_tune_manual_lr_example__
Your custom step function should take in two arguments: an instance of the ``TorchTrainer`` and an ``info`` dict containing other potentially necessary information.
The ``info`` dict contains the following values:

.. code-block:: python

    # The current Tune iteration. This may differ from the number of epochs
    # trained if each Tune step runs more than one epoch of training.
    iteration
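For illustration, here is a minimal sketch of a custom step that runs two epochs of training per Tune iteration and then manually steps the scheduler on the validation loss. The ``val_loss`` key and ``MyTrainingOperator`` are assumptions, and manually calling ``trainer.update_scheduler`` requires constructing the trainer with ``scheduler_step_freq="manual"``; see the manual LR example above for the authoritative version.

.. code-block:: python

    def custom_step(trainer, info):
        # info["iteration"] is the current Tune iteration; here each
        # iteration covers two epochs of training.
        for _ in range(2):
            train_stats = trainer.train()
        validation_stats = trainer.validate()
        # Manually step the scheduler after validation, e.g. for a
        # ReduceLROnPlateau scheduler that needs a metric. "val_loss"
        # is the metric name assumed here.
        trainer.update_scheduler(metric=validation_stats["val_loss"])
        train_stats.update(validation_stats)
        return train_stats

    TorchTrainable = TorchTrainer.as_trainable(
        override_tune_step=custom_step,
        training_operator_cls=MyTrainingOperator,  # placeholder operator
        num_workers=2,
        scheduler_step_freq="manual",
        config={"batch_size": 64},
    )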
If you would like any other information to be available in the ``info`` dict, please file a feature request on `GitHub Issues <https://github.com/ray-project/ray/issues>`_!

You can see the `Tune example script <https://github.com/ray-project/ray/blob/master/python/ray/util/sgd/torch/examples/tune_example.py>`_ for an end-to-end example.