Long Running Tests
==================

This directory contains long-running workloads that are intended to run
until they fail. To set up the project, run:

.. code-block:: bash

    $ pip install anyscale
    $ anyscale init

Note that all long-running tests run inside the ``tensorflow_p36`` virtual environment.

Running the Workloads
---------------------
The easiest approach to running these workloads is to use the
`Releaser`_ tool to run them with the command
``python cli.py suite:run long_running_tests``. By default, this
will start a session to run each workload in the Anyscale product
and kick them off.

To run the tests manually, you can also use the Anyscale UI. First run
``anyscale snapshot create`` from the command line to create a project
snapshot. Then, from the UI, launch an individual session and execute the run
command for each test.

You can also start the workloads using the CLI with:

.. code-block:: bash

    $ anyscale start
    $ anyscale run test_workload --workload= --wheel=


Doing this for each workload starts one EC2 instance per workload and runs
one workload on each instance. The available workload options are listed in
the ``ray_projects/project.yaml`` file.
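Repeating the ``anyscale run`` command for every workload can be scripted. The
following is a minimal sketch and is not part of this repository: the helper
name ``build_run_commands`` is hypothetical, and the workload names and wheel
value you pass in are your own. Only the ``anyscale run test_workload`` command
line itself is taken from the text above.

```python
from typing import List


def build_run_commands(workloads: List[str], wheel: str) -> List[List[str]]:
    """Return one ``anyscale run`` argument list per workload.

    Nothing is executed here; the caller decides how to run the commands.
    The workload names and the wheel value are supplied by the caller and
    are assumptions of this sketch, not values defined by this README.
    """
    commands = []
    for name in workloads:
        commands.append(
            [
                "anyscale",
                "run",
                "test_workload",
                f"--workload={name}",
                f"--wheel={wheel}",
            ]
        )
    return commands
```

Each returned list can be passed to ``subprocess.run(cmd, check=True)`` once
``anyscale start`` has completed.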


Debugging
---------
The primary way to debug a test while it is running is to view the logs and
the dashboard from the UI. After the test has failed, you can still view the
stdout logs in the UI and inspect the logs under ``/tmp/ray/session*/logs/``,
including ``/tmp/ray/session*/logs/debug_state.txt``.
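When you have shell access to the instance, a short script can collect the end
of every log file in one pass. This is a minimal sketch using only the Python
standard library. The function name ``tail_ray_logs`` and the default of 20
lines are assumptions made for illustration; the glob pattern comes from the
log location given above.

```python
import glob
import os
from collections import deque
from typing import Dict, List


def tail_ray_logs(
    pattern: str = "/tmp/ray/session*/logs/*", lines: int = 20
) -> Dict[str, List[str]]:
    """Map each log file matching ``pattern`` to its last ``lines`` lines."""
    tails = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip subdirectories inside the logs folder
        with open(path, "r", errors="replace") as f:
            # A bounded deque keeps only the final ``lines`` lines in memory.
            tails[path] = [line.rstrip("\n") for line in deque(f, maxlen=lines)]
    return tails


if __name__ == "__main__":
    for log_path, tail in tail_ray_logs().items():
        print(f"==> {log_path} <==")
        print("\n".join(tail))
```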

.. To check up on the workloads, run either
.. ``anyscale session --name="*" execute check-load``, which
.. will print the load on each machine, or
.. ``anyscale session --name="*" execute show-output``, which
.. will print the tail of the output for each workload.

Shut Down the Workloads
-----------------------

The instances running the workloads can all be killed by running
``anyscale stop``.

Adding a Workload
-----------------

To create a new workload, add a new Python file under ``workloads/`` and
add the workload to the run command in ``ray-project/project.yaml``.
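As a rough illustration of the "run until it fails" shape such a file takes,
the sketch below loops over a step function until that function raises. It is
not copied from an existing workload: ``run_forever``, ``step``, and
``max_iterations`` are hypothetical names, and a real workload would do its Ray
work inside the step. Treat the existing files under ``workloads/`` as the
authoritative template.

```python
import time
from typing import Callable, Optional


def run_forever(
    step: Callable[[int], None], max_iterations: Optional[int] = None
) -> int:
    """Call ``step`` repeatedly until it raises; return the iteration count.

    ``max_iterations`` exists only so the loop can be bounded in a dry run;
    leave it as ``None`` for an actual long-running workload.
    """
    iteration = 0
    start = time.time()
    while max_iterations is None or iteration < max_iterations:
        step(iteration)  # any exception propagates and ends the workload
        iteration += 1
        if iteration % 100 == 0:
            # Periodic progress line so the stdout log shows liveness.
            print(f"iteration={iteration} elapsed={time.time() - start:.1f}s")
    return iteration
```

Letting exceptions propagate, rather than catching them, is what makes a
failure visible as a terminated workload in the logs.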

.. _`Releaser`: https://github.com/ray-project/releaser