Branch: releases/1.11

Sven Mika b4790900f5 [RLlib] Sub-class `Trainer` (instead of `build_trainer()`): All remaining classes; soft-deprecate `build_trainer`. (#20725)		2 years ago
..
core	b4790900f5 [RLlib] Sub-class `Trainer` (instead of `build_trainer()`): All remaining classes; soft-deprecate `build_trainer`. (#20725)	2 years ago
doc	4e24c805ee AlphaZero and Ranked reward implementation (#6385)	4 years ago
environments	026bf01071 [RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)	3 years ago
examples	dd70720578 [rllib] Rename sample_batch_size => rollout_fragment_length (#7503)	4 years ago
models	28ab797cf5 [RLlib] Deprecate old classes, methods, functions, config keys (in prep for RLlib 1.0). (#10544)	4 years ago
optimizer	9a83908c46 [rllib] Deprecate policy optimizers (#8345)	4 years ago
README.md	42991d723f [RLlib] rllib/examples folder restructuring (#8250)	4 years ago
__init__.py	2e60f0d4d8 [RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). (#7178)	4 years ago

AlphaZero implementation for Ray/RLlib

Notes

This code implements a one-player AlphaZero agent. It includes the "ranked rewards" (R2) strategy, which simulates the self-play of two-player AlphaZero by forcing the agent to be better than its previous self. R2 is also very helpful for dynamically normalizing the rewards.
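
As a rough illustration of the ranked-rewards idea, the sketch below keeps a buffer of recent episode returns and converts each new return into +1 or -1 depending on whether it beats a percentile of that buffer. The class name RankedRewardBuffer, its parameters, and the nearest-rank percentile rule are assumptions made for this example; this is not the code shipped in this directory.

```python
import math
import random
from collections import deque


class RankedRewardBuffer:
    """Hypothetical helper: maps raw episode returns to +1/-1 ranked rewards."""

    def __init__(self, max_size=100, percentile=75.0, seed=None):
        self.returns = deque(maxlen=max_size)  # most recent episode returns
        self.percentile = percentile           # assumed threshold percentile
        self.rng = random.Random(seed)

    def add(self, episode_return):
        self.returns.append(episode_return)

    def threshold(self):
        # Nearest-rank percentile of the stored returns.
        if not self.returns:
            raise ValueError("buffer is empty")
        ordered = sorted(self.returns)
        rank = max(1, math.ceil(self.percentile / 100.0 * len(ordered)))
        return ordered[rank - 1]

    def rank(self, episode_return):
        # +1 if the agent beat its recent self, -1 if it did worse,
        # and a coin flip on an exact tie.
        limit = self.threshold()
        if episode_return > limit:
            return 1.0
        if episode_return < limit:
            return -1.0
        return self.rng.choice([1.0, -1.0])


buffer = RankedRewardBuffer(max_size=10, percentile=75.0, seed=0)
for value in range(1, 11):
    buffer.add(float(value))
print(buffer.threshold(), buffer.rank(9.0), buffer.rank(3.0))
```

Because the output is always +1 or -1, the value target stays on the same scale no matter how large the raw environment rewards are, which is the normalizing effect mentioned above.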

The code is PyTorch-based. It assumes that the environment is a gym environment with a discrete action space, and that it returns each observation as a dictionary with two keys:

obs, which contains an observation in the form of either a state vector or an image
action_mask, which contains a mask over the legal actions

The environment should also implement a get_state and a set_state function.
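
The requirements above can be sketched with a toy environment. LineWalkEnv and everything inside it are hypothetical and exist only for illustration; the class follows the older gym convention of a 4-tuple from step() without importing gym, and it assumes that set_state returns the observation for the restored state.

```python
class LineWalkEnv:
    """Hypothetical toy environment: walk right from 0 to `length`."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def _observation(self):
        # Action 0 moves left, action 1 moves right.
        # Moving left is illegal at position 0, so it is masked out there.
        mask = [0 if self.position == 0 else 1, 1]
        return {"obs": [float(self.position)], "action_mask": mask}

    def reset(self):
        self.position = 0
        self.steps = 0
        return self._observation()

    def step(self, action):
        if self._observation()["action_mask"][action] == 0:
            raise ValueError("illegal action")
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reached = self.position == self.length
        done = reached or self.steps >= self.max_steps
        return self._observation(), (1.0 if reached else 0.0), done, {}

    def get_state(self):
        # Snapshot of everything needed to resume from this point.
        return (self.position, self.steps)

    def set_state(self, state):
        # Restore a snapshot so a tree search can branch from it again.
        self.position, self.steps = state
        return self._observation()


env = LineWalkEnv(length=3)
env.reset()
saved = env.get_state()
env.step(1)
print(env.set_state(saved))
```

The get_state and set_state pair is what lets a tree search simulate an action, inspect the outcome, and then rewind the environment to try a different branch.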

The model used in the AlphaZero trainer should extend ActorCriticModel and implement the method compute_priors_and_value.

Example on CartPole

Note that both the mean and max rewards are obtained with the MCTS in exploration mode: Dirichlet noise is added to the priors, and actions are sampled from the tree policy vectors. We will later add results for the MCTS in exploitation mode: no Dirichlet noise, and each action chosen as the argmax of the tree policy vector.
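
The difference between the two modes can be sketched in a few lines. The function names and the alpha and epsilon values below are assumptions chosen for illustration, not the settings used by this trainer; the Dirichlet sample is drawn by normalizing independent gamma draws.

```python
import random


def add_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=random):
    """Mix Dirichlet(alpha) noise into the priors (exploration mode)."""
    # Normalized independent Gamma(alpha, 1) draws follow a Dirichlet(alpha).
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def select_action(tree_policy, exploit, rng=random):
    """Pick an action from the tree policy vector."""
    actions = range(len(tree_policy))
    if exploit:
        # Exploitation mode: take the most visited / most probable action.
        return max(actions, key=lambda a: tree_policy[a])
    # Exploration mode: sample in proportion to the tree policy.
    return rng.choices(actions, weights=tree_policy, k=1)[0]


rng = random.Random(0)
noisy_priors = add_dirichlet_noise([0.5, 0.3, 0.2], rng=rng)
print(noisy_priors)
print(select_action([0.1, 0.7, 0.2], exploit=True))
print(select_action([0.1, 0.7, 0.2], exploit=False, rng=rng))
```

Since both the priors and the noise sum to one, the mixed vector is still a valid probability distribution.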

References

AlphaZero: https://arxiv.org/abs/1712.01815
Ranked rewards: https://arxiv.org/abs/1807.01672
