Max van Dijck 232c331ce3 [RLlib] Rename all np.product usage to np.prod (#46317)		3 months ago
examples	5e89aa4def [RLlib contrib] TD3. (#36726)	1 year ago
src	5e89aa4def [RLlib contrib] TD3. (#36726)	1 year ago
tests	5e89aa4def [RLlib contrib] TD3. (#36726)	1 year ago
tuned_examples	a9ac55d4f2 [RLlib; RLlib contrib] Move `tuned_examples` into rllib_contrib and remove CI learning tests for contrib algos. (#40444)	1 year ago
BUILD	a9ac55d4f2 [RLlib; RLlib contrib] Move `tuned_examples` into rllib_contrib and remove CI learning tests for contrib algos. (#40444)	1 year ago
README.md	5e89aa4def [RLlib contrib] TD3. (#36726)	1 year ago
pyproject.toml	232c331ce3 [RLlib] Rename all np.product usage to np.prod (#46317)	3 months ago
requirements.txt	232c331ce3 [RLlib] Rename all np.product usage to np.prod (#46317)	3 months ago

TD3 (Twin Delayed DDPG)

While DDPG can sometimes achieve great performance, it is frequently brittle with respect to hyperparameters and other kinds of tuning. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then breaks the policy, because the policy exploits the errors in the Q-function. Twin Delayed DDPG (TD3) addresses this issue by introducing three critical tricks:

Trick One: Clipped Double-Q Learning. TD3 learns two Q-functions instead of one (hence “twin”), and uses the smaller of the two Q-values to form the targets in the Bellman error loss functions.

Trick Two: “Delayed” Policy Updates. TD3 updates the policy (and target networks) less frequently than the Q-function. The paper recommends one policy update for every two Q-function updates.

Trick Three: Target Policy Smoothing. TD3 adds noise to the target action, to make it harder for the policy to exploit Q-function errors by smoothing out Q along changes in action.

Together, these three tricks result in substantially improved performance over baseline DDPG.
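
As a rough illustration of how the three tricks fit together, here is a minimal, framework-free sketch for a single transition with a scalar action. It is not the code in this package: the function names (`td3_target`, `is_policy_update_step`), the callables passed in, and the noise settings (`noise_std=0.2`, `noise_clip=0.5`) are assumptions chosen for the example. Only the "one policy update per two Q-function updates" ratio comes from the description above.

```python
import random


def clip(x, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, done, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """Bellman target for one transition (hypothetical helper, scalar action).

    target_actor(obs) -> action and target_q*(obs, action) -> value are
    stand-ins for the target networks.
    """
    # Trick three: smooth the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_obs) + noise, act_low, act_high)
    # Trick one: use the smaller of the two target Q-values.
    q_min = min(target_q1(next_obs, next_action),
                target_q2(next_obs, next_action))
    # No bootstrapping past a terminal state.
    return reward + gamma * (0.0 if done else 1.0) * q_min


def is_policy_update_step(critic_step, policy_delay=2):
    """Trick two: update the policy only every `policy_delay` critic updates."""
    return critic_step % policy_delay == 0


if __name__ == "__main__":
    # Toy stand-ins for the target networks (assumed, for illustration only).
    actor = lambda obs: 0.0
    q1 = lambda obs, act: 1.0
    q2 = lambda obs, act: 2.0
    y = td3_target(1.0, False, 0.0, actor, q1, q2, gamma=0.5, noise_std=0.0)
    print("target:", y)
    print("policy update steps:",
          [s for s in range(1, 7) if is_policy_update_step(s)])
```

Both Q-functions would then be regressed toward the same target, while the policy and the target networks are updated only on the steps where `is_policy_update_step` returns true.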

Installation

conda create -n rllib-td3 python=3.10
conda activate rllib-td3
pip install -r requirements.txt
pip install -e '.[development]'

Usage

[TD3 Example]()
