Max van Dijck 232c331ce3 [RLlib] Rename all np.product usage to np.prod (#46317) | 3 月之前 | |
---|---|---|
.. | ||
examples | 5e89aa4def [RLlib contrib] TD3. (#36726) | 1 年之前 |
src | 5e89aa4def [RLlib contrib] TD3. (#36726) | 1 年之前 |
tests | 5e89aa4def [RLlib contrib] TD3. (#36726) | 1 年之前 |
tuned_examples | a9ac55d4f2 [RLlib; RLlib contrib] Move `tuned_examples` into rllib_contrib and remove CI learning tests for contrib algos. (#40444) | 1 年之前 |
BUILD | a9ac55d4f2 [RLlib; RLlib contrib] Move `tuned_examples` into rllib_contrib and remove CI learning tests for contrib algos. (#40444) | 1 年之前 |
README.md | 5e89aa4def [RLlib contrib] TD3. (#36726) | 1 年之前 |
pyproject.toml | 232c331ce3 [RLlib] Rename all np.product usage to np.prod (#46317) | 3 月之前 |
requirements.txt | 232c331ce3 [RLlib] Rename all np.product usage to np.prod (#46317) | 3 月之前 |
TD3 While DDPG can achieve great performance sometimes, it is frequently brittle with respect to hyperparameters and other kinds of tuning. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then leads to the policy breaking, because it exploits the errors in the Q-function. Twin Delayed DDPG (TD3) is an algorithm that addresses this issue by introducing three critical tricks:
Trick One: Clipped Double-Q Learning. TD3 learns two Q-functions instead of one (hence “twin”), and uses the smaller of the two Q-values to form the targets in the Bellman error loss functions.
Trick Two: “Delayed” Policy Updates. TD3 updates the policy (and target networks) less frequently than the Q-function. The paper recommends one policy update for every two Q-function updates.
Trick Three: Target Policy Smoothing. TD3 adds noise to the target action, to make it harder for the policy to exploit Q-function errors by smoothing out Q along changes in action.
Together, these three tricks result in substantially improved performance over baseline DDPG.
conda create -n rllib-td3 python=3.10
conda activate rllib-td3
pip install -r requirements.txt
pip install -e '.[development]'
[TD3 Example]()