Directory contents:

tests/
README.md
__init__.py
cql.py
cql_tf_policy.py
cql_torch_policy.py

README.md

Conservative Q-Learning (CQL)

Overview

CQL is an offline RL algorithm that mitigates the overestimation of Q-values for actions outside the dataset distribution by learning conservative critic estimates. CQL does this by adding a simple Q-value regularizer loss to the standard Bellman update loss. This ensures that the critic does not output overly optimistic Q-values, and the regularizer can be added on top of any off-policy Q-learning algorithm (in this case, we use SAC).
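The sketch below illustrates the shape of that regularizer: a log-sum-exp term pushes down Q-values for actions sampled away from the dataset, while Q-values for dataset actions are pushed up. This is a minimal illustration in PyTorch, not RLlib's actual implementation (see cql_torch_policy.py for that); tensor names and the `cql_alpha` weight are illustrative.

```python
import torch

def cql_critic_loss(q_sampled, q_data, bellman_error, cql_alpha=1.0):
    """Illustrative CQL critic objective (not RLlib's exact code).

    q_sampled: [batch, num_sampled_actions] Q-values for actions sampled
        (e.g., from the current policy) at dataset states.
    q_data: [batch] Q-values for the actions actually taken in the dataset.
    bellman_error: scalar standard off-policy (e.g., SAC) critic loss.
    """
    # Conservative penalty: log-sum-exp over sampled actions minus the
    # Q-values of dataset actions. Minimizing this keeps the critic from
    # assigning high values to out-of-distribution actions.
    conservative_penalty = (torch.logsumexp(q_sampled, dim=1) - q_data).mean()
    return bellman_error + cql_alpha * conservative_penalty
```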

Documentation & Implementation:

Conservative Q-Learning (CQL).

Detailed Documentation

Implementation
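A minimal usage sketch, assuming the CQLTrainer entry point defined in cql.py in this directory; the environment, dataset path, and config values are placeholders, and exact config keys may differ by Ray version (check the default config in cql.py):

```python
import ray
from ray.rllib.agents.cql import CQLTrainer

ray.init()

config = {
    "env": "Pendulum-v1",  # illustrative; CQL targets continuous control
    "framework": "torch",
    # CQL is an offline algorithm: experiences are read from a
    # pre-recorded dataset instead of sampled from the environment.
    "input": "/path/to/offline/data",  # placeholder path
    # Number of initial behavior-cloning iterations before switching to
    # the full CQL loss (assumed key; see cql.py for all options).
    "bc_iters": 20000,
}

trainer = CQLTrainer(config=config)
for _ in range(100):
    print(trainer.train())
```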