Latest commit: d5bfb7b7da [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) by Sven Mika, 2 years ago

| File | Last commit | Updated |
| --- | --- | --- |
| `tests/` | d5bfb7b7da [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) | 2 years ago |
| `README.md` | 4bcd475671 [RLlib] Improved Documentation for PPO, DDPG, and SAC (#12943) | 3 years ago |
| `__init__.py` | fba8461663 [RLlib] Add RNN-SAC agent (#16577) | 3 years ago |
| `rnnsac.py` | b10d5533be [RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452) | 2 years ago |
| `rnnsac_torch_model.py` | fba8461663 [RLlib] Add RNN-SAC agent (#16577) | 3 years ago |
| `rnnsac_torch_policy.py` | 2317c693cf [RLlib] Use SampleBatch instead of input dict whenever possible (#20746) | 2 years ago |
| `sac.py` | d5bfb7b7da [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) | 2 years ago |
| `sac_tf_model.py` | 2317c693cf [RLlib] Use SampleBatch instead of input dict whenever possible (#20746) | 2 years ago |
| `sac_tf_policy.py` | 2317c693cf [RLlib] Use SampleBatch instead of input dict whenever possible (#20746) | 2 years ago |
| `sac_torch_model.py` | 2317c693cf [RLlib] Use SampleBatch instead of input dict whenever possible (#20746) | 2 years ago |
| `sac_torch_policy.py` | 2317c693cf [RLlib] Use SampleBatch instead of input dict whenever possible (#20746) | 2 years ago |


# Soft Actor-Critic (SAC)

## Overview

SAC is a state-of-the-art, model-free, off-policy RL algorithm that performs remarkably well on continuous-control domains. SAC employs an actor-critic framework and addresses the high sample complexity and training instability of prior methods by learning under a maximum-entropy framework. Unlike the standard RL objective, which maximizes only the expected sum of future rewards, SAC maximizes the sum of rewards plus the expected entropy of the current policy. In addition to optimizing the actor and critic with these entropy-augmented objectives, SAC also automatically tunes the entropy coefficient.
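
Concretely, the maximum-entropy objective (standard formulation from Haarnoja et al.'s SAC paper) augments the per-step reward with a policy-entropy bonus weighted by the entropy coefficient `α`:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
    \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Here `ρ_π` is the state-action distribution induced by the policy `π`, and `H` denotes entropy; larger `α` trades off reward maximization for more exploratory, stochastic behavior.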

## Documentation & Implementation:

Soft Actor-Critic (SAC), with discrete-action support.

Detailed Documentation

Implementation
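
As a quick-start reference, here is a minimal usage sketch. It assumes the Ray 1.x-era RLlib API matching the files above (`SACTrainer` and `DEFAULT_CONFIG` exported from `ray.rllib.agents.sac`); newer Ray releases have since renamed these. The `Pendulum-v1` environment and the overridden config values are illustrative choices, not requirements:

```python
import ray
from ray.rllib.agents.sac import SACTrainer, DEFAULT_CONFIG

ray.init()

# Start from SAC's default config and override a few common settings.
config = DEFAULT_CONFIG.copy()
config["env"] = "Pendulum-v1"   # continuous-control task (illustrative choice)
config["framework"] = "torch"   # uses sac_torch_policy.py; "tf" selects the TF policy
config["tau"] = 0.005           # soft-update rate for the target networks

trainer = SACTrainer(config=config)
for i in range(10):
    result = trainer.train()
    print(f"iter {i}: episode_reward_mean={result['episode_reward_mean']}")

ray.shutdown()
```

For a discrete-action task, only the environment needs to change (e.g., `config["env"] = "CartPole-v1"`); the implementation selects the discrete variant of the SAC loss automatically.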