# A2C (Advantage Actor-Critic)

A2C is a synchronous, deterministic version of A3C; that’s why it is named “A2C”, with the first “A” (“asynchronous”) removed. In A3C, each agent updates the global parameters independently, so the thread-specific agents may at times be acting with different versions of the policy, and the aggregated update is therefore not optimal. To resolve this inconsistency, a coordinator in A2C waits for all of the parallel actors to finish their work before updating the global parameters, so that in the next iteration all parallel actors start from the same policy. The synchronized gradient update keeps the training more cohesive and can make convergence faster.
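
The synchronous update can be summarized with the sketch below. This is illustrative pseudocode, not RLlib's actual implementation; `global_policy`, `actors`, and methods such as `get_weights`, `sample`, and `apply_gradients` are placeholder names chosen for clarity.

```python
def a2c_training_iteration(global_policy, actors):
    """One synchronous A2C iteration (illustrative pseudocode, not RLlib code).

    `global_policy` and `actors` are placeholder objects assumed to expose
    get_weights/set_weights/sample/compute_gradients/apply_gradients.
    """
    # 1. Broadcast the current weights so every actor acts with the same
    #    policy version.
    weights = global_policy.get_weights()
    for actor in actors:
        actor.set_weights(weights)

    # 2. Wait for ALL actors to finish sampling before touching the global
    #    parameters (the "synchronous" part; A3C instead lets each actor push
    #    its gradients independently and immediately).
    batches = [actor.sample() for actor in actors]

    # 3. Compute one aggregated actor-critic gradient over all batches and
    #    apply it once to the global parameters.
    grads = [global_policy.compute_gradients(batch) for batch in batches]
    mean_grads = [sum(g) / len(g) for g in zip(*grads)]  # per-parameter average
    global_policy.apply_gradients(mean_grads)
```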

## Installation

```bash
conda create -n rllib-a2c python=3.10
conda activate rllib-a2c
pip install -r requirements.txt
pip install -e '.[development]'
```

## Usage

### A2C Example
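
A minimal training sketch is shown below. The import path `rllib_a2c.a2c` and the specific configuration values (environment, learning rate, stopping criteria) are assumptions for illustration; see the `examples/` directory of this package for the maintained example script.

```python
import ray
from ray import air, tune

# Assumed import path for the contrib package installed above; adjust if the
# module layout differs in your checkout.
from rllib_a2c.a2c import A2C, A2CConfig

if __name__ == "__main__":
    ray.init()

    # Build an A2C configuration: environment, parallel rollout workers,
    # framework, and a few training hyperparameters (illustrative values).
    config = (
        A2CConfig()
        .environment("CartPole-v1")
        .rollouts(num_rollout_workers=2)
        .framework("torch")
        .training(lr=0.001, train_batch_size=500)
    )

    # Run training with Ray Tune until either stopping criterion is met.
    tune.Tuner(
        A2C,
        param_space=config.to_dict(),
        run_config=air.RunConfig(
            stop={"episode_reward_mean": 150, "timesteps_total": 200000},
        ),
    ).fit()
```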