Repository contents:

examples/
src/
tests/
tuned_examples/
BUILD
README.md
pyproject.toml
requirements.txt

README.md

DDPPO (Decentralized Distributed Proximal Policy Optimization)

DD-PPO (Decentralized Distributed PPO) is a PPO-based method for distributed reinforcement learning in resource-intensive simulated environments. It is distributed (runs across multiple machines), decentralized (has no central parameter server), and synchronous (no computation is ever done on stale data), which makes it conceptually simple and easy to implement.
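The "decentralized, synchronous" design can be illustrated with a toy sketch (this is not the RLlib implementation): every worker computes a gradient on its own rollout batch, then all workers average their gradients via an all-reduce, so no central server is needed and no worker ever applies a stale gradient. The function and variable names below are made up for illustration.

```python
# Toy sketch of DD-PPO's decentralized, synchronous update pattern.
# Each worker computes a local gradient; an all-reduce averages them;
# every worker then applies the SAME averaged gradient to its own copy
# of the weights, so all replicas stay identical without a central server.

import numpy as np

def all_reduce_mean(grads):
    """Average a list of per-worker gradients (simulates an all-reduce)."""
    return np.mean(grads, axis=0)

def synchronous_step(weights, worker_grads, lr=0.1):
    """One synchronous SGD update applied identically by every worker."""
    avg_grad = all_reduce_mean(worker_grads)
    return weights - lr * avg_grad

# Toy example: 4 workers, a 3-dimensional parameter vector.
rng = np.random.default_rng(0)
weights = np.zeros(3)
worker_grads = [rng.normal(size=3) for _ in range(4)]
new_weights = synchronous_step(weights, worker_grads)
```

Because every worker sees the same averaged gradient in the same step, the per-worker weight copies never diverge, which is what removes the need for a parameter server.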

Installation

conda create -n rllib-ddppo python=3.10
conda activate rllib-ddppo
pip install -r requirements.txt
pip install -e '.[development]'

Usage

[DDPPO Example]()