Model-based Meta-Policy Optimization (MB-MPO)

Code in this package is adapted from https://github.com/jonasrothfuss/model_ensemble_meta_learning.

Overview

MBMPO is an on-policy, model-based algorithm. At a high level, MBMPO is model-based MAML: on top of MAML, it learns an ensemble of dynamics models. The dynamics models are trained on real environment data, while the actor/critic networks are trained on synthetic ("imagined") data generated by the dynamics models; the actor and critic themselves are updated via the MAML algorithm. In its distributed execution plan, MBMPO alternates between training the dynamics models and training the actor and critic networks.
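
The sketch below is a minimal, self-contained toy illustration of this alternating loop, not the RLlib implementation: the environment, class, and function names (`ToyEnv`, `make_dynamics_model`, `collect_real_transitions`, `train_ensemble`, `imagined_rollout`) are hypothetical, and the MAML actor/critic update that would consume the imagined data is omitted.

```python
# Conceptual MB-MPO-style loop: fit a dynamics-model ensemble on real data,
# then generate imagined ("fake") rollouts from each ensemble member.
# This is an illustrative sketch only, not RLlib's MBMPO code.
import torch
import torch.nn as nn


class ToyEnv:
    """1-D point mass: state moves by the action plus noise; reward is -|state|."""

    def __init__(self):
        self.state = torch.zeros(1)

    def reset(self):
        self.state = torch.randn(1)
        return self.state.clone()

    def step(self, action):
        self.state = self.state + action + 0.05 * torch.randn(1)
        reward = -self.state.abs().item()
        return self.state.clone(), reward


def make_dynamics_model():
    # Each ensemble member predicts next_state from (state, action).
    return nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))


def collect_real_transitions(env, num_steps=256):
    # Real-environment data, used only to fit the dynamics ensemble.
    states, actions, next_states = [], [], []
    s = env.reset()
    for _ in range(num_steps):
        a = torch.randn(1)  # random behavior policy, for the sketch only
        s2, _ = env.step(a)
        states.append(s)
        actions.append(a)
        next_states.append(s2)
        s = s2
    return torch.stack(states), torch.stack(actions), torch.stack(next_states)


def train_ensemble(ensemble, states, actions, next_states, epochs=200):
    # Supervised one-step prediction loss for every ensemble member.
    inputs = torch.cat([states, actions], dim=-1)
    for model in ensemble:
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(epochs):
            loss = nn.functional.mse_loss(model(inputs), next_states)
            opt.zero_grad()
            loss.backward()
            opt.step()


def imagined_rollout(model, start_state, horizon=20):
    # "Fake" transitions generated by one dynamics model; MB-MPO treats each
    # ensemble member as a separate task for the MAML inner/outer update.
    s = start_state
    transitions = []
    for _ in range(horizon):
        a = torch.randn(1)  # placeholder for the current policy
        with torch.no_grad():
            s2 = model(torch.cat([s, a], dim=-1))
        transitions.append((s, a, s2))
        s = s2
    return transitions


if __name__ == "__main__":
    env = ToyEnv()
    ensemble = [make_dynamics_model() for _ in range(3)]
    # Outer loop: alternate between fitting the models on real data and
    # generating imagined data for the (omitted) MAML policy update.
    for iteration in range(3):
        s, a, s2 = collect_real_transitions(env)
        train_ensemble(ensemble, s, a, s2)
        fake_data = [imagined_rollout(m, env.reset()) for m in ensemble]
        print(f"iter {iteration}: {sum(len(t) for t in fake_data)} imagined transitions")
```

In the full algorithm, each ensemble member defines a separate "task" for MAML: the inner-loop adaptation is performed per dynamics model, and the outer (meta) update aggregates across models.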

More details can be found in the original paper, "Model-Based Reinforcement Learning via Meta-Policy Optimization" (Clavera et al., 2018).

Documentation & Implementation:

MBMPO

- Detailed documentation: see the MBMPO section of the RLlib algorithms documentation.
- Implementation: see mbmpo.py and mbmpo_torch_policy.py in this package (the dynamics-model ensemble lives in model_ensemble.py).