# Model-based Meta-Policy Optimization (MB-MPO)

Code in this package is adapted from https://github.com/jonasrothfuss/model_ensemble_meta_learning.

## Overview

MB-MPO is an on-policy, model-based algorithm. At a high level, MB-MPO is model-based MAML: on top of MAML, it learns an ensemble of dynamics models. The dynamics models are trained on real environment data, while the actor and critic networks are trained on synthetic ("fake") data generated by the dynamics models; the actor and critic are updated via the MAML algorithm. In its distributed execution plan, MB-MPO alternates between training the dynamics models and training the actor and critic networks.

More details can be found in the original MB-MPO paper, *Model-Based Reinforcement Learning via Meta-Policy Optimization* (Clavera et al., 2018).
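As a rough illustration of how this maps onto RLlib's Algorithm API, the sketch below builds and trains an MB-MPO algorithm. It assumes a Ray/RLlib release that still ships `MBMPOConfig` (MB-MPO has since been deprecated in core RLlib), and the environment path and config values are illustrative examples rather than tuned settings.

```python
# Minimal sketch, assuming a Ray/RLlib release that still includes MB-MPO
# (the algorithm has since been deprecated in core RLlib). The environment
# path and the config values below are illustrative, not tuned settings.
from ray.rllib.algorithms.mbmpo import MBMPOConfig

config = (
    MBMPOConfig()
    # MB-MPO needs an env that exposes its reward function so the learned
    # dynamics models can generate synthetic transitions; RLlib's example
    # wrappers (e.g. CartPoleWrapper in rllib/examples/env/mbmpo_env.py) do this.
    .environment("ray.rllib.examples.env.mbmpo_env.CartPoleWrapper")
    .rollouts(num_rollout_workers=2)
    .training(
        inner_adaptation_steps=1,  # MAML inner-loop adaptation steps per task
        maml_optimizer_steps=8,    # MAML meta- (outer-loop) optimizer steps
        dynamics_model={"ensemble_size": 5},  # size of the dynamics-model ensemble
    )
)

algo = config.build()
for _ in range(3):
    # Each iteration alternates dynamics-model training (on real data)
    # and MAML actor/critic updates (on model-generated data).
    print(algo.train())
```

The MB-MPO-specific knobs are essentially the MAML inner/outer step counts and the `dynamics_model` ensemble settings; everything else follows the usual RLlib configuration pattern.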

Documentation & Implementation:

MBMPO.

Detailed Documentation

Implementation