# MBMPO (Model-Based Meta-Policy-Optimization)

MBMPO is a Dyna-style, model-based RL method that learns from the predictions of an ensemble of transition-dynamics models. Similar to MAML, MBMPO meta-learns an optimal policy by treating each dynamics model as a different task. As in the original paper, MBMPO is evaluated on MuJoCo, with the horizon set to 200 instead of the default 1000.

MBMPO logs additional statistics: each MBMPO iteration corresponds to multiple MAML iterations, and `MAMLIter_i_DynaTrajInner_j_episode_reward_mean` measures the agent's mean episode return across the dynamics models at iteration `i` of MAML and step `j` of inner adaptation.
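
For instance, these per-step statistics can be pulled out of the result dict returned by `Algorithm.train()` (see the usage sketch below). This is a minimal sketch assuming the key pattern described above; `print_inner_adaptation_rewards` is a hypothetical helper, not part of the package:

```python
# A minimal sketch of inspecting MBMPO's per-adaptation-step metrics.
# Assumes `results` is the dict returned by one call to `algo.train()`;
# the key pattern follows the naming convention described above.

def print_inner_adaptation_rewards(results: dict) -> None:
    """Print mean episode reward for each (MAML iter, inner step) pair."""
    for key, value in sorted(results.items()):
        # Keys look like: MAMLIter_<i>_DynaTrajInner_<j>_episode_reward_mean
        if key.startswith("MAMLIter") and key.endswith("episode_reward_mean"):
            print(f"{key}: {value:.2f}")
```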

## Installation

```bash
conda create -n rllib-mbmpo python=3.10
conda activate rllib-mbmpo
pip install -r requirements.txt
pip install -e '.[development]'
```

## Usage

[MBMPO Example]()
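
Pending the example link above, here is a minimal training sketch. The `rllib_mbmpo.mbmpo` import path and the `MBMPOConfig` class are assumptions based on how other rllib_contrib packages are laid out, and the environment name is purely illustrative; MBMPO needs to train dynamics models from transitions, so consult the `examples/` directory for a working environment setup.

```python
import ray

# Assumed module layout, following other rllib_contrib packages.
from rllib_mbmpo.mbmpo import MBMPOConfig

if __name__ == "__main__":
    ray.init()

    config = (
        MBMPOConfig()
        # Illustrative env name only; see examples/ for environments known
        # to work with MBMPO's dynamics-model training.
        .environment("HalfCheetah-v4")
    )
    algo = config.build()

    for i in range(10):
        results = algo.train()
        print(f"iter {i}: reward_mean={results['episode_reward_mean']:.2f}")
        # Inner-adaptation metrics can be inspected with the helper
        # sketched in the section above:
        # print_inner_adaptation_rewards(results)

    algo.stop()
```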