# Leela Chess Zero

Leela Chess Zero is an algorithm for training agents on the Leela Chess Engine. Its neural network is largely based on DeepMind's AlphaGo Zero and AlphaZero architectures, with some modifications. The agent is trained through self-play against multiple past versions of itself.
The policy/model assumes that the environment is a multi-agent chess environment with a discrete action space that returns observations as a dictionary with two keys:

- `obs`: the observation, either as a state vector or as an image
- `action_mask`: a mask over the legal actions

The environment should also implement a `get_state` and a `set_state` function, which are used by the MCTS implementation.
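As a rough illustration of this contract, the sketch below builds a toy environment that returns the two-key observation dict and exposes `get_state`/`set_state` snapshot hooks of the kind MCTS relies on. Everything except the `obs`/`action_mask` keys and the two method names (the class, board encoding, and action count) is hypothetical and not part of the RLlib API:

```python
import numpy as np

class SketchChessEnv:
    """Hypothetical minimal environment illustrating the observation dict
    and the get_state/set_state hooks used by the MCTS implementation."""

    NUM_ACTIONS = 8  # a real chess env exposes thousands of move slots

    def __init__(self):
        self.board = np.zeros(64, dtype=np.float32)  # toy state vector
        self.to_play = 0

    def observation(self):
        return {
            "obs": self.board.copy(),          # state vector (or an image)
            "action_mask": self.legal_mask(),  # 1.0 = legal, 0.0 = illegal
        }

    def legal_mask(self):
        mask = np.zeros(self.NUM_ACTIONS, dtype=np.float32)
        mask[: self.NUM_ACTIONS // 2] = 1.0  # pretend half the moves are legal
        return mask

    # MCTS saves a snapshot before expanding a tree branch ...
    def get_state(self):
        return (self.board.copy(), self.to_play)

    # ... and restores it afterwards to continue from the same position.
    def set_state(self, state):
        board, to_play = state
        self.board = board.copy()
        self.to_play = to_play
        return self.observation()
```

During tree search, the snapshot returned by `get_state` lets the algorithm simulate moves from a node and then call `set_state` to rewind the environment, so the real game state is never disturbed.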
The model used in the AlphaZero trainer should extend `TorchModelV2` and implement the method `compute_priors_and_value`.
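To show the shape of that method's contract without pulling in RLlib or PyTorch, here is a standalone NumPy sketch: `compute_priors_and_value` returns a prior probability per action (for MCTS expansion) and a scalar value estimate of the position. The class name, the linear "network", and all dimensions are illustrative assumptions; a real implementation subclasses RLlib's `TorchModelV2`:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyPriorValueNet:
    """Toy linear model with a policy head (priors) and a value head."""

    def __init__(self, obs_dim, num_actions):
        self.w_policy = rng.normal(size=(obs_dim, num_actions)) * 0.01
        self.w_value = rng.normal(size=(obs_dim, 1)) * 0.01

    def compute_priors_and_value(self, obs):
        # Priors: softmax over action logits, one probability per move slot.
        logits = obs @ self.w_policy
        exp = np.exp(logits - logits.max())
        priors = exp / exp.sum()
        # Value: scalar position estimate squashed into [-1, 1].
        value = float(np.tanh(obs @ self.w_value)[0])
        return priors, value

net = TinyPriorValueNet(obs_dim=64, num_actions=8)
priors, value = net.compute_priors_and_value(np.ones(64, dtype=np.float32))
```

MCTS uses the priors to bias which child nodes to expand and the value to back up an evaluation of the leaf position, so the two heads must come from the same forward pass over the observation.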
```
conda create -n rllib-leela-chess python=3.10
conda activate rllib-leela-chess
pip install -r requirements.txt
pip install -e '.[development]'
```
[Leela Chess Zero Example]()