
Commit

Merge remote-tracking branch 'origin/main' into dev-mz-multitask-ddp
puyuan1996 committed Dec 18, 2024
2 parents 0bd688e + 5614025 commit 69a1842
Showing 236 changed files with 4,009 additions and 1,185 deletions.
59 changes: 35 additions & 24 deletions README.md
@@ -28,7 +28,7 @@
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
[![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)

Updated on 2024.08.18 LightZero-v0.1.0
Updated on 2024.12.10 LightZero-v0.1.0

English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

@@ -58,25 +58,37 @@ For further details, please refer to [Features](#features), [Framework Structure…

### Outline

- [Overview](#overview)
- [Outline](#outline)
- [Features](#features)
- [Framework Structure](#framework-structure)
- [Integrated Algorithms](#integrated-algorithms)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Documentation](#documentation)
- [Benchmark](#benchmark)
- [Awesome-MCTS Notes](#awesome-mcts-notes)
- [Paper Notes](#paper-notes)
- [Algo. Overview](#algo-overview)
- [Awesome-MCTS Papers](#awesome-mcts-papers)
- [Key Papers](#key-papers)
- [Other Papers](#other-papers)
- [Feedback and Contribution](#feedback-and-contribution)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)
- [License](#license)
- [LightZero](#lightzero)
- [🔍 Background](#-background)
- [🎨 Overview](#-overview)
- [Outline](#outline)
- [💥 Features](#-features)
- [🧩 Framework Structure](#-framework-structure)
- [🎁 Integrated Algorithms](#-integrated-algorithms)
- [⚙️ Installation](#️-installation)
- [Installation with Docker](#installation-with-docker)
- [🚀 Quick Start](#-quick-start)
- [📚 Documentation](#-documentation)
- [📊 Benchmark](#-benchmark)
- [📝 Awesome-MCTS Notes](#-awesome-mcts-notes)
- [Paper Notes](#paper-notes)
- [Algo. Overview](#algo-overview)
- [Awesome-MCTS Papers](#awesome-mcts-papers)
- [Key Papers](#key-papers)
- [LightZero Implemented series](#lightzero-implemented-series)
- [AlphaGo series](#alphago-series)
- [MuZero series](#muzero-series)
- [MCTS Analysis](#mcts-analysis)
- [MCTS Application](#mcts-application)
- [Other Papers](#other-papers)
- [ICML](#icml)
- [ICLR](#iclr)
- [NeurIPS](#neurips)
- [Other Conference or Journal](#other-conference-or-journal)
- [💬 Feedback and Contribution](#-feedback-and-contribution)
- [🌏 Citation](#-citation)
- [💓 Acknowledgments](#-acknowledgments)
- [🏷️ License](#️-license)

### 💥 Features

@@ -209,7 +221,7 @@ Train a MuZero agent to play [Pong](https://gymnasium.farama.org/environments/at…
```bash
cd LightZero
python3 -u zoo/atari/config/atari_muzero_config.py
python3 -u zoo/atari/config/atari_muzero_segment_config.py
```
Train a MuZero agent to play [TicTacToe](https://en.wikipedia.org/wiki/Tic-tac-toe):
@@ -219,12 +231,11 @@ cd LightZero
python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
```
Train a UniZero agent to play [Pong](https://gymnasium.farama.org/environments/atari/pong/):
```bash
cd LightZero
python3 -u zoo/atari/config/atari_unizero_config.py
python3 -u zoo/atari/config/atari_unizero_segment_config.py
```
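Each of these config files is a plain Python script that defines `main_config` and `create_config` and launches training when run directly. A minimal sketch of the tail of such a script (the entry-point name and signature here are assumptions based on common LightZero usage, not something confirmed by this commit):

```python
# Hypothetical tail of an Atari config script such as atari_muzero_segment_config.py.
# `main_config` and `create_config` are the config objects built earlier in the file;
# `train_muzero` is assumed to be the matching training entry point.
if __name__ == "__main__":
    from lzero.entry import train_muzero

    # Seed and environment-step budget are illustrative placeholders.
    train_muzero([main_config, create_config], seed=0, max_env_step=int(1e6))
```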
## 📚 Documentation
6 changes: 3 additions & 3 deletions README.zh.md
@@ -27,7 +27,7 @@
[![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)

Last updated on 2024.08.18 LightZero-v0.1.0
Last updated on 2024.12.10 LightZero-v0.1.0

[English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

@@ -189,7 +189,7 @@ python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py

```bash
cd LightZero
python3 -u zoo/atari/config/atari_muzero_config.py
python3 -u zoo/atari/config/atari_muzero_segment_config.py
```

Use the following code to quickly train a MuZero agent on the [TicTacToe](https://en.wikipedia.org/wiki/Tic-tac-toe) environment:
@@ -203,7 +203,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py

```bash
cd LightZero
python3 -u zoo/atari/config/atari_unizero_config.py
python3 -u zoo/atari/config/atari_unizero_segment_config.py
```

## 📚 Documentation
2 changes: 1 addition & 1 deletion docs/source/tutorials/config/config.md
@@ -32,7 +32,7 @@ The `main_config` dictionary contains the main parameter settings for running the…
- `update_per_collect`: The number of updates after each data collection.
- `batch_size`: The batch size sampled during the update.
- `optim_type`: Optimizer type.
- `lr_piecewise_constant_decay`: Whether to use piecewise constant learning rate decay.
- `piecewise_decay_lr_scheduler`: Whether to use piecewise constant learning rate decay.
- `learning_rate`: Initial learning rate.
- `num_simulations`: The number of simulations used in the MCTS algorithm.
- `reanalyze_ratio`: Reanalysis coefficient, controlling the probability of reanalysis.
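
For illustration, a minimal sketch of how these fields typically sit inside the `policy` block of `main_config`; the surrounding structure and the concrete values below are placeholders, not taken from this commit:

```python
# Hypothetical `policy` sub-config showing the parameters documented above.
policy = dict(
    update_per_collect=100,               # number of updates after each data collection
    batch_size=256,                       # mini-batch size sampled during updates
    optim_type='Adam',                    # optimizer type
    piecewise_decay_lr_scheduler=False,   # whether to use piecewise constant LR decay
    learning_rate=0.003,                  # initial learning rate
    num_simulations=50,                   # MCTS simulations per search
    reanalyze_ratio=0.25,                 # probability that a sample is reanalyzed
)
```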
2 changes: 1 addition & 1 deletion docs/source/tutorials/config/config_zh.md
@@ -32,7 +32,7 @@
- `update_per_collect`: The number of updates after each data collection.
- `batch_size`: The batch size sampled during updates.
- `optim_type`: Optimizer type.
- `lr_piecewise_constant_decay`: Whether to use piecewise constant learning rate decay.
- `piecewise_decay_lr_scheduler`: Whether to use piecewise constant learning rate decay.
- `learning_rate`: Initial learning rate.
- `num_simulations`: The number of simulations used in the MCTS algorithm.
- `reanalyze_ratio`: Reanalysis coefficient, controlling the probability of reanalysis.
2 changes: 1 addition & 1 deletion lzero/agent/config/alphazero/gomoku_play_with_bot.py
@@ -65,7 +65,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
value_weight=1.0,
2 changes: 1 addition & 1 deletion lzero/agent/config/alphazero/tictactoe_play_with_bot.py
@@ -61,7 +61,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
value_weight=1.0,
@@ -55,7 +55,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/efficientzero/gym_cartpole_v0.py
@@ -43,7 +43,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/efficientzero/gym_lunarlander_v2.py
@@ -45,7 +45,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
num_simulations=num_simulations,
@@ -54,7 +54,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/efficientzero/gym_pendulum_v1.py
@@ -43,7 +43,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/efficientzero/gym_pongnoframeskip_v4.py
@@ -54,7 +54,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/gumbel_muzero/gomoku_play_with_bot.py
@@ -55,7 +55,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
num_simulations=num_simulations,
2 changes: 1 addition & 1 deletion lzero/agent/config/gumbel_muzero/gym_cartpole_v0.py
@@ -46,7 +46,7 @@
batch_size=batch_size,
optim_type='Adam',
max_num_considered_actions=2,
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
ssl_loss_weight=2, # NOTE: default is 0.
num_simulations=num_simulations,
@@ -49,7 +49,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
num_simulations=num_simulations,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gomoku_play_with_bot.py
@@ -55,7 +55,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
num_simulations=num_simulations,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gym_breakoutnoframeskip_v4.py
@@ -58,7 +58,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gym_cartpole_v0.py
@@ -45,7 +45,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
ssl_loss_weight=2, # NOTE: default is 0.
num_simulations=num_simulations,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gym_lunarlander_v2.py
@@ -46,7 +46,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
ssl_loss_weight=2, # NOTE: default is 0.
grad_clip_value=0.5,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gym_mspacmannoframeskip_v4.py
@@ -56,7 +56,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gym_pendulum_v1.py
@@ -44,7 +44,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
ssl_loss_weight=2, # NOTE: default is 0.
num_simulations=num_simulations,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/gym_pongnoframeskip_v4.py
@@ -56,7 +56,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/agent/config/muzero/tictactoe_play_with_bot.py
@@ -50,7 +50,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
num_simulations=num_simulations,
@@ -75,7 +75,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
value_weight=1.0,
entropy_weight=0.0,
@@ -63,7 +63,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
value_weight=1.0,
@@ -47,7 +47,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
@@ -47,7 +47,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
@@ -50,14 +50,14 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
grad_clip_value=0.5,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
random_collect_episode_num=0,
# NOTE: for continuous gaussian policy, we use the policy_entropy_loss as in the original Sampled MuZero paper.
policy_entropy_loss_weight=5e-3,
policy_entropy_weight=5e-3,
n_episode=n_episode,
eval_freq=int(1e3),
replay_buffer_size=int(1e6), # the size/capacity of replay_buffer, in the terms of transitions.
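
The `policy_entropy_weight` entry above scales an entropy bonus for the continuous Gaussian policy, following the Sampled MuZero paper. A self-contained illustration of such a term (this is a sketch, not the actual LightZero loss code):

```python
# Illustrative only: weighted negative entropy of a diagonal Gaussian policy.
import torch
from torch.distributions import Independent, Normal

def policy_entropy_loss(mu: torch.Tensor, sigma: torch.Tensor, weight: float = 5e-3) -> torch.Tensor:
    """Return -weight * H(pi); adding this to the total loss encourages exploration."""
    dist = Independent(Normal(mu, sigma), 1)  # treat action dimensions as one event
    return -weight * dist.entropy().mean()    # mean entropy over the batch

# Example usage with a batch of 8 two-dimensional actions.
mu, sigma = torch.zeros(8, 2), torch.ones(8, 2)
loss_term = policy_entropy_loss(mu, sigma)
```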
@@ -47,7 +47,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
4 changes: 2 additions & 2 deletions lzero/agent/config/sampled_efficientzero/gym_pendulum_v1.py
@@ -47,10 +47,10 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='Adam',
lr_piecewise_constant_decay=False,
piecewise_decay_lr_scheduler=False,
learning_rate=0.003,
# NOTE: for continuous gaussian policy, we use the policy_entropy_loss as in the original Sampled MuZero paper.
policy_entropy_loss_weight=5e-3,
policy_entropy_weight=5e-3,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
n_episode=n_episode,
@@ -47,7 +47,7 @@
update_per_collect=update_per_collect,
batch_size=batch_size,
optim_type='SGD',
lr_piecewise_constant_decay=True,
piecewise_decay_lr_scheduler=True,
learning_rate=0.2,
num_simulations=num_simulations,
reanalyze_ratio=reanalyze_ratio,
2 changes: 1 addition & 1 deletion lzero/config/meta.py
@@ -7,7 +7,7 @@
__TITLE__ = "LightZero"

#: Version of this project.
__VERSION__ = "0.0.3"
__VERSION__ = "0.1.0"

#: Short description of the project, will be included in ``setup.py``.
__DESCRIPTION__ = 'A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkits.'
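
As the comments note, these fields feed the packaging metadata. A minimal sketch of how a `setup.py` can consume them (the exact import path and fields used by LightZero's real `setup.py` are assumptions):

```python
# Hypothetical excerpt of a setup.py consuming the metadata defined in lzero/config/meta.py.
from setuptools import setup, find_packages

from lzero.config.meta import __TITLE__, __VERSION__, __DESCRIPTION__

setup(
    name=__TITLE__.lower(),       # distribution name, e.g. "lightzero"
    version=__VERSION__,          # "0.1.0" after this change
    description=__DESCRIPTION__,  # short project description
    packages=find_packages(),
)
```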