v0.0.5
Environment
Algorithm
- Gumbel AlphaZero in ctree (#212)
Enhancement
- add eval_offline option (#188)
- save the updated searched policy and value to the buffer during reanalyze (#190)
- add MuZero visualization (#181)
- add EfficientZero TicTacToe configs (#204)
- add 2 MCTS-related ICLR 2024 papers
- add load pretrained model option in test_game_segment (#194)
- polish _forward_learn() and some data processing operations (#191)
Fix
- fix sync_gradients and logging in DDP settings (#200)
- fix channel_last bug
- fix total_episode_count bug in collector
- fix memory_lightzero_env return bug
- fix obs_max_scale bug in memory_env
Style
- add ZeroPal and Discord link (#209)
- add unittest for game_buffer_muzero (#186)
- add customization documentation section in README
Full Changelog: v0.0.4...v0.0.5
Contributors: @karroyan @HarryXuancy @nighood @puyuan1996