Discussion: Requesting assistance and guidance with implementation of RL algorithms and models in the context of Tetris #265
Sorry for the late response.
If possible, you could submit a PR so we can compare and review your specific code for a more concrete discussion. You're also welcome to raise more related discussion questions. Thanks for your attention.
Thank you for the response!
I appreciate the notes and the helpful advice, and thank you for taking the time to share your insight. I understand that little can be said through a GitHub issue without any code, so I will submit a PR soon and ask for your advice again. I do wonder, however, whether MuZero-series algorithms are the right direction for such an application. I am always open to more suggestions for improving my current setup. Thank you again!
Best wishes!
Hello, do you have any updated experimental results? If you're interested, we can further discuss potential directions for improvement based on them. In the meantime, you can try merging the latest main branch (the performance of various algorithms has significantly improved compared to before). Our goal is to integrate this environment and its efficient performance configurations (e.g., MuZero and UniZero) into LightZero in a modular and standardized manner. Looking forward to your valuable contributions, and thank you once again!
Hello!
I'm trying to implement the many models and algorithms in this library in the context of Tetris: specifically multiplayer Tetris, where players compete to clear lines efficiently and send as many lines as possible to their opponents. Currently, I am developing a simple bot that goes beyond placing tetrominoes randomly.
Here's what I have:

- An environment modeled after `atari` and `game_2048` that allows models to interact and train successfully.
- A modified reward system that incentivizes placing more blocks, with extra emphasis on any lines cleared (a rough sketch of this shaping follows the list).
- A config file to work with EfficientZero, trained for over 48 single-GPU hours.
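For context, here is a minimal sketch of the kind of placement-level environment and shaped reward described above, assuming a gym-style interface like LightZero's `atari` and `game_2048` wrappers. Every name and reward weight here is a hypothetical placeholder, not the actual implementation:

```python
import numpy as np


class TetrisPlacementEnv:
    """Hypothetical skeleton of a placement-level Tetris env (illustrative only)."""

    def __init__(self, cols: int = 10, rows: int = 8):
        self.cols, self.rows = cols, rows
        self.board = np.zeros((rows, cols), dtype=np.int8)

    def reset(self) -> np.ndarray:
        self.board[:] = 0
        return self._observe()

    def step(self, action: int):
        # Placement logic (collision checks, locking the piece) is stubbed out.
        lines_cleared = self._place_piece_and_clear(action)
        # Shaped reward: a small bonus per placed piece keeps the agent alive,
        # while cleared lines dominate the signal (weights are assumptions).
        reward = 0.1 + 1.0 * lines_cleared
        done = self._topped_out()
        return self._observe(), reward, done, {"lines_cleared": lines_cleared}

    def _observe(self) -> np.ndarray:
        # Flattened one-hot board; piece/queue/hold features would be appended.
        return self.board.flatten().astype(np.float32)

    def _place_piece_and_clear(self, action: int) -> int:
        return 0  # stub: real placement and line-clear logic goes here

    def _topped_out(self) -> bool:
        return False  # stub: real top-out detection goes here
```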
Here's some context on the environment being trained on:
The observation is a 10-column × 8-row one-hot-encoded board, stacked with additional one-hot-encoded information such as the current piece, the pieces in the queue, and the held piece.
Each move is encoded as the coordinate of the placement, the type of the piece placed, and its rotation, for a one-hot-encoded action space of size 2560. The observation input size is 144. The model currently uses an MLP.
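To make those sizes concrete, here is one decomposition that is consistent with the numbers above (80 board cells, 8 piece ids, 4 rotations, a queue of 6). This is purely my guess at the layout, not necessarily the actual encoding:

```python
BOARD_COLS, BOARD_ROWS = 10, 8
N_PIECE_IDS, N_ROTATIONS = 8, 4  # assumed: 7 tetrominoes + a padding/none id
QUEUE_LEN = 6                    # assumed queue length

N_CELLS = BOARD_COLS * BOARD_ROWS                        # 80
OBS_SIZE = N_CELLS + N_PIECE_IDS * (1 + QUEUE_LEN + 1)   # 80 + 8*(current+queue+held) = 144
ACTION_SIZE = N_CELLS * N_PIECE_IDS * N_ROTATIONS        # 80 * 8 * 4 = 2560


def encode_action(cell: int, piece_id: int, rotation: int) -> int:
    """Flatten (placement cell, piece id, rotation) into a single action index."""
    return (cell * N_PIECE_IDS + piece_id) * N_ROTATIONS + rotation


def decode_action(action: int) -> tuple[int, int, int]:
    """Inverse of encode_action."""
    cell_piece, rotation = divmod(action, N_ROTATIONS)
    cell, piece_id = divmod(cell_piece, N_PIECE_IDS)
    return cell, piece_id, rotation


assert OBS_SIZE == 144 and ACTION_SIZE == 2560
assert decode_action(encode_action(42, 3, 2)) == (42, 3, 2)
```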
It is worth noting that even after many training iterations, the console still warns that many illegal moves are being attempted, despite the action mask being provided for the variable action space. It seems the model may not be able to correctly learn the legal actions.
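On the illegal-move warnings, one thing worth verifying is that the mask is actually applied to the policy logits before sampling (and during MCTS node expansion), not merely carried along in the observation dict. A minimal standalone masked-softmax sketch, not LightZero's actual code path:

```python
import numpy as np


def masked_policy(logits: np.ndarray, action_mask: np.ndarray) -> np.ndarray:
    """Push illegal actions to -inf before the softmax so they get zero mass."""
    legal = action_mask.astype(bool)
    masked_logits = np.where(legal, logits, -np.inf)
    masked_logits -= masked_logits.max()  # numerical stability
    probs = np.exp(masked_logits)         # exp(-inf) == 0, so illegal moves vanish
    return probs / probs.sum()


# Usage: sample only among the (pretend) first 100 legal placements.
logits = np.random.randn(2560)
mask = np.zeros(2560)
mask[:100] = 1
action = np.random.choice(2560, p=masked_policy(logits, mask))
assert mask[action] == 1
```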
I've also done a small amount of testing using an action space of 10 instead, and some more with the reanalyze ratio set to 0.25, among other variations.
I am always open to trying anything, as I just want to get something working :).
Let me know if there is any more information, resources, or context I can provide to facilitate my learning process.
Here are some graphs from the training.
Here is the `total_config.py` file for the run: