About the synthetic data generation #14

Open
yeppp27 opened this issue Dec 8, 2024 · 3 comments
Labels
about dataset datasets of PRM and policy model

Comments

@yeppp27

yeppp27 commented Dec 8, 2024

Hi, thanks for your great work. I noticed that, according to Appendix B.1, the synthetic data is generated in "a simple breadth-first-search (BFS) manner":
"... a simple breadth-first-search (BFS) manner, obtaining a search tree T_q^(i) similar to the one of the self-training process. Subsequently, we verify the obtained answers of all leaf nodes of T_q^(i) according to a^(i). The verified search trees are then used to derive data samples with target values for D."

Can you provide the code for this process?
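For illustration only, here is a minimal Python sketch of the last step in the quoted passage: verifying the leaf answers of a search tree against the reference answer, then turning the verified tree into (partial solution, target value) samples. The `Node` class, the exact-match `verify` checker, and the target value used here (the fraction of verified-correct leaves below each node) are hypothetical stand-ins; the paper's actual target-value definition is not given in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One reasoning step in a search tree T_q (hypothetical structure)."""
    step: str
    children: List["Node"] = field(default_factory=list)
    answer: Optional[str] = None  # final answer; only set on leaf nodes


def verify(answer: Optional[str], reference: str) -> bool:
    # Hypothetical checker: exact match after stripping whitespace.
    return answer is not None and answer.strip() == reference.strip()


def count_leaves(node: Node, reference: str) -> Tuple[int, int]:
    """Return (verified-correct leaves, total leaves) under `node`."""
    if not node.children:
        return (1 if verify(node.answer, reference) else 0), 1
    correct = total = 0
    for child in node.children:
        c, t = count_leaves(child, reference)
        correct += c
        total += t
    return correct, total


def derive_samples(root: Node, reference: str) -> List[Tuple[str, float]]:
    """Emit (partial solution, target value) pairs for every non-root node.

    Stand-in target: fraction of verified-correct leaves in the node's subtree.
    """
    samples: List[Tuple[str, float]] = []

    def walk(node: Node, prefix: str) -> None:
        for child in node.children:
            path = prefix + child.step + "\n"
            correct, total = count_leaves(child, reference)
            samples.append((path, correct / total))
            walk(child, path)

    walk(root, "")
    return samples


# Toy tree for a question "Q" whose reference answer is "4".
root = Node("Q", children=[
    Node("x = 2", children=[Node("so 4", answer="4"), Node("so 5", answer="5")]),
    Node("x = 3", answer="6"),
])
samples = derive_samples(root, "4")
for text, value in samples:
    print(repr(text), value)
```

Recomputing the leaf counts for every node is quadratic in tree size; for large trees a single post-order pass that caches the counts would be the natural choice.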

zhangdan0602 added the "about dataset" label (datasets of PRM and policy model) on Dec 25, 2024
@zhoubiansining
Contributor

This process is almost the same as the generation process in self_train/generation/generate_both_samples_MATH.py, except that you don't need a value model (vm) or the MCTS* algorithm. In other words, you can simply expand the tree layer by layer, as in BFS, and randomly select some nodes for further expansion. The verification process is identical, so you can refer to self_train/generation/generate_both_samples_MATH.py to replicate it.
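A minimal sketch of the BFS variant described above, assuming a policy callable that proposes `branch` candidate next steps for a partial solution and an `extract_answer` helper that returns a final answer or None. All names here (`bfs_generate`, `keep_per_layer`, `toy_policy`, `toy_extract`, and so on) are hypothetical and do not come from generate_both_samples_MATH.py; a real setup would call the policy model instead of the toy functions.

```python
import random
from typing import Callable, List, Optional, Tuple


class Node:
    def __init__(self, text: str, depth: int, answer: Optional[str] = None):
        self.text = text      # question plus all steps so far
        self.depth = depth    # number of steps taken from the root
        self.answer = answer  # final answer if this step produced one
        self.children: List["Node"] = []


def bfs_generate(question: str,
                 policy: Callable[[str, int], List[str]],
                 extract_answer: Callable[[str], Optional[str]],
                 branch: int = 3, keep_per_layer: int = 4,
                 max_depth: int = 5, seed: int = 0) -> Node:
    """Expand layer by layer; only a random subset of each layer is expanded further."""
    rng = random.Random(seed)
    root = Node(question, depth=0)
    frontier = [root]
    while frontier:
        next_layer: List[Node] = []
        for node in frontier:
            for step in policy(node.text, branch):
                child = Node(node.text + "\n" + step, node.depth + 1,
                             extract_answer(step))
                node.children.append(child)
                # Finished or too-deep nodes are never expanded again.
                if child.answer is None and child.depth < max_depth:
                    next_layer.append(child)
        # Randomly pick which nodes of the new layer get expanded next.
        frontier = rng.sample(next_layer, min(keep_per_layer, len(next_layer)))
    return root


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def verify_leaves(root: Node, reference: str) -> List[Tuple[str, bool]]:
    # Leaves that never reached an answer (e.g. not sampled for expansion) count as wrong.
    return [(leaf.text, leaf.answer == reference) for leaf in leaves(root)]


# Toy stand-ins for the policy model and answer extraction (hypothetical).
def toy_policy(partial: str, k: int) -> List[str]:
    if partial.count("\n") >= 2:  # after two steps, propose final answers
        return [f"The answer is {i}" for i in range(k)]
    return [f"step {i}" for i in range(k)]


def toy_extract(step: str) -> Optional[str]:
    prefix = "The answer is "
    return step[len(prefix):] if step.startswith(prefix) else None


root = bfs_generate("Q", toy_policy, toy_extract, branch=2, keep_per_layer=2)
results = verify_leaves(root, "1")
print(sum(ok for _, ok in results), "of", len(results), "leaves verified correct")
```

Nodes that were not sampled for expansion end up as leaves without an answer, so this sketch labels them as incorrect; keeping or discarding them instead is a design choice.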

@yeppp27
Author

yeppp27 commented Dec 28, 2024

Thanks for your response! Does the 'PVM' mode in the Monte Carlo Tree Search (MCTS) process omit the rollout phase?

@zhoubiansining
Contributor

In fact, we retain the rollout phase to ensure accurate value estimation. However, since we have a value model (vm), we only simulate a few steps and then estimate the value with the vm. You can adjust the number of rollout steps with the argument roll_forward_steps. If you want to improve efficiency, you can also remove the rollout entirely.
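To make the trade-off concrete, here is a hedged sketch of a truncated rollout: starting from a node, the policy is rolled forward for at most roll_forward_steps steps (stopping early at a terminal state), and the value model then scores the resulting state. Only the name roll_forward_steps comes from this thread; `truncated_rollout_value`, `sample_step`, `value_model`, and `is_terminal` are hypothetical, and scoring only the final state is an assumption rather than the repo's exact behavior.

```python
from typing import Callable


def truncated_rollout_value(partial: str,
                            sample_step: Callable[[str], str],
                            value_model: Callable[[str], float],
                            is_terminal: Callable[[str], bool],
                            roll_forward_steps: int = 2) -> float:
    """Roll the policy forward a few steps, then let the value model score the result."""
    state = partial
    for _ in range(roll_forward_steps):
        if is_terminal(state):
            break  # nothing left to simulate
        state = state + "\n" + sample_step(state)
    # Instead of rolling out to a final answer, the value model estimates the outcome.
    return value_model(state)


# Hypothetical stand-ins: each step appends "s", the "value" is just the step count,
# and a state is terminal after three steps.
def toy_step(state: str) -> str:
    return "s"


def toy_value(state: str) -> float:
    return float(state.count("\n"))


def toy_terminal(state: str) -> bool:
    return state.count("\n") >= 3


for n in (0, 2, 10):
    print(n, truncated_rollout_value("Q", toy_step, toy_value, toy_terminal, n))
```

With roll_forward_steps=0 the loop never runs and the node is scored directly by the value model, which corresponds to removing the rollout for efficiency.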


3 participants