code for generate data for policy and PRM? #19

Open
Xccanxin opened this issue Dec 31, 2024 · 2 comments
Labels
about dataset (datasets of PRM and policy model)

Comments

@Xccanxin

Hi, thanks for your great work, but I am confused about the data-collection code.
As you mentioned:

  1. Download policy data (positive samples) for training 1st policy model (Llama3-8b-Instruct): [Hugging Face]

  2. Download PRM data (positive and negative samples) for training 1st reward model (Mistral-7B: MetaMATH): [Hugging Face]

How can I generate these two datasets with your code? Is it done by scripts such as self_train/generation/generate_both_samples_MATH.py or evaluate.py?
Which key parameters need to be changed to produce them?

@zhangdan0602
Collaborator

zhangdan0602 commented Dec 31, 2024

Thank you for your question! We have provided all links to policy datasets (see https://github.com/THUDM/ReST-MCTS#policy-data).

zhangdan0602 added the "about dataset" label (datasets of PRM and policy model) on Dec 31, 2024
@Xccanxin
Author

> Thank you for your question! We have provided all links to policy datasets (see https://github.com/THUDM/ReST-MCTS#policy-data).

Thanks for the reply! How can I reproduce this data myself by running the code, rather than downloading it directly?
