code for generate data for policy and PRM? #19

Xccanxin · 2024-12-31T03:52:48Z

Hi, thanks for your great work, but I am confused about the data-collection code.
As you mentioned:

Download policy data (positive samples) for training 1st policy model (Llama3-8b-Instruct): [Hugging Face]
Download PRM data (positive and negative samples) for training 1st reward model (Mistral-7B: MetaMATH): [Hugging Face]

How can I get these two datasets from your code? Are they produced by scripts like self_train/generation/generate_both_samples_MATH.py or evaluate.py?
What are the key parameters to change to generate these two datasets?

zhangdan0602 · 2024-12-31T07:02:38Z

Thank you for your question! We have provided all links to policy datasets (see https://github.com/THUDM/ReST-MCTS#policy-data).

Xccanxin · 2024-12-31T08:23:35Z

> Thank you for your question! We have provided all links to policy datasets (see https://github.com/THUDM/ReST-MCTS#policy-data).

Thanks for the reply! How can I reproduce these datasets myself by running the code, rather than downloading them directly?

zhangdan0602 added the "about dataset" label (datasets of PRM and policy model) on Dec 31, 2024
