Hi, thanks for your great work. However, I am confused about the data-collection code.
As you mentioned:
Download policy data (positive samples) for training 1st policy model (Llama3-8b-Instruct): [Hugging Face]
Download PRM data (positive and negative samples) for training 1st reward model (Mistral-7B: MetaMATH): [Hugging Face]
How can I generate these two datasets with your code? Are they produced by scripts such as self_train/generation/generate_both_samples_MATH.py or evaluate.py?
What are the key parameters to change in order to produce these two datasets?