About Training Data in Stage 2 #23
Comments
If you have further questions, please share more details about your training data counts so we can discuss them.
Thanks for the reply!
In the original ScanQA dataset, a single question may have multiple valid answers. We generate multiple training prompts by pairing the same question with each of its different answers.
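A minimal sketch of this pairing, for illustration only. It assumes each entry carries a `question` string and an `answers` list; these field names, the helper `expand_qa_pairs`, and the toy data are assumptions, not the repository's actual preprocessing code.

```python
from typing import Dict, List


def expand_qa_pairs(entries: List[Dict]) -> List[Dict]:
    """Emit one training prompt per (question, answer) pair."""
    prompts = []
    for entry in entries:
        # A question with k valid answers yields k prompts.
        for answer in entry["answers"]:
            prompts.append({"question": entry["question"], "answer": answer})
    return prompts


# Toy data, made up for illustration (not taken from ScanQA).
toy_entries = [
    {"question": "What color is the chair?", "answers": ["brown", "dark brown"]},
    {"question": "Where is the lamp?", "answers": ["on the desk"]},
]

prompts = expand_qa_pairs(toy_entries)
print(len(toy_entries), "questions ->", len(prompts), "prompts")
```

Under this scheme the prompt count equals the total number of answers rather than the number of questions, so a set with more answers than questions produces correspondingly more prompts.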
The total number of answers is 26,515. This already accounts for questions with multiple answers; the number of questions is 25,563. So is it possible that a single answer is matched to multiple generated questions?
I apologize for the error. After carefully double-checking the prompt counts for each dataset, I found that the figure in our paper does not accurately represent these numbers. Thanks for the reminder; we will revise the figure later. I want to emphasize, however, that our work used only the training set of each dataset. For example, we use only 26,515 prompts from ScanQA in the instruction tuning stage.
Thank you very much for your previous replies to my questions!
I am currently generating the training data for stage 2. The number of samples I generated does not match the number reported in the paper, so I would like to ask the following questions.