
Closes #502 #503

Merged: 5 commits merged on Oct 25, 2024


@nomisto (Contributor) commented Apr 22, 2022

Closes #502

This is a QA dataset available in two languages, English (en) and Spanish (es), so there are two subsets containing the same questions: head_qa_en and head_qa_es. In de3f664 I also implemented a translation (T2T) config with subset_id head_qa, although translation is not the intended purpose of this dataset.

This gives six configs in total:

  • head_qa_en_source, head_qa_en_bigbio_qa
  • head_qa_es_source, head_qa_es_bigbio_qa
  • head_qa_source (merge of head_qa_en_source and head_qa_es_source), head_qa_bigbio_t2t
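A minimal sketch of how these six names break down, assuming each config name is simply `<subset_id>_<schema>`. `ConfigSpec` is a hypothetical stand-in for the real builder config class, not the bigbio API; it only shows that the t2t config hangs off the language-neutral `head_qa` subset rather than off either language subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSpec:
    # Hypothetical stand-in for a dataset builder config: only the two
    # fields needed to show how the names are composed.
    subset_id: str
    schema: str  # "source", "bigbio_qa" or "bigbio_t2t"

    @property
    def name(self) -> str:
        # Assumed naming pattern: "<subset_id>_<schema>"
        return f"{self.subset_id}_{self.schema}"


CONFIGS = [
    ConfigSpec("head_qa_en", "source"),
    ConfigSpec("head_qa_en", "bigbio_qa"),
    ConfigSpec("head_qa_es", "source"),
    ConfigSpec("head_qa_es", "bigbio_qa"),
    ConfigSpec("head_qa", "source"),      # merged en + es source data
    ConfigSpec("head_qa", "bigbio_t2t"),  # en <-> es translation pairs
]

for cfg in CONFIGS:
    print(cfg.name)
```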

However, I get the following error when running the tests, e.g.:

(venv) PS C:\Users\Simon\biomedical> python -m tests.test_bigbio biodatasets/head_qa/head_qa.py --subset_id head_qa_en

...

head_qa_en_bigbio_t2t not found. Available: ['head_qa_source', 'head_qa_en_source', 'head_qa_es_source', 'head_qa_bigbio_t2t', 'head_qa_en_bigbio_qa', 'head_qa_es_bigbio_qa']

How should I proceed? There cannot be a "head_qa_en_bigbio_t2t" config, since the t2t task is not language-specific.
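The error message suggests that the test builds the expected config names from the given subset_id plus every bigbio schema the dataset declares. The sketch below reproduces that failure under this assumption. `expected_config_names`, `missing_configs` and `SUBSET_SCHEMAS` are hypothetical helpers written for illustration, not the actual bigbio test code.

```python
# Config names reported as available in the error message above.
AVAILABLE = [
    "head_qa_source",
    "head_qa_en_source",
    "head_qa_es_source",
    "head_qa_bigbio_t2t",
    "head_qa_en_bigbio_qa",
    "head_qa_es_bigbio_qa",
]


def expected_config_names(subset_id, schemas):
    """Hypothetical: one source config plus one bigbio config per schema."""
    return [f"{subset_id}_source"] + [f"{subset_id}_bigbio_{s}" for s in schemas]


def missing_configs(subset_id, schemas, available):
    """Return the expected config names that the dataset does not define."""
    return [n for n in expected_config_names(subset_id, schemas) if n not in available]


# A dataset-wide schema list applied to a language subset: the t2t lookup fails.
print(missing_configs("head_qa_en", ["qa", "t2t"], AVAILABLE))
# → ['head_qa_en_bigbio_t2t']

# A hypothetical per-subset schema list: every expected name exists.
SUBSET_SCHEMAS = {"head_qa_en": ["qa"], "head_qa_es": ["qa"], "head_qa": ["t2t"]}
for subset_id, schemas in SUBSET_SCHEMAS.items():
    print(subset_id, missing_configs(subset_id, schemas, AVAILABLE))
```

If the check really works this way, it can only pass when each subset declares the schemas it actually supports. One flat list of schemas for the whole dataset does not fit a layout like this one.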

@nomisto (Contributor, Author) commented Apr 27, 2022

I've now removed the t2t schema so that the dataset can be merged at any time. If the dataset should include the (non-native) translation task and there is a solution to my config problem, I can re-add it.

@mariosaenger mariosaenger self-assigned this Oct 24, 2024
@mariosaenger mariosaenger requested a review from phlobo October 24, 2024 16:02
@mariosaenger (Collaborator) commented

@phlobo I revised the dataset and refactored it to align with the HF Hub-style integration. Please have a look at it.

@phlobo (Collaborator) left a comment


Thanks! Very nice dataset. I just made some minor adjustments to the README.

@phlobo phlobo merged commit 559e236 into bigscience-workshop:main Oct 25, 2024
Development

Successfully merging this pull request may close these issues.

Proposal to add HeadQA
3 participants