Closes #714 #721
base: main
First pass at supporting bigbio_kb schema properly. WIP
Passes all tests
@phlobo I transferred the bug fix to the hub implementation. Please have a look. Thanks!
@mariosaenger some minor issues, could you take a quick look, please?
"""

_HOMEPAGE = "http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004"

_LICENSE = 'Creative Commons Attribution 3.0 Unported'
_LICENSE = "CC_BY_3p0"
Not sure this is correct. The data archives contain a LICENSE file with "GENIA Project License for Annotated Corpora".
The data came from the GENIA version 3.02 corpus (Kim et al., 2003).
This was formed from a controlled search on MEDLINE using the MeSH terms human, blood cells and transcription factors.
From this search 2,000 abstracts were selected and hand annotated according to a small taxonomy of 48 classes based on
a chemical classification. Among the classes, 36 terminal classes were used to annotate the GENIA corpus.
"""

_HOMEPAGE = "http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004"
The link does not work. If we can't find another one, maybe just link to the ACL Anthology?
document["passages"] = [
    {
        "id": next(uid),
        "type": "",
Passage type should not be empty imho. I guess it is "sentence" in this dataset?
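To make the suggestion concrete, here is a minimal sketch of a passage entry with a non-empty type. The helper name `make_sentence_passage`, the `text` and `offsets` fields, the sample sentence, and the way `uid` is constructed are assumptions for illustration only; whether "sentence" is the right value for this dataset is still the open question raised above.

```python
from itertools import count


def make_sentence_passage(text, start, uid):
    """Build one passage dict with a non-empty type (hypothetical helper)."""
    return {
        "id": next(uid),
        "type": "sentence",  # assumption: each passage is a single sentence
        "text": [text],
        # Character offsets of the passage, assumed to be [start, end) pairs.
        "offsets": [[start, start + len(text)]],
    }


# Assumption: uid yields string ids, similar to the `next(uid)` call in the diff.
uid = map(str, count(1))
passage = make_sentence_passage("IL-2 gene expression requires NF-kappa B.", 0, uid)
print(passage["type"], passage["offsets"])
```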
- Dataloader script: biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
- Provide the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
- Implement _info(), _split_generators() and _generate_examples() in dataloader script.
- The BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
- The dataloader script works with the datasets.load_dataset function.
- The tests pass with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
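As a rough aid for the variables item in the checklist, the following sketch lists which of the named module-level variables a dataloader script does not assign. The helper `missing_variables` and the sample source string (including the placeholder URL) are hypothetical and not part of the repository's test suite; the check only looks at plain top-level assignments.

```python
import ast

# Variable names taken from the checklist above.
REQUIRED_VARIABLES = [
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_BIGBIO_VERSION",
]


def missing_variables(source):
    """Return the required names that `source` never assigns at module level."""
    assigned = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigned.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            assigned.add(node.target.id)
    return [name for name in REQUIRED_VARIABLES if name not in assigned]


# Hypothetical script contents, for illustration only.
example = '_HOMEPAGE = "https://example.org"\n_LICENSE = "CC_BY_3p0"\n'
print(missing_variables(example))
```

In practice one would pass the text of the dataloader file instead of the `example` string.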