Closes #714 #721

shamikbose · 2022-07-03T00:53:19Z

Confirm that this PR is linked to the dataset issue.
Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
Implement _info(), _split_generators() and _generate_examples() in the dataloader script.
Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
Confirm that the dataloader script works with the datasets.load_dataset function.
Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
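
As a rough illustration of the metadata item in the checklist above, the sketch below reports which of the listed module-level variables are missing or empty. The helper `missing_metadata` and the stand-in module `example` are hypothetical names introduced only for this example; they are not part of the BigBio test suite.

```python
import types

# Variable names taken from the checklist above.
REQUIRED_VARIABLES = [
    "_CITATION",
    "_DATASETNAME",
    "_DESCRIPTION",
    "_HOMEPAGE",
    "_LICENSE",
    "_URLs",
    "_SUPPORTED_TASKS",
    "_SOURCE_VERSION",
    "_BIGBIO_VERSION",
]


def missing_metadata(module):
    """Return the required variable names that are absent or empty in `module`."""
    return [name for name in REQUIRED_VARIABLES if not getattr(module, name, None)]


# Stand-in for an imported dataloader module (values are illustrative only).
example = types.SimpleNamespace(_DATASETNAME="my_dataset", _LICENSE="")
missing = missing_metadata(example)
print(missing)
```

In practice the argument would be the imported dataloader module itself; this check only complements the test suite and does not replace it.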


mariosaenger · 2024-10-28T10:19:19Z

@phlobo I transferred the bug fix to the hub implementation. Please have a look. Thanks!

phlobo

@mariosaenger some minor issues, could you take a quick look, please?

phlobo · 2024-12-10T10:25:51Z

bigbio/hub/hub_repos/jnlpba/jnlpba.py

 """

 _HOMEPAGE = "http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004"

-_LICENSE = 'Creative Commons Attribution 3.0 Unported'
+_LICENSE = "CC_BY_3p0"


Not sure this is correct. The data archives contain a LICENSE file with "GENIA Project License for Annotated Corpora".

phlobo · 2024-12-10T10:26:30Z

bigbio/hub/hub_repos/jnlpba/jnlpba.py

+The data came from the GENIA version 3.02 corpus (Kim et al., 2003).
+This was formed from a controlled search on MEDLINE using the MeSH terms human, blood cells and transcription factors.
+From this search 2,000 abstracts were selected and hand annotated according to a small taxonomy of 48 classes based on
+a chemical classification. Among the classes, 36 terminal classes were used to annotate the GENIA corpus.
 """

 _HOMEPAGE = "http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004"


The link does not work. If we can't find another one, maybe just link to the ACL Anthology?

phlobo · 2024-12-10T10:40:26Z

bigbio/hub/hub_repos/jnlpba/jnlpba.py

+        document["passages"] = [
+            {
+                "id": next(uid),
+                "type": "",


Passage type should not be empty imho. I guess it is "sentence" in this dataset?
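
A minimal sketch of what a non-empty passage type could look like, assuming one passage per sentence and that "sentence" is indeed the right value for this dataset. The helper `build_passages`, the sample sentences, and the one-character separator between sentences are assumptions made for illustration; they are not taken from jnlpba.py.

```python
from itertools import count


def build_passages(sentences, uid):
    """Build one passage per sentence, each with a non-empty type."""
    passages = []
    offset = 0
    for sentence in sentences:
        passages.append(
            {
                "id": next(uid),
                "type": "sentence",  # assumption: the value suggested in the review
                "text": [sentence],
                "offsets": [[offset, offset + len(sentence)]],
            }
        )
        # Assumption: sentences are joined by a single separator character.
        offset += len(sentence) + 1
    return passages


# String ids drawn from a shared counter, mirroring the next(uid) call in the diff.
uid = map(str, count(1))
passages = build_passages(["IL-2 gene expression", "requires NF-kappa B ."], uid)
print(passages)
```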

shamikbose added 2 commits July 2, 2022 19:32

Update jnlpba.py

b4da577

First pass at supporting bigbio_kb schema properly. WIP

Update jnlpba.py

4a4ba70

Passes all tests

shamikbose requested review from hakunanatasha, jason-fries, sunnnymskang, ruisi-su, galtay, leonweber, sg-wbi and debajyotidatta as code owners July 3, 2022 00:53

mariosaenger self-assigned this Oct 26, 2024

Mario Sänger added 2 commits October 28, 2024 11:06

Merge branch 'main' into JNLPBA-Bug-Fix-PR_714

fc59b89

fix: Transfer JNLPBA bug fix to hub implementation

27eec70

mariosaenger requested a review from phlobo October 28, 2024 10:18

phlobo requested changes Dec 10, 2024
