Quickstart
The Data Coordination Center (DCC) for each participating Common Fund Program needs to onboard with the CFDE-CC before we can accept submissions. You do not need to be funded by a CFDE award to participate; however, awards are available (see Engagement Opportunities for Common Fund Programs for more information). To begin your onboarding, please email the helpdesk: [email protected].
Each person who needs to submit data on behalf of your DCC, view your DCC's pending data submissions, or approve a pending data submission will need to be onboarded to the Submission System.
A datapackage consists of 22 tab-separated value (.tsv) files populated with interrelated metadata about the data assets owned by your DCC. Assuming you fill all of the tables, a datapackage submission will make your data searchable by concepts such as anatomical location, species, assay type, and other similar terms that are useful to researchers who are looking for new datasets. A datapackage can be created at different levels of complexity, since many of the columns and several entire tables can be left empty and still produce a valid package. However, searchability in the CFDE portal is highly correlated with model completeness, so the Coordination Center recommends making your datapackage as complete as possible. The full specification for all tables is available in the technical documentation. See the C2M2-Table-Summary for a high-level description of the tables.
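Because completeness drives searchability, it can help to see how much of each table is actually filled in before submitting. The following is a minimal sketch, not part of the CFDE tooling: `column_fill_rates` is a hypothetical helper that reads one .tsv file with Python's standard `csv` module and reports, for each column, the fraction of rows that carry a non-empty value. It assumes the first line of each file is a header row.

```python
import csv


def column_fill_rates(tsv_path):
    """Return {column_name: fraction of data rows with a non-empty value}.

    Hypothetical helper for illustration; assumes the first line is a header.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        filled = {name: 0 for name in columns}
        total_rows = 0
        for row in reader:
            total_rows += 1
            for name in columns:
                # Short rows yield None for missing cells; treat them as empty.
                if (row.get(name) or "").strip():
                    filled[name] += 1
    if total_rows == 0:
        # A header-only table is valid but contributes nothing to search.
        return {name: 0.0 for name in columns}
    return {name: filled[name] / total_rows for name in columns}
```

Calling `column_fill_rates("PATH/TO/DIRECTORY/file.tsv")` returns a dictionary you can scan for columns with low fill rates, which are candidates for additional metadata.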
Four of the tables required by the C2M2 can be automatically generated from the other tables. Once you have created the other tables, run our helper script to create the anatomy, assay_type, data_type, and file_format tables (ncbi_taxonomy tables and JSON schema functionality coming soon):
python build_term_tables.py
See the full build_term_tables documentation for more information.
To submit your data, you will need to install the cfde-submit tool.
To avoid potential conflicts, we recommend installing cfde-submit from within a Python 3 virtual environment (more info).
To install the tool:
pip3 install cfde-submit
To use the tool, give it the path to the directory containing all 22 tables plus the required JSON schema file:
cfde-submit run PATH/TO/DIRECTORY
See the full cfde-submit documentation for more information.
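Before running the tool, you may want to confirm that the directory has the expected layout. The sketch below is an illustration only and is not part of cfde-submit: `preflight` is a hypothetical helper that counts the .tsv files in a directory and checks that at least one .json file is present. The default count of 22 is taken from this guide; adjust it if your version of the C2M2 specifies a different number of tables.

```python
from pathlib import Path

EXPECTED_TSV_COUNT = 22  # table count stated in this guide; an assumption for other C2M2 versions


def preflight(directory, expected_tsv_count=EXPECTED_TSV_COUNT):
    """Return a list of layout problems; an empty list means none were found.

    Hypothetical helper: it checks file counts only, not file contents.
    """
    root = Path(directory)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    tsv_files = sorted(p.name for p in root.glob("*.tsv"))
    json_files = sorted(p.name for p in root.glob("*.json"))

    if len(tsv_files) != expected_tsv_count:
        problems.append(
            f"expected {expected_tsv_count} .tsv files, found {len(tsv_files)}"
        )
    if not json_files:
        problems.append("no .json schema file found")
    return problems
```

An empty result from `preflight("PATH/TO/DIRECTORY")` only means the file layout looks plausible; the real validation still happens during submission.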
Note that only users who have been onboarded as Data Submitters for a DCC will be able to successfully run the cfde-submit tool.
Our submission system runs the frictionless validator on our servers as part of the submission process. You do not need to install or run frictionless to use our tool; however, if you would like to use the validator locally, you can install it using these commands:
pip install frictionless
If that command fails, try:
pip install frictionless-py
Then run the validator on the JSON file in your datapackage directory:
frictionless validate PATH/TO/JSON_FILE_IN_DIRECTORY
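If you prefer to script the local check, the command above can be wrapped in Python. This is a minimal sketch under stated assumptions: `frictionless_command` and `validate_locally` are hypothetical helpers, the directory is assumed to contain exactly one .json file, and a non-zero return code from the `frictionless` command is assumed to indicate a failed validation.

```python
import shutil
import subprocess
from pathlib import Path


def frictionless_command(directory):
    """Build the `frictionless validate` argument list for a datapackage directory.

    Assumes the directory holds exactly one .json file (the schema file).
    """
    json_files = sorted(Path(directory).glob("*.json"))
    if len(json_files) != 1:
        raise ValueError(
            f"expected exactly one .json file in {directory}, found {len(json_files)}"
        )
    return ["frictionless", "validate", str(json_files[0])]


def validate_locally(directory):
    """Run the validator and return its exit code (assumed non-zero on failure)."""
    if shutil.which("frictionless") is None:
        raise RuntimeError("frictionless is not installed or not on PATH")
    completed = subprocess.run(frictionless_command(directory), check=False)
    return completed.returncode
```

The validator's own report is printed to the terminal as usual; the wrapper only adds the file lookup and returns the exit code.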
Once a datapackage has been submitted you can view it in the portal to see how it will appear for portal users. See How to review your datapackage for more details.
One person at your DCC will have the role of Data Approver. This person verifies that a datapackage is acceptable to the DCC and approves it for inclusion in the next data release. Although DCCs can have any number of Reviewable Submissions, each can have only a single Approved Submission for public release. See How to approve your datapackage. A datapackage must be approved by a DCC approver before it becomes part of the public CFDE portal.
If your datapackage submits successfully, it has passed only very basic error checking; a much more extensive check happens when it is ingested into the database. If your datapackage has a problem, the error message will be included both in the email telling you that submission is completed and in the data submission system in the portal. In the portal, your error message will appear as 'Diagnostics'. For more help with your specific error message, please search the left sidebar for common errors, ask questions in Discussions, or email the helpdesk: [email protected] for assistance.