Quickstart
The Data Coordination Center (DCC) for each participating Common Fund Program needs to onboard with the CFDE-CC before we can accept submissions. You do not need to be funded by a CFDE award to participate, but awards are available (see Engagement Opportunities for Common Fund Programs for more information). To begin your onboarding, please email the helpdesk: [email protected].
Anyone who will need permissions to submit, review and/or approve data on behalf of your DCC will need to be onboarded to the Submission System.
A C2M2 datapackage consists of a group of tab-separated value (.tsv) files populated with interrelated metadata about the data assets owned by your DCC. Submitting a datapackage to the CFDE portal system can make your data searchable by concepts like anatomical location, species, assay type, disease, phenotype, chemical substance and other terms relevant to biomedical researchers looking for new datasets. This datapackage can be created with arbitrary levels of complexity: many of the columns and several entire tables can be left empty and still produce a valid package; selection depends on what each DCC wants to express about their own sets of research data. Findability of research data in the CFDE portal will correlate with datapackage completeness, so once a DCC has identified a relevant selection of model components, it is best to make datapackages as rich as possible within those selected components. The full specification for all tables is available in the technical documentation. See the C2M2-Table-Summary for a high-level description.
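Because many tables may legitimately be left empty, it can help to see at a glance which tables in a datapackage directory actually carry data. The following is a minimal sketch, not part of the CFDE tooling: the function name `summarize_tables` is hypothetical, and it assumes each .tsv file starts with a single header row.

```python
import csv
from pathlib import Path


def summarize_tables(package_dir):
    """Return {filename: number of data rows} for every .tsv file in package_dir."""
    summary = {}
    for tsv_path in sorted(Path(package_dir).glob("*.tsv")):
        with open(tsv_path, newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        # Assumption: the first row is the column header, so it is not counted as data.
        summary[tsv_path.name] = max(len(rows) - 1, 0)
    return summary
```

Tables reported with 0 rows are header-only; whether that is acceptable depends on which model components your DCC has chosen to populate.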
The controlled vocabulary (CV) term tables required for C2M2 submissions will be automatically generated from term usage collected from the other tables. Once you have created all other (non-CV) tables, run our submission prep script (wiki page; code) to automatically build the anatomy.tsv, analysis_type.tsv, assay_type.tsv, data_type.tsv, file_format.tsv, ncbi_taxonomy.tsv, compound.tsv, substance.tsv, phenotype.tsv, gene.tsv, disease.tsv, phenotype_gene.tsv, and phenotype_disease.tsv tables to be included with your submission:
python prepare_c2m2_submission.py
See the wiki page for usage information.
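After the prep script has run, a quick check that all of the CV term tables listed above are present can catch a failed or partial run. This is a sketch under stated assumptions: the helper `missing_cv_tables` is hypothetical, and it assumes the script writes the tables into the same directory as the rest of the datapackage.

```python
from pathlib import Path

# The CV term tables named in this quickstart as being built by the prep script.
CV_TABLES = [
    "anatomy.tsv", "analysis_type.tsv", "assay_type.tsv", "data_type.tsv",
    "file_format.tsv", "ncbi_taxonomy.tsv", "compound.tsv", "substance.tsv",
    "phenotype.tsv", "gene.tsv", "disease.tsv", "phenotype_gene.tsv",
    "phenotype_disease.tsv",
]


def missing_cv_tables(package_dir):
    """Return the names of CV term tables that are not present in package_dir."""
    package_path = Path(package_dir)
    return [name for name in CV_TABLES if not (package_path / name).is_file()]
```

An empty list means every expected CV table was found.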
To submit your data you will need to install the cfde-submit tool.
To avoid potential conflicts, we recommend installing cfde-submit from within a Python 3 virtual environment (more info).
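One way to create such an environment is with Python's standard-library `venv` module. The sketch below is illustrative only: the function name `create_env` and the directory name `cfde-env` used in the note after it are hypothetical.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment so packages can be installed.
    venv.EnvBuilder(with_pip=with_pip).create(env_path)
    return env_path
```

For example, `create_env("cfde-env")` creates the environment in the current directory. On Linux and macOS it is then typically activated with `source cfde-env/bin/activate` before running the install command.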
To install the tool:
pip3 install cfde-submit
To use the tool, give it the path to the directory containing all 22 tables plus the required JSON schema file:
cfde-submit run PATH/TO/DIRECTORY
See the full cfde-submit documentation for more information.
Note that only users who have been onboarded as Data Submitters for a DCC will be able to successfully run the cfde-submit tool.
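Before running the tool, it can be useful to confirm that the directory holds the expected number of tables and a JSON schema file. This is a rough sketch, not a feature of cfde-submit: the function name `preflight` is hypothetical, the default of 22 tables is taken from the text above, and it assumes the schema file has a .json extension and sits at the top level of the directory.

```python
from pathlib import Path


def preflight(package_dir, expected_tables=22):
    """Return a list of human-readable problems found in package_dir (empty if none)."""
    package_path = Path(package_dir)
    tsv_files = sorted(p.name for p in package_path.glob("*.tsv"))
    json_files = sorted(p.name for p in package_path.glob("*.json"))

    problems = []
    if len(tsv_files) != expected_tables:
        problems.append(
            f"expected {expected_tables} .tsv files, found {len(tsv_files)}"
        )
    # Assumption: the required schema is any top-level file ending in .json.
    if not json_files:
        problems.append("no JSON schema file found")
    return problems
```

This only checks file counts; it does not validate table contents, which happens on the server during submission.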
Our submission system runs the frictionless validator on our servers as part of the submission process. You do not need to install or run frictionless to use our tool; however, if you would like to use the validator locally, you can install it using this command:
pip install frictionless
If that command fails, try:
pip install frictionless-py
Once it is installed, run it with:
frictionless validate PATH/TO/JSON_FILE_IN_DIRECTORY
This command takes several minutes to run and prints the results to your terminal by default. To save the results to a file that is easier to review, run:
frictionless validate PATH/TO/JSON_FILE_IN_DIRECTORY > report.txt
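The same redirect can be scripted from Python, which is convenient if validation is one step in a larger workflow. The sketch below is an assumption-laden illustration: `save_report` is a hypothetical helper that wraps any command, and unlike the shell redirect above it writes both standard output and standard error to the report file.

```python
import subprocess
from pathlib import Path


def save_report(command, report_path):
    """Run command, write its stdout and stderr to report_path, return the exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Keep both streams so that messages printed to stderr are not lost.
    Path(report_path).write_text(result.stdout + result.stderr, encoding="utf-8")
    return result.returncode
```

For example, `save_report(["frictionless", "validate", "PATH/TO/JSON_FILE_IN_DIRECTORY"], "report.txt")` runs the command shown above. How the exit code relates to validation errors should be confirmed against the frictionless documentation.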
Once a datapackage has been submitted you can view it in the portal to see how it will appear for portal users. See How to review your datapackage for more details.
One person at your DCC will have the role of Data Approver. This person verifies that a datapackage is acceptable to the DCC and approves it for inclusion in the next data release. Although a DCC can have any number of Reviewable Submissions, it can have only a single Approved Submission in each public release. A datapackage must be approved by a DCC approver before it becomes part of the public CFDE portal. See How to approve your datapackage.
If your datapackage submits successfully, it has passed only very basic error checking; a much more extensive check happens when it is ingested into the database. If your datapackage has a problem, the error message will be included both in the email telling you that the submission is complete and in the data submission system in the portal, where it appears as 'Diagnostics'. For more help with a specific error message, search the left sidebar for common errors, ask questions in Discussions, or email the helpdesk: [email protected].