Load complex portal UIDs into Chado and put them in the GPAD file #1166
SGD complex example: https://www.yeastgenome.org/complex/CPX-2262 Action for Kim: check the SGD GPI/GPAD file for this complex. |
@ValWood How do we associate gene IDs with complex portal IDs? Is there a mapping file? |
On the call I was wondering what SO term to use in column 5 of the GPI file (DB_Object_Type). I had a look at the GPAD/GPI 2.0 spec and it says:
So probably we can use something like protein-containing complex (GO:0032991) as the type for complexes.
SGD are still on GPAD/GPI v1.2 which doesn't need a term ID for the object type. This is the line in the SGD GPI file for that complex:
The type is just |
As a first step I've added a build step to load a file with a mapping from gene systematic IDs to Complex Portal IDs. The file is: The three tab-separated columns are:
The PubMed ID is required by Chado. It's currently an empty file. |
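A minimal sketch of reading such a mapping file, for illustration only. The column order (gene systematic ID, Complex Portal ID, PubMed ID) is an assumption based on the description above, the function name `read_complex_mapping` is hypothetical, and the IDs in the example row are placeholders rather than real data.

```python
import csv
import io


def read_complex_mapping(handle):
    """Yield (gene_id, complex_id, pmid) tuples from a tab-separated file.

    Assumed column order: gene systematic ID, Complex Portal ID, PubMed ID.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        gene_id, complex_id, pmid = (field.strip() for field in row)
        yield gene_id, complex_id, pmid


# Placeholder row; none of these IDs are real.
example = "GENE-0001\tCPX-0000\tPMID:0000000\n"
rows = list(read_complex_mapping(io.StringIO(example)))
```

An empty file simply yields no rows, so the build step can run before any complexes have been added.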
After we have added some complexes to the mapping file and successfully loaded them into Chado, I'll change the GPI writer to include the complex details. |
GO:0032991 is what the GO db-xrefs file says. |
We should use a broader term if there is one (to cover protein-RNA complexes). |
Are you using the GO protein complex ID? If so, use "protein-containing complex (GO:0032991)". |
The GPI 1.2 spec allows "protein_complex" as a special case: DB_Object_Type A description of the type of the gene or gene product being annotated. This field uses Sequence Ontology labels and may correspond to one of the following: gene, protein_complex; protein; transcript; ncRNA; rRNA; tRNA; snRNA; snoRNA; or any subtype of ncRNA in the Sequence Ontology. https://geneontology.org/docs/gene-product-information-gpi-format/#db_object_type |
Yep, that's what I'm using. |
I added some fake protein complex data to my local test Chado database. So I've now implemented and tested writing the complexes to the GPI file. The complexes will start appearing in the GPI file once we have some complexes in |
https://www.ebi.ac.uk/complexportal/complex/organisms |
We do have some real data in the spreadsheet Sandra shared with us. We will need to be careful when mapping using gene names: Complex Portal will likely use the UniProt gene names, and sometimes their names are inferred from S. cerevisiae and are not the official names. We probably need to use UniProt identifiers instead in the 'real' conversion. |
Or, we can use the link from |
Where is that link? |
It didn't work for me either. I found the TSV files here: |
Annoyingly the gene IDs are UniProt IDs but we can look them up in Chado when loading. |
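A rough sketch of that lookup step, assuming the UniProt-to-gene mapping has already been pulled out of Chado into a dictionary. The function name `resolve_members` and both input structures are hypothetical; the real loader queries Chado directly.

```python
def resolve_members(complex_members, uniprot_to_gene):
    """Translate UniProt accessions to gene IDs for each complex.

    complex_members: dict of complex ID -> list of UniProt accessions
    uniprot_to_gene: dict of UniProt accession -> gene ID (stand-in for Chado)
    Returns (resolved, unmapped), where unmapped lists the
    (complex ID, accession) pairs with no known gene.
    """
    resolved = {}
    unmapped = []
    for complex_id, accessions in complex_members.items():
        gene_ids = []
        for accession in accessions:
            gene_id = uniprot_to_gene.get(accession)
            if gene_id is None:
                # Record the failure instead of silently dropping the member.
                unmapped.append((complex_id, accession))
            else:
                gene_ids.append(gene_id)
        resolved[complex_id] = gene_ids
    return resolved, unmapped
```

Returning the unmapped accessions separately makes it easy to log them at load time, which helps spot obsolete or secondary UniProt accessions.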
Sets feature names from a file of feature uniquenames and names. Refs #1166
Loads complex portal IDs and names and the mapping to genes. Genes are part_of complexes via feature_relationship. Refs pombase/pombase-chado#1166
New plan: we now download the data file from Complex Portal when it changes, then load the details into Chado. I'll check in the morning that it's all OK. We should have the complex IDs in the GPI from tomorrow.
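One way to decide whether a freshly downloaded file has changed is to compare checksums before replacing the local copy. This is only a sketch of that idea: the thread does not say how the change is actually detected, and the function name `update_if_changed` and the file name are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def update_if_changed(new_bytes, data_path):
    """Write new_bytes to data_path only if the content differs.

    Returns True when the file was written (so a reload is needed).
    """
    data_path = Path(data_path)
    new_digest = hashlib.sha256(new_bytes).hexdigest()
    if data_path.exists():
        old_digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
        if old_digest == new_digest:
            return False
    data_path.write_bytes(new_bytes)
    return True


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "complexes.tsv"
    first = update_if_changed(b"v1\n", target)   # no file yet: written
    second = update_if_changed(b"v1\n", target)  # same content: skipped
    third = update_if_changed(b"v2\n", target)   # new content: written
```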
That is handled by the load script. |
We were only storing one feature_relationship per complex when loading pombe_to_complex_id_mapping.tsv. Refs #1166
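To illustrate the intended behaviour after this fix: each distinct gene/complex pair should produce its own part_of relationship. The sketch below is not the loader's code, and the name `build_part_of_relationships` is hypothetical. It only shows that deduplication has to be keyed on the (gene, complex) pair, because keying on the complex ID alone would keep a single member per complex.

```python
def build_part_of_relationships(rows):
    """Return one (gene, 'part_of', complex) triple per distinct pair.

    rows: iterable of (gene_id, complex_id) tuples from the mapping file.
    """
    seen = set()
    relationships = []
    for gene_id, complex_id in rows:
        # Key on the pair, not on complex_id alone, so that a complex
        # with several member genes gets several relationships.
        key = (gene_id, complex_id)
        if key in seen:
            continue
        seen.add(key)
        relationships.append((gene_id, "part_of", complex_id))
    return relationships
```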
The GPI has the complexes now. But I think I missed the Protein_Containing_Complex_Members field (column 9): I'll change the GPI writer to fill this in. |
Column 9: "Protein_Containing_Complex_Members" Refs pombase/pombase-chado#1166
I've done that in time for the nightly load. The example in the GPI docs shows UniProt IDs but the spec implies that any ID will do. I've used PomBase gene IDs for now. I'll change it if there's a problem. |
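For reference, a hedged sketch of what a GPI 2.0 line for a complex could look like with column 9 filled in. It assumes the eleven-column GPI 2.0 layout, a `ComplexPortal:` prefix for the object ID, `PomBase:` prefixes for the members, pipe-separated multiple values, and GO:0032991 as the type, as discussed above. The function `gpi_complex_line` is hypothetical and is not the actual GPI writer.

```python
def gpi_complex_line(complex_id, name, member_gene_ids, taxon="NCBITaxon:4896"):
    """Format one tab-separated GPI 2.0 line for a protein complex."""
    members = "|".join(f"PomBase:{gene_id}" for gene_id in member_gene_ids)
    columns = [
        f"ComplexPortal:{complex_id}",  # 1  DB_Object_ID (prefix is an assumption)
        complex_id,                     # 2  DB_Object_Symbol (assumed: reuse the ID)
        name,                           # 3  DB_Object_Name
        "",                             # 4  DB_Object_Synonyms
        "GO:0032991",                   # 5  DB_Object_Type: protein-containing complex
        taxon,                          # 6  DB_Object_Taxon (4896 is S. pombe)
        "",                             # 7  Encoded_By
        "",                             # 8  Parent_Protein
        members,                        # 9  Protein_Containing_Complex_Members
        "",                             # 10 DB_Xrefs
        "",                             # 11 Gene_Product_Properties
    ]
    return "\t".join(columns)
```

If UniProt IDs turn out to be needed in column 9, only the construction of `members` would have to change.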
We should check this with the Noctua people. Perhaps we need to use UniProt IDs in field 9 ("Protein_Containing_Complex_Members") of the GPI file? |
I've commented here: |
I don't see complex portal IDs in Noctua, so I guess we need to add them to the GPAD. GPI?
cc @PCarme