3D-Beacons: Decreasing the gap between protein sequences and structures through a federated network of protein structure data resources
Mihaly Varadi, Sreenath Nair, Ian Sillitoe, Gerardo Tauriello, Stephen Anyango, Stefan Bienert, Clemente Borges, Mandar Deshpande, Tim Green, Demis Hassabis, Andras Hatos, Tamas Hegedus, Maarten L Hekkelman, Robbie Joosten, John Jumper, Agata Laydon, Dmitry Molodenskiy, Damiano Piovesan, Edoardo Salladini, Steven L. Salzberg, Markus J Sommer, Martin Steinegger, Erzsebet Suhajda, Dmitri Svergun, Luiggi Tenorio-Ku, Silvio Tosatto, Kathryn Tunyasuvunakool, Andrew Mark Waterhouse, Augustin Žídek, Torsten Schwede, Christine Orengo, Sameer Velankar
3 August 2022; BioRxiv https://doi.org/10.1101/2022.08.01.501973
3D-Beacons is an open collaboration between providers of macromolecular structure models. The goal of this collaboration is to provide model coordinates and meta-information from all the contributing data resources in a standardized data format and on a unified platform.
Schematic overview of the 3D-Beacons infrastructure
3D-Beacons consists of a Registry, a Hub and Beacons that host Clients. The Hub uses the Registry to look up which API endpoints each Beacon supports. The Beacons provide data according to the 3D-Beacons data specifications (GitHub link). The Hub collates the data from the Beacons and exposes it via Hub API endpoints.
The 3D-Beacons Registry records meta-information about all the contributing partner resources, and lists the API endpoints that they support. In other words, looking at the registry will give specific information on which API endpoints provide what data from which data resource.
The Registry is implemented as a JSON object that complies with the schema specification, which is also included in this repository.
Both files are available in the resources folder of the repository. To add or change a registry entry, make the relevant changes in resources/registry.json, which should comply with the schema defined in resources/schema.json.
This repository also contains an installable Python package that provides utilities such as schema validation. See the installation section below for how to install it.
Data providers who are interested in making their macromolecule structures available through the 3D-Beacons Network should complete the following steps:
- Contact the 3D-Beacons consortium
- Review the API specifications for sharing metadata
- Implement API endpoints or set up an instance of the 3D-Beacons Client
- Review the resources/registry.json file in this repository
- Update the resources/registry.json file to include information on your data resource and your API endpoint URLs
- Create a pull request for the development branch with your updated resources/registry.json file
1. Contact the 3D-Beacons consortium
3D-Beacons is an open consortium, and we welcome new data providers who would like to make their experimentally determined or theoretical macromolecule structures available through the 3D-Beacons Network.
To ensure that the network provides access to relevant data, we require new prospective data providers to contact us before linking their data to 3D-Beacons. Please send an email to Sameer Velankar ([email protected]) or Christine Orengo ([email protected]) to initiate discussions.
2. Review the API specifications for sharing metadata
The 3D-Beacons Network provides access to metadata regarding macromolecule structures in a unified format. This means that every data provider has to expose information in the same data format. We define the accepted data schemas in the 3D-Beacons API specification on GitHub.
Please review this specification, and identify the schemas that fit the data you would like to make accessible via 3D-Beacons. For example, if you want to make your structures discoverable based on a UniProt identifier, then the endpoints with /uniprot/{qualifier}.json are relevant for you.
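As a rough illustration of what the /uniprot/{qualifier}.json template means for a client, the sketch below fills in the template for one UniProt accession. The base URL and the helper name uniprot_endpoint are hypothetical and only serve the example; the exact paths your API must expose are defined by the 3D-Beacons API specification.

```python
from urllib.parse import quote, urljoin

# Hypothetical base URL of a data provider's API; replace it with your own.
BASE_SERVICE_URL = "https://example.org/api/"


def uniprot_endpoint(base_url: str, qualifier: str) -> str:
    """Fill in the uniprot/{qualifier}.json template for one UniProt identifier."""
    # Percent-encode the qualifier so it cannot alter the path structure.
    path = "uniprot/{qualifier}.json".format(qualifier=quote(qualifier, safe=""))
    # urljoin only keeps the last path segment of the base if it ends with "/".
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, path)


# P38398 is used here purely as an example accession.
print(uniprot_endpoint(BASE_SERVICE_URL, "P38398"))
```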
3. Implement API endpoints or set up an instance of the 3D-Beacons Client
After reviewing the API specifications and deciding what data you will make available, and which data schema you will use, the next step is to either implement the selected API endpoints in a REST API, or to take advantage of the 3D-Beacons Client, which can be installed locally and includes a pre-packaged and ready-to-use implementation of certain API endpoints. For more information on this, please visit the 3D-Beacons Client repository.
4. Review the resources/registry.json file in this repository
Once your metadata is exposed via API endpoints that comply with the 3D-Beacons API specification, you should review the resources/registry.json file in this repository. This file contains all the information needed by the 3D-Beacons Hub API for linking your API endpoints to the 3D-Beacons Network.
The registry has two main data blocks: providers and services.
The providers block contains information that describes your data resource. We use this information to let users know where to look for the original sources of data.
An example item in the providers list would look like this:
{
  "providerId": "alphafold",
  "providerName": "AlphaFold Protein Structure Database",
  "providerDescription": "AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment.",
  "providerUrl": "https://alphafold.ebi.ac.uk/",
  "baseServiceUrl": "https://alphafold.ebi.ac.uk/api/",
  "devBaseServiceUrl": "https://dev.alphafold.ebi.ac.uk/api/",
  "providerLogo": "https://alphafold.ebi.ac.uk/assets/img/dm-logo.png"
}
The services block records which API endpoints are implemented by which data provider.
An example item in the services list would look like this:
{
  "serviceType": "summary",
  "provider": "alphafold",
  "accessPoint": "uniprot/summary/"
}
Together, the providers and services data blocks tell the 3D-Beacons Hub API that, in the example above, AlphaFold DB provides access to its data by implementing the summary API endpoint, which it serves at the URL https://alphafold.ebi.ac.uk/api/uniprot/summary/.
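The way a provider entry and a service entry combine into a full URL can be sketched in a few lines of Python. This is a minimal illustration and not the Hub's actual implementation: the function name resolve_service_urls and the use_dev switch are assumptions, and the registry below is cut down to the fields needed for the example.

```python
from urllib.parse import urljoin

# A cut-down registry holding only the two example entries shown above.
registry = {
    "providers": [
        {
            "providerId": "alphafold",
            "baseServiceUrl": "https://alphafold.ebi.ac.uk/api/",
            "devBaseServiceUrl": "https://dev.alphafold.ebi.ac.uk/api/",
        }
    ],
    "services": [
        {
            "serviceType": "summary",
            "provider": "alphafold",
            "accessPoint": "uniprot/summary/",
        }
    ],
}


def resolve_service_urls(registry: dict, use_dev: bool = False) -> list:
    """Return (provider, serviceType, URL) tuples for every registered service."""
    url_key = "devBaseServiceUrl" if use_dev else "baseServiceUrl"
    # Map each providerId to the base URL its services hang off.
    base_urls = {p["providerId"]: p[url_key] for p in registry["providers"]}
    resolved = []
    for service in registry["services"]:
        base = base_urls[service["provider"]]
        url = urljoin(base, service["accessPoint"])
        resolved.append((service["provider"], service["serviceType"], url))
    return resolved


for provider, service_type, url in resolve_service_urls(registry):
    print(provider, service_type, url)
```

Keeping the base URL on the provider and only a relative accessPoint on each service means a provider can move its API without touching every service entry.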
5. Update the resources/registry.json file to include information on your data resource and your API endpoint URLs
The next step is to fork this repository (https://github.com/3D-Beacons/3d-beacons-registry) and edit the resources/registry.json file by adding a new item to the providers list and listing all the API endpoints you implemented in the services list.
NOTE: If you don't have a production setup at this point, set baseServiceUrl to the same value as devBaseServiceUrl.
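The edit itself is a small change to a JSON document, and it can be sketched in Python as follows. Everything about the new provider here (the identifier "mymodels", the example.org URLs and the helper add_provider) is hypothetical; the sketch only shows the shape of the change, including the fallback to the development URL described in the note above.

```python
import json


def add_provider(registry: dict, provider: dict, services: list) -> dict:
    """Return a copy of the registry with one more provider and its services."""
    known_ids = {p["providerId"] for p in registry["providers"]}
    if provider["providerId"] in known_ids:
        raise ValueError("providerId already registered: " + provider["providerId"])
    entry = dict(provider)
    # Without a production setup, reuse the development URL as the base URL.
    entry.setdefault("baseServiceUrl", entry["devBaseServiceUrl"])
    # Tag every new service with the provider it belongs to.
    new_services = [dict(s, provider=entry["providerId"]) for s in services]
    return {
        **registry,
        "providers": registry["providers"] + [entry],
        "services": registry["services"] + new_services,
    }


# Hypothetical provider used only for illustration.
new_provider = {
    "providerId": "mymodels",
    "providerName": "My Model Archive",
    "providerDescription": "Example resource of theoretical models.",
    "providerUrl": "https://example.org/",
    "devBaseServiceUrl": "https://dev.example.org/api/",
    "providerLogo": "https://example.org/logo.png",
}
new_services = [{"serviceType": "summary", "accessPoint": "uniprot/summary/"}]

updated = add_provider({"providers": [], "services": []}, new_provider, new_services)
print(json.dumps(updated, indent=2))
```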
6. Create a pull request for the development branch with your updated resources/registry.json file
Finally, please create a pull request so that we can merge your version of the resources/registry.json file into our development branch. We will then test the updated file, and also test all the API endpoints you specified in the services list of the resources/registry.json file.
As part of this testing, we will stress-test all the API endpoints you provide. We will also validate the data format against the 3D-Beacons API specification, and test whether the 3D-Beacons Hub API can concatenate data.
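To give an idea of what concatenating data involves, the sketch below merges summary responses from several Beacons for the same UniProt entry. The response shape used here (a uniprot_entry object plus a structures list) is an assumption made for illustration, as are the function name and the sample field names; the authoritative schema is the 3D-Beacons API specification, and the Hub's real logic may differ.

```python
def concatenate_summaries(responses: list) -> dict:
    """Merge per-Beacon summary responses for one UniProt entry."""
    merged = {"uniprot_entry": None, "structures": []}
    for response in responses:
        if not response:
            # A Beacon may have no data for this entry; skip it.
            continue
        if merged["uniprot_entry"] is None:
            # Take the entry description from the first Beacon that has one.
            merged["uniprot_entry"] = response.get("uniprot_entry")
        merged["structures"].extend(response.get("structures", []))
    return merged


# Hypothetical responses from two Beacons; field names inside are illustrative.
beacon_a = {"uniprot_entry": {"ac": "P38398"}, "structures": [{"source": "beacon-a"}]}
beacon_b = {"uniprot_entry": {"ac": "P38398"}, "structures": [{"source": "beacon-b"}]}

combined = concatenate_summaries([beacon_a, None, beacon_b])
print(len(combined["structures"]))
```

Merging like this only works when every Beacon returns the same structure, which is why responses are validated against the shared specification first.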
Once testing is complete, we will merge the updates into the master branch, at which point your data resource will become officially linked to the 3D-Beacons Network.
The following software is required for the utilities to run properly:
Python 3.7+
Note
Because support for Python 2.7 ended on January 1, 2020, new projects should consider supporting Python 3 only, which is simpler than trying to support both. As a result, support for Python 2.7 in this project has been dropped.
Set up a Python virtual environment and install the required packages.
$ python3 -m venv venv
$ source venv/bin/activate
Now install the project dependencies.
(venv) $ make dev_deps
Install the package
(venv) $ pip install .
The installed package can be used to validate registry.json against the defined schema. Both files are available in the resources folder.
(venv) $ beacons_bio_3d validate_schema --schema_json resources/schema.json --registry_json resources/registry.json
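Before opening a pull request, it can also be useful to check that every item in the services list points at a provider that actually exists in the providers list. The sketch below is a supplementary check written for illustration; it is not part of the beacons_bio_3d package, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_registry(path: str) -> dict:
    """Read a registry file such as resources/registry.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_unknown_providers(registry: dict) -> list:
    """Return the services whose provider matches no providerId."""
    known = {p["providerId"] for p in registry["providers"]}
    return [s for s in registry["services"] if s["provider"] not in known]


# Small in-memory example; the second service contains a deliberate typo.
example = {
    "providers": [{"providerId": "alphafold"}],
    "services": [
        {"serviceType": "summary", "provider": "alphafold", "accessPoint": "uniprot/summary/"},
        {"serviceType": "summary", "provider": "alphafld", "accessPoint": "uniprot/summary/"},
    ],
}

for service in find_unknown_providers(example):
    print("Unknown provider in service:", service["provider"])
```

To run the same check on the real file, pass the result of load_registry("resources/registry.json") to find_unknown_providers from the repository root.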
- Sreenath Nair - Initial work - sreenathnair
- Mihaly Varadi - Initial work - mvaradi
See also the list of contributors who participated in this project.
This repository is open to contributions. Please fork the repository and send pull requests.