In molecular biology, MobiDB is a curated biological database designed to offer a centralized resource for annotations of intrinsic protein disorder. Protein disorder is a structural feature of a large number of proteins; the most prominent examples are known as intrinsically unstructured proteins. The database features three levels of annotation: manually curated, indirect and predicted. By combining different sources of protein disorder data into a consensus annotation, MobiDB aims to give the best possible picture of the "disorder landscape" of a given protein of interest.
MobiDB data sources
Curated data and additional annotation
Curated data for MobiDB is obtained from the DisProt database, which provides disorder annotations manually extracted from the literature. To complement the disorder annotation, MobiDB features additional annotations from external sources:
Pfam: protein domain annotations are displayed in graphical form and are linked to the corresponding Pfam page, where the user can find further information.
PDB: secondary structure is extracted from the PDB whenever available and displayed both in graphical form and in 3D.
STRING: known interactors supported by evidence in the "database" and "experimental" channels are displayed in a sortable table.
Indirect sources
PDB X-ray: When a crystallographic experiment is performed to resolve a protein's structure, the positions of certain residues sometimes cannot be accurately determined. One possible cause is that the residue is part of a flexible or disordered region. For this reason, residues missing from PDB X-ray structures are considered an indication of intrinsic disorder.
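The idea of treating unresolved residues as candidate disorder can be sketched as a scan for gaps between the full chain length and the residues that actually have coordinates. This is a minimal illustration, not MobiDB's code: the function name `missing_residue_regions` is hypothetical, and it assumes residue numbers already map one-to-one onto SEQRES positions 1..N (real PDB entries often need an explicit mapping from author numbering to SEQRES).

```python
def missing_residue_regions(seqres_length, observed):
    """Return 1-based inclusive (start, end) ranges of residues with no coordinates.

    seqres_length: length of the full chain as declared in SEQRES.
    observed: residue positions (1..seqres_length) that have coordinates.
    Hypothetical helper; assumes positions are already mapped to SEQRES numbering.
    """
    observed = set(observed)
    regions = []
    start = None
    for pos in range(1, seqres_length + 1):
        if pos not in observed:
            if start is None:
                start = pos  # a gap begins here
        elif start is not None:
            regions.append((start, pos - 1))  # gap closed by an observed residue
            start = None
    if start is not None:
        regions.append((start, seqres_length))  # gap runs to the C-terminus
    return regions


# Residues 1-3 and 8-10 of a 10-residue chain have no coordinates.
print(missing_residue_regions(10, [4, 5, 6, 7]))  # [(1, 3), (8, 10)]
```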
PDB NMR: Deposited files of NMR experiments for protein structure determination often contain multiple models, representing different conformations of the same protein. By calculating the differences between the positions of corresponding residues across models, one can measure the degree to which these positions change. This variation can be interpreted as a measure of how flexible or disordered a protein is. The web server automates these calculations, taking a PDB-formatted file as input.
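One simple way to quantify the positional variation described above is the root-mean-square distance of each residue's C-alpha atom from its mean position across models. The sketch below is an illustration of that general idea, not the algorithm MobiDB uses; the function names and the 2.0 Å cutoff in `flag_mobile` are hypothetical, and the code assumes the models have already been superimposed onto a common reference frame.

```python
import math


def per_residue_fluctuation(models):
    """models[m][r] is the (x, y, z) C-alpha position of residue r in model m.

    Returns, for each residue, the RMS distance of its positions from their mean.
    Assumes all models are already superimposed (no fitting is done here).
    """
    n_models = len(models)
    n_res = len(models[0])
    fluct = []
    for r in range(n_res):
        points = [model[r] for model in models]
        mean = tuple(sum(p[i] for p in points) / n_models for i in range(3))
        mean_sq = sum(math.dist(p, mean) ** 2 for p in points) / n_models
        fluct.append(math.sqrt(mean_sq))
    return fluct


def flag_mobile(fluct, threshold=2.0):
    """Mark residues whose fluctuation exceeds a (hypothetical) cutoff in angstroms."""
    return [f > threshold for f in fluct]


# Two models: residue 0 does not move, residue 1 shifts by 6 angstroms along x.
models = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
fl = per_residue_fluctuation(models)
print(fl)               # [0.0, 3.0]
print(flag_mobile(fl))  # [False, True]
```

Skipping superposition keeps the sketch short; without it, rigid-body differences between models would be mistaken for flexibility.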
Predictions
A great variety of intrinsic protein disorder predictors have been developed in the last decade. Most of them are trained to mimic the annotations described above. Since MobiDB covers the full set of UniProt sequences, the included predictors need to be extremely fast. The ten predictors currently included enable MobiDB to provide disorder annotations for every protein, even when no curated or indirect data is available.
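Several predictors produce several per-residue opinions, so some combination step is needed before a single predicted track can be shown. A per-residue majority vote is one common way to do this. The sketch below assumes that approach for illustration only: the function name `majority_vote`, the boolean encoding (True meaning "disordered") and the 0.5 agreement fraction are all hypothetical choices, not a description of how MobiDB merges its predictors.

```python
def majority_vote(predictions, min_fraction=0.5):
    """Combine per-residue binary predictions from several predictors.

    predictions: list of equal-length lists of bools, one per predictor,
    where True means the predictor calls that residue disordered.
    A residue is called disordered when the fraction of predictors agreeing
    is strictly greater than min_fraction (hypothetical default 0.5).
    """
    n_pred = len(predictions)
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")
    return [
        sum(p[i] for p in predictions) / n_pred > min_fraction
        for i in range(length)
    ]


# Three hypothetical predictors over a 4-residue sequence.
p1 = [True, True, False, False]
p2 = [True, False, False, True]
p3 = [True, True, False, False]
print(majority_vote([p1, p2, p3]))  # [True, True, False, False]
```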
MobiDB consensus
To provide the best possible annotation for a given protein, MobiDB combines all its data sources into a consensus annotation. This annotation differs from those of the individual sources in that it has a third state in addition to "structured" and "disordered": when two authoritative sources disagree, the region is displayed as "ambiguous". With the currently available annotations, this conflict arises when a manually curated source annotates a region as disordered and yet a PDB structure is available for that same region.
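The three-state rule above can be written as a small per-residue decision function. In this sketch only the "ambiguous" rule (curated disorder contradicted by a PDB structure) comes from the description; the priority order used everywhere else (curated first, then PDB-derived indirect data, then predictions) and the one-letter codes "S", "D" and "A" are assumptions made for illustration.

```python
STRUCTURED, DISORDERED, AMBIGUOUS = "S", "D", "A"


def consensus(curated, pdb, predicted):
    """Per-residue three-state consensus (illustrative sketch).

    curated:   "D", "S" or None (no manual annotation) for each residue.
    pdb:       "S" if the residue is resolved in a PDB structure,
               "D" if PDB data indicates disorder (missing or mobile), or None.
    predicted: "D" or "S" for every residue.
    Only the ambiguity rule is taken from the text; the fallback priority
    curated > PDB > predicted is an assumption.
    """
    if not (len(curated) == len(pdb) == len(predicted)):
        raise ValueError("all tracks must have the same length")
    out = []
    for cur, struct, pred in zip(curated, pdb, predicted):
        if cur == DISORDERED and struct == STRUCTURED:
            out.append(AMBIGUOUS)  # curated disorder vs. an existing structure
        elif cur is not None:
            out.append(cur)
        elif struct is not None:
            out.append(struct)
        else:
            out.append(pred)
    return out


curated = ["D", "D", None, None]
pdb = ["S", None, "D", None]
predicted = ["S", "S", "S", "S"]
print(consensus(curated, pdb, predicted))  # ['A', 'D', 'D', 'S']
```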
Website
The MobiDB website provides an interface to search by UniProt ID, protein name or free text. After submitting a query, users are presented with a list of proteins, each annotated with disorder information integrated from various sources, including the consensus disorder annotation. The MobiDB web server also exposes RESTful endpoints that allow programmatic access to MobiDB and retrieval of different data types. The available GET routes provide access to UniProt, STRING, Pfam and disorder data in JSON format.
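Programmatic access to GET routes returning JSON can be sketched with the Python standard library alone. Everything specific here is a placeholder: `BASE_URL`, the `/{accession}/{data_type}` route layout and the lowercase type names are hypothetical and do not reproduce the real MobiDB API, whose actual routes should be taken from its own documentation. The sketch only shows the general pattern of building a route and decoding a JSON response.

```python
import json
from urllib.request import urlopen

# Hypothetical placeholder, not the real MobiDB address.
BASE_URL = "https://example.org/mobidb/ws"

# Hypothetical route names for the data types mentioned in the text.
DATA_TYPES = {"uniprot", "string", "pfam", "disorder"}


def route(accession, data_type, base=BASE_URL):
    """Build a GET URL of the (assumed) form {base}/{accession}/{data_type}."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return f"{base.rstrip('/')}/{accession}/{data_type}"


def fetch(accession, data_type, base=BASE_URL, timeout=10):
    """Issue the GET request and decode the JSON body into Python objects."""
    with urlopen(route(accession, data_type, base), timeout=timeout) as resp:
        return json.load(resp)


print(route("P04637", "disorder"))
# https://example.org/mobidb/ws/P04637/disorder
```

Validating the data type before any request is sent keeps typos from turning into opaque HTTP errors.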