Protein contact map
A protein contact map represents the distance between all possible amino acid residue pairs of a three-dimensional protein structure using a binary two-dimensional matrix. For two residues and, the element of the matrix is 1 if the two residues are closer than a predetermined threshold, and 0 otherwise. Various contact definitions have been proposed: The distance between the Cα-Cα atom with threshold 6-12 Å; distance between Cβ-Cβ atoms with threshold 6-12 Å ; and distance between the side-chain centers of mass.
Overview
Contact maps provide a more reduced representation of a protein structure than its full 3D atomic coordinates. The advantage is that contact maps are invariant to rotations and translations. They are more easily predicted by machine learning methods. It has also been shown that under certain circumstances it is possible to reconstruct the 3D coordinates of a protein using its contact map.Contact maps are also used for protein superimposition and to describe similarity between protein structures. They are either predicted from protein sequence or calculated from a given structure.
Contact map prediction
With the availability of high numbers of genomic sequences it becomes feasible to analyze such sequences for coevolving residues. The effectiveness of this approach results from the fact that a mutation in position i of a protein is more likely to be associated with a mutation in position j than with a back-mutation in i if both positions are functionally coupled.Several statistical methods exist to extract from a multiple sequence alignment such coupled residue pairs: observed versus expected frequencies of residue pairs ; the McLachlan Based Substitution correlation ; statistical coupling analysis; Mutual Information based methods; and recently direct coupling analysis.
Machine learning algorithms have been able to enhance MSA analysis methods, especially for non-homologous proteins.
Predicted contact maps have been used in the prediction of membrane proteins where helix-helix interactions are targeted.
HB Plot
Knowledge of the relationship between a protein's structure and its dynamic behavior is essential for understanding protein function. The description of a protein three dimensional structure as a network of hydrogen bonding interactions was introduced as a tool for exploring protein structure and function. By analyzing the network of tertiary interactions, the possible spread of information within a protein can be investigated.HB plot offers a simple way of analyzing protein secondary structure and tertiary structure. Hydrogen bonds stabilizing secondary structural elements and those formed between distant amino acid residues - defined as tertiary hydrogen bonds - can be easily distinguished in HB plot, thus, amino acid residues involved in stabilizing protein structure and function can be identified.
Features
The plot distinguishes between main chain-main chain, main chain-side chain and side chain-side chain hydrogen bonding interactions. Bifurcated hydrogen bonds and multiple hydrogen bonds between amino acid residues; and intra- and interchain hydrogen bonds are also indicated on the plots. Three classes of hydrogen bondings are distinguished by color-coding; short, intermediate and long hydrogen bonds.Secondary structure elements in HB plot
In representations of the HB plot, characteristic patterns of secondary structure elements can be recognised easily, as follows:- Helices can be identified as strips directly adjacent to the diagonal.
- Antiparallel beta sheets appear in HB plot as cross-diagonal.
- Parallel beta sheets appears in the HB plot as parallel to the diagonal.
- Loops appear as breaks in the diagonal between the cross-diagonal beta-sheet motifs.
Examples of usage
Cytochrome P450s
The cytochrome P450s are xenobiotic-metabolizing membrane-bound heme-containing enzymes that use molecular oxygen and electrons from NADPH cytochrome P450 reductase to oxidize their substrates. CYP2B4, a member of the cytochrome P450 family is the only protein within this family, whose X-ray structure in both open 11 and closed form 12 is published. The comparison of the open and closed structures of CYP2B4 structures reveals large-scale conformational rearrangement between the two states, with the greatest conformational change around the residues 215-225, which is widely open in ligand-free state and shut after ligand binding; and the region around loop C near the heme.Examining the HB plot of the closed and open state of CYP2B4 revealed that the rearrangement of tertiary hydrogen bonds was in excellent agreement with the current knowledge of the cytochrome P450 catalytic cycle.
The first step in P450 catalytic cycle is identified as substrate binding. Preliminary binding of a ligand near to the entrance breaks hydrogen bonds S212-E474, S207-H172 in the open form of CYP2B4 and hydrogen bonds E218-A102, Q215-L51 are formed that fix the entrance in the closed form as the HB plot reveals.
The second step is the transfer of the first electron from NADPH via an electron transfer chain. For the electron transfer a conformational change occurs that triggers interaction of the P450 with the NADPH cytochrome P450 reductase. Breaking of hydrogen bonds between S128-N287, S128-T291, L124-N287 and forming S96-R434, A116-R434, R125-I435, D82-R400 at the NADPH cytochrome P450 reductase binding site—as seen in HB plot—transform CYP2B4 to a conformation state, where binding of NADPH cytochrome P450 reductase occurs.
In the third step, oxygen enters CYP2B4 in the closed state - the state where newly formed hydrogen bonds S176-T300, H172-S304, N167-R308 open a tunnel which is exactly the size and shape of an oxygen molecule.
Lipocalin family
The lipocalin family is a large and diverse family of proteins with functions as small hydrophobic molecule transporters. Beta-lactoglobulin is a typical member of the lipocalin family. Beta-lactoglobulin was found to have a role in the transport of hydrophobic ligands such as retinol or fatty acids. Its crystal structure were determined with different ligands and in ligand-free form as well. The crystal structures determined so far reveal that the typical lipocalin contains eight-stranded antiparallel-barrel arranged to form a conical central cavity in which the hydrophobic ligand is bound. The structure of beta-lactoglobulin reveals that the barrel-form structure with the central cavity of the protein has an "entrance" surrounded by five beta-loops with centers around 26, 35, 63, 87, and 111, which undergo a conformational change during the ligand binding and close the cavity.The overall shape of beta-lactoglobulin is characteristic of the lipocalin family. In the absence of alpha-helices, the main diagonal almost disappears and the cross-diagonals representing the beta-sheets dominate the plot. Relatively low number of tertiary hydrogen bonds can be found in the plot, with three high-density regions, one of which is connected to a loop at the residues around 63, a second is connected to the loop around 87, and a third region which is connected to the regions 26 and 35. The fifth loop around 111 is represented only one tertiary hydrogen bond in the HB plot.
In the three-dimensional structure, tertiary hydrogen bonds are formed near to the entrance, directly involved in conformational rearrangement during ligand binding; and at the bottom of the "barrel".
HB plots of the open and closed forms of beta-lactoglobulin are very similar, all unique motifs can be recognized in both forms. Difference in HB plots of open and ligand-bound form show few important individual changes in tertiary hydrogen bonding pattern. Especially, the formation of hydrogen bonds between Y20-E157 and S21-H161 in closed form might be crucial in conformational rearrangement. These hydrogen bonds lie at the bottom of the cavity, which suggests that the closure of the entrance of a lipocalin starts when a ligand reached the bottom of the cavity and broke hydrogen bonds R123-Y99, R123-T18, and V41-Q120. Lipocalins are known to have very low sequence similarity with high structural similarity. The only conserved regions are exactly the region around 20 and 160 with an unknown role.