Graphical models for protein structure

Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction, and free energy calculations for protein structures. Representing a protein structure as a graphical model makes it possible to address many problems, including secondary structure prediction, protein–protein interactions, protein–drug interactions, and free energy calculations.
There are two main approaches to using graphical models in protein structure modeling. The first approach uses discrete variables to represent the coordinates or the dihedral angles of the protein structure. These variables are inherently continuous, so a discretization step is typically applied to obtain discrete values. The second approach uses continuous variables for the coordinates or dihedral angles.

Discrete graphical models for protein structure

Markov random fields, also known as undirected graphical models, are common representations for this problem. Given an undirected graph G = (V, E), a set of random variables X = (X_v)_{v ∈ V} indexed by V forms a Markov random field with respect to G if it satisfies the pairwise Markov property:

Any two non-adjacent variables are conditionally independent given all other variables:

$$X_u \perp X_v \mid X_{V \setminus \{u, v\}} \quad \text{if } \{u, v\} \notin E$$

In the discrete model, the continuous variables are discretized into a set of favorable discrete values. If the variables of choice are dihedral angles, the discretization is typically done by mapping each value to the corresponding rotamer conformation.
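The discretization step can be sketched as a nearest-centre lookup. The snippet below is a minimal illustration, not the procedure of any particular rotamer library: the three centres are the idealised staggered values, and the names `ROTAMER_CENTRES`, `angular_distance` and `discretize` are hypothetical.

```python
# Hypothetical rotamer centres (degrees) for a single side-chain dihedral angle.
# These are idealised staggered values, chosen only for illustration.
ROTAMER_CENTRES = {"g+": 60.0, "t": 180.0, "g-": -60.0}


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def discretize(chi):
    """Map a continuous dihedral angle to the label of the nearest rotamer centre."""
    return min(
        ROTAMER_CENTRES,
        key=lambda name: angular_distance(chi, ROTAMER_CENTRES[name]),
    )


# Toy dihedral angles in degrees (made-up values).
chi_angles = [55.2, -171.0, -64.8, 178.5]
labels = [discretize(chi) for chi in chi_angles]
```

The distance is computed on the circle, so that an angle of -171° is treated as close to 180° rather than far from it.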

Model

Let X = {X_b, X_s} be the random variables representing the entire protein structure, where X_b describes the backbone and X_s the side chains. X_b can be represented by a set of 3-D coordinates of the backbone atoms or, equivalently, by a sequence of bond lengths and dihedral angles. The probability of a particular conformation x can then be written as:

$$p(X = x \mid \Theta) = p(X_b = x_b)\, p(X_s = x_s \mid X_b, \Theta)$$

where Θ represents any parameters used to describe this model, including sequence information, temperature, etc. Frequently the backbone is assumed to be rigid with a known conformation, and the problem then reduces to a side-chain placement problem. The structure of the graph is also encoded in Θ. This structure shows which pairs of variables are conditionally independent. For example, the side-chain angles of two residues that are far apart can be independent given all other angles in the protein. To extract this structure, researchers use a distance threshold, and only pairs of residues within that threshold are considered connected.
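The distance-threshold rule can be sketched with NumPy as follows. This is an illustrative assumption about how such a graph might be built: it uses one representative point per residue, and the function name `contact_edges`, the toy coordinates and the threshold value are all hypothetical.

```python
import numpy as np


def contact_edges(coords, threshold):
    """Return index pairs (i, j), i < j, whose points lie within `threshold`."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all representative points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only the strict upper triangle so each pair appears once.
    i_idx, j_idx = np.where(np.triu(dist <= threshold, k=1))
    return [(int(i), int(j)) for i, j in zip(i_idx, j_idx)]


# Toy coordinates (arbitrary units), one representative point per residue.
coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [6.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
edges = contact_edges(coords, threshold=4.0)
```

In this toy example the fourth residue is farther than the threshold from every other residue, so it has no edges and its variable is independent of the rest in the resulting model.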
Given this representation, the probability of a particular side-chain conformation x_s given the backbone conformation x_b can be expressed as

$$p(X_s = x_s \mid X_b = x_b) = \frac{1}{Z} \prod_{c \in C(G)} \Phi_c(x_s^c, x_b^c)$$

where C(G) is the set of all cliques in G, Φ_c is a potential function defined over the variables in clique c, and Z is the partition function.
To completely characterize the MRF, it is necessary to define the potential function Φ. For simplicity, the cliques of the graph are usually restricted to cliques of size 2, which means the potential function is defined only over pairs of variables. In the Goblin system, these pairwise functions are defined as

$$\Phi(x_s^{i_p}, x_s^{j_q}) = \exp\left(-\frac{E(x_s^{i_p}, x_s^{j_q})}{k_B T}\right)$$

where E(x_s^{i_p}, x_s^{j_q}) is the energy of interaction between rotamer state p of residue i and rotamer state q of residue j, k_B is the Boltzmann constant, and T is the temperature.
This model can be built from a PDB file of the protein structure, and the free energy can then be calculated from it.
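As a rough illustration of a pairwise MRF with Boltzmann-factor potentials, the sketch below builds potential tables from interaction energies and normalises the joint distribution by brute-force enumeration. It is not the Goblin implementation: the energies are made up, the helper names are hypothetical, and enumeration is only feasible for very small toy models.

```python
import itertools

import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)
TEMPERATURE = 300.0  # kelvin (assumed value for the example)


def pairwise_potential(energy, temperature):
    """Boltzmann factor exp(-E / (k_B T)) for each pair of rotamer states."""
    return np.exp(-np.asarray(energy, dtype=float) / (K_B * temperature))


def joint_distribution(n_states, potentials):
    """Enumerate all joint states; `potentials` maps an edge (i, j) to a table."""
    states = list(itertools.product(*[range(k) for k in n_states]))
    weights = np.array([
        np.prod([table[s[i], s[j]] for (i, j), table in potentials.items()])
        for s in states
    ])
    Z = weights.sum()  # partition function
    return states, weights / Z, Z


# Three residues with two rotamer states each; energies in kcal/mol are made up.
potentials = {
    (0, 1): pairwise_potential([[-0.5, 1.0], [1.0, 0.0]], TEMPERATURE),
    (1, 2): pairwise_potential([[0.0, 0.5], [0.5, 0.0]], TEMPERATURE),
}
states, probs, Z = joint_distribution([2, 2, 2], potentials)
most_likely = states[int(np.argmax(probs))]
```

Lower interaction energies give larger potentials, so the most probable joint state is the one that combines the most favourable rotamer pairs.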

Free energy calculation: belief propagation

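The free energy of a system is given by

$$G = E - TS$$

where E is the enthalpy of the system, T the temperature, and S the entropy. If we associate a probability p(x) with each state x of the system (that is, with each conformation) and write the entropy as S = −k_B Σ_x p(x) ln p(x), G can be rewritten as

$$G = \sum_x p(x) E(x) + k_B T \sum_x p(x) \ln p(x)$$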
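For a system small enough to enumerate, the free energy can be evaluated directly from the state probabilities. The sketch below assumes the Gibbs form of the entropy, S = −k_B Σ_x p(x) ln p(x), and Boltzmann-distributed states. The energies are made up and `free_energy` is a hypothetical helper name.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy(energies, probs, temperature):
    """G = <E> - T*S with the Gibbs entropy S = -k_B * sum p ln p."""
    energies = np.asarray(energies, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mean_energy = np.sum(probs * energies)
    nonzero = probs > 0  # states with p = 0 contribute nothing to the entropy
    entropy = -K_B * np.sum(probs[nonzero] * np.log(probs[nonzero]))
    return mean_energy - temperature * entropy


# Toy state energies in kcal/mol (made-up values).
energies = np.array([0.0, 0.5, 1.2, 2.0])
T = 300.0
weights = np.exp(-energies / (K_B * T))
Z = weights.sum()
probs = weights / Z

G = free_energy(energies, probs, T)
G_from_Z = -K_B * T * np.log(Z)  # equals G when p is the Boltzmann distribution
```

When the probabilities are exactly the Boltzmann distribution, the two values agree, which is a useful sanity check on any approximate inference scheme that supplies p(x).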
On discrete graphs, p(x) is computed with the generalized belief propagation algorithm. This algorithm computes an approximation to the probabilities and is not guaranteed to converge to a fixed set of values. In practice, however, it has been shown to converge in many cases.
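Generalized belief propagation is too involved for a short example, so the sketch below shows the simpler sum-product algorithm on a chain, where message passing is exact. It is meant only to illustrate how messages combine into marginals. The potential tables are random and the function names are hypothetical.

```python
import itertools

import numpy as np


def chain_marginals(pair_tables):
    """Sum-product marginals for a chain x0 - x1 - ... - xn.

    pair_tables[i] has shape (k_i, k_{i+1}) and holds the potential on edge (i, i+1).
    """
    # forward[i]: normalised message arriving at node i from the left.
    forward = [np.ones(pair_tables[0].shape[0])]
    for table in pair_tables:
        msg = forward[-1] @ table
        forward.append(msg / msg.sum())
    # backward[i]: normalised message arriving at node i from the right.
    backward = [np.ones(pair_tables[-1].shape[1])]
    for table in reversed(pair_tables):
        msg = table @ backward[0]
        backward.insert(0, msg / msg.sum())
    beliefs = [f * b for f, b in zip(forward, backward)]
    return [belief / belief.sum() for belief in beliefs]


def brute_force_marginals(pair_tables):
    """Reference marginals obtained by enumerating every joint state."""
    sizes = [pair_tables[0].shape[0]] + [t.shape[1] for t in pair_tables]
    marginals = [np.zeros(k) for k in sizes]
    for state in itertools.product(*[range(k) for k in sizes]):
        weight = np.prod([t[state[i], state[i + 1]] for i, t in enumerate(pair_tables)])
        for i, s in enumerate(state):
            marginals[i][s] += weight
    return [m / m.sum() for m in marginals]


rng = np.random.default_rng(0)
tables = [rng.uniform(0.5, 2.0, size=(2, 3)), rng.uniform(0.5, 2.0, size=(3, 2))]
bp = chain_marginals(tables)
exact = brute_force_marginals(tables)
```

On graphs with cycles, the same message updates are iterated until they stop changing, and the result is only an approximation.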

Continuous graphical models for protein structures

Graphical models can still be used when the variables of choice are continuous. In these cases, the probability distribution is represented as a multivariate probability distribution over continuous variables. Each family of distributions then imposes certain properties on the graphical model. The multivariate Gaussian distribution is one of the most convenient distributions for this problem. The simple form of its density and its direct relation to the corresponding graphical model make it a popular choice among researchers.

Gaussian graphical models of protein structures

Gaussian graphical models are multivariate probability distributions encoding a network of dependencies among variables. Let Θ = [θ_1, θ_2, …, θ_n] be a set of n variables, such as n dihedral angles, and let f(Θ = D) be the value of the probability density function at a particular value D. A multivariate Gaussian graphical model defines this probability as follows:

$$f(\Theta = D) = \frac{1}{Z} \exp\left\{-\frac{1}{2} (D - \mu)^T \Sigma^{-1} (D - \mu)\right\}$$

where Z = (2π)^{n/2} |Σ|^{1/2} is the closed form of the partition function. The parameters of this distribution are μ and Σ: μ is the vector of mean values of the variables, and Σ^{-1}, the inverse of the covariance matrix, is known as the precision matrix. The precision matrix contains the pairwise dependencies between the variables. A zero entry in Σ^{-1} means that, conditioned on the values of the other variables, the two corresponding variables are independent of each other.
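The role of the precision matrix can be illustrated with a small NumPy sketch. The three-variable precision matrix below is a made-up example with a chain structure, and `gaussian_density` is a hypothetical helper that evaluates the density directly from the precision matrix.

```python
import numpy as np


def gaussian_density(d, mu, precision):
    """Multivariate Gaussian density parameterised by the precision matrix."""
    d = np.asarray(d, dtype=float)
    mu = np.asarray(mu, dtype=float)
    precision = np.asarray(precision, dtype=float)
    n = mu.size
    # |Sigma| = 1 / |Sigma^{-1}|, so log Z = (n/2) log(2 pi) - (1/2) log|Sigma^{-1}|.
    _, logdet_precision = np.linalg.slogdet(precision)
    log_z = 0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet_precision
    diff = d - mu
    return np.exp(-0.5 * diff @ precision @ diff - log_z)


# Chain-structured precision matrix: variables 0 and 2 share no edge.
precision = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])
covariance = np.linalg.inv(precision)
mu = np.zeros(3)
density_at_mean = gaussian_density(mu, mu, precision)
```

Here the (0, 2) entry of the precision matrix is zero while the (0, 2) entry of the covariance matrix is not: the two variables are correlated through variable 1 but become independent once variable 1 is fixed.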
To learn the graph structure as a multivariate Gaussian graphical model, one can use either L1 regularization or neighborhood selection algorithms. These algorithms simultaneously learn the graph structure and the edge strengths of the connected nodes. An edge strength corresponds to the potential function defined on the corresponding two-node clique. A training set of PDB structures is used to learn μ and Σ^{-1}.
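Neighborhood selection can be sketched as a set of L1-penalised regressions, one per variable. The code below is a minimal NumPy illustration on synthetic Gaussian data, not a production implementation. The penalty `lam`, the sample size, the "either regression" rule for keeping an edge, and all function names are assumptions made for the example.

```python
import numpy as np


def soft_threshold(value, lam):
    """Shrink `value` towards zero by `lam`."""
    return np.sign(value) * max(abs(value) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of coordinate j removed.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def neighborhood_selection(data, lam):
    """Regress each variable on the others; keep an edge if either regression selects it."""
    X = (data - data.mean(axis=0)) / data.std(axis=0)
    p = X.shape[1]
    adjacency = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd(X[:, others], X[:, j], lam)
        adjacency[j, others] = beta != 0.0
    return adjacency | adjacency.T


# Synthetic data drawn from a chain-structured Gaussian (edges 0-1 and 1-2 only).
true_precision = np.array([[2.0, -1.0, 0.0],
                           [-1.0, 2.0, -1.0],
                           [0.0, -1.0, 2.0]])
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(3), np.linalg.inv(true_precision), size=5000)
adjacency = neighborhood_selection(data, lam=0.2)
```

The recovered graph depends on the penalty: a larger `lam` removes more edges, while a very small one tends to keep spurious edges that arise from sampling noise.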
Once the model is learned, the same steps as in the discrete case can be repeated to obtain the density functions at each node, and the analytical form can be used to calculate the free energy. Here the partition function already has a closed form, so inference is trivial, at least for Gaussian graphical models. If an analytical form of the partition function is not available, particle filtering or expectation propagation can be used to approximate Z, after which inference can be performed and the free energy calculated.
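As a hedged illustration of the closed-form calculation, suppose the Gaussian density is read as a Boltzmann distribution with a quadratic energy. This energy function is an assumption made here for the derivation, not something stated above. Under that reading, the free energy follows directly from the partition function:

```latex
\begin{align*}
E(D) &= \tfrac{1}{2} k_B T \,(D - \mu)^T \Sigma^{-1} (D - \mu)
  && \text{(assumed, so that } f(D) = e^{-E(D)/k_B T} / Z \text{)} \\
\langle E \rangle &= \tfrac{n}{2} k_B T \\
S &= -k_B \int f(D) \ln f(D) \, dD = k_B \left( \ln Z + \tfrac{n}{2} \right) \\
G &= \langle E \rangle - T S = -k_B T \ln Z
   = -k_B T \left( \tfrac{n}{2} \ln 2\pi + \tfrac{1}{2} \ln |\Sigma| \right)
\end{align*}
```

The second line uses the fact that the expected value of the quadratic form (D − μ)^T Σ^{-1} (D − μ) under the Gaussian is n, the number of variables.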
