In bioinformatics, the root-mean-square deviation of atomic positions (RMSD) is a measure of the average distance between the atoms of superimposed proteins. The RMSD calculation can also be applied to non-protein molecules, such as small organic molecules. In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the RMSD of the Cα atomic coordinates after optimal rigid-body superposition.
When a dynamical system fluctuates about some well-defined average position, the RMSD from the average over time is referred to as the RMSF, or root-mean-square fluctuation. The size of this fluctuation can be measured, for example using Mössbauer spectroscopy or nuclear magnetic resonance, and can provide important physical information. The Lindemann index is a method of placing the RMSF in the context of the parameters of the system.
A widely used way to compare the structures of biomolecules or solid bodies is to translate and rotate one structure with respect to the other so as to minimize the RMSD. Coutsias et al. presented a simple derivation, based on quaternions, of the optimal solid-body transformation that minimizes the RMSD between two sets of vectors. They proved that the quaternion method is equivalent to the well-known Kabsch algorithm. The solution given by Kabsch is an instance of the solution of the d-dimensional problem introduced by Hurley and Cattell. The quaternion solution for computing the optimal rotation was published in the appendix of a paper by Petitjean. This quaternion solution and the calculation of the optimal isometry in the d-dimensional case were both extended to infinite sets and to the continuous case in appendix A of another paper by Petitjean.
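The superposition step described above can be illustrated with a minimal sketch of the SVD-based Kabsch procedure using NumPy. The function name `kabsch_rmsd` and the array layout (one row per atom, with atoms already paired by index) are assumptions made for this example and do not come from any particular package.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (N, 3) coordinate arrays after optimal
    rigid-body superposition (translation plus proper rotation).

    Assumes row i of P corresponds to row i of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("P and Q must have the same shape")

    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # 3x3 cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Rotate the centered P onto the centered Q and measure the residual.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

If `Q` is a rotated and translated copy of `P`, the returned value should be zero up to floating-point error, whereas a mirror image of a chiral point set gives a strictly positive value because reflections are excluded.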
The equation
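$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^{2}}$$
where $\delta_i$ is the distance between atom $i$ and either a reference structure or the mean position of the $N$ equivalent atoms. This is often calculated for the backbone heavy atoms C, N, O, and Cα, or sometimes just the Cα atoms.
Normally a rigid superposition that minimizes the RMSD is performed, and this minimum is returned. Given two sets of $n$ points $\mathbf{v}$ and $\mathbf{w}$, the RMSD is defined as follows:
$$\mathrm{RMSD}(\mathbf{v},\mathbf{w}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\lVert v_i - w_i\rVert^{2}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left((v_{ix}-w_{ix})^{2}+(v_{iy}-w_{iy})^{2}+(v_{iz}-w_{iz})^{2}\right)}$$
An RMSD value is expressed in length units. The most commonly used unit in structural biology is the ångström (Å), which is equal to $10^{-10}$ m.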
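As an illustration of the definition for two point sets, the following sketch evaluates the RMSD directly from paired coordinates with NumPy. The function name `rmsd` is chosen for this example only, and no superposition is performed here: the inputs are assumed to be already aligned and paired by index.

```python
import numpy as np

def rmsd(v, w):
    """RMSD between two (n, 3) coordinate arrays paired by row index."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    if v.shape != w.shape:
        raise ValueError("v and w must have the same shape")
    # Squared distance between each pair of corresponding points.
    sq_dist = ((v - w) ** 2).sum(axis=1)
    # Mean over the n points, then the square root.
    return float(np.sqrt(sq_dist.mean()))

v = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
w = [[0.0, 0.0, 2.0], [1.0, 2.0, 0.0]]
print(rmsd(v, w))  # 2.0, since both points are displaced by 2 length units
```

The result carries the same length unit as the input coordinates, so coordinates in ångströms yield an RMSD in ångströms.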
Uses
Typically RMSD is used as a quantitative measure of similarity between two or more protein structures. For example, the CASP protein structure prediction competition uses RMSD as one of its assessments of how well a submitted structure matches the known target structure. Thus the lower the RMSD, the better the model is in comparison to the target structure. Some scientists who study protein folding by computer simulations also use RMSD as a reaction coordinate to quantify where the protein is between the folded state and the unfolded state. The study of RMSD for small organic molecules is common in the context of docking, as well as in other methods to study the configuration of ligands when bound to macromolecules. Note that, in the case of ligands, the structures are most commonly not superimposed prior to the calculation of the RMSD.
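For the ligand case, where poses are compared in the fixed frame of the macromolecule without superposition, a batch calculation might look like the sketch below. The function name `pose_rmsds` and the array shapes are assumptions made for illustration. The sketch also assumes a fixed atom ordering shared by all poses and ignores symmetry-equivalent atoms, which dedicated docking tools may treat differently.

```python
import numpy as np

def pose_rmsds(poses, reference):
    """RMSD of each pose to a reference pose, without superposition.

    poses:     array of shape (P, N, 3), P poses of an N-atom ligand
    reference: array of shape (N, 3), e.g. a crystallographic pose
    """
    poses = np.asarray(poses, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Broadcasting subtracts the reference from every pose at once.
    diff = poses - reference
    # Sum over x, y, z; average over atoms; one value per pose.
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

reference = np.zeros((2, 3))
poses = np.stack([reference, reference + np.array([3.0, 4.0, 0.0])])
print(pose_rmsds(poses, reference))  # first pose matches; second is shifted by 5 units
```

Because no alignment is applied, a rigid shift of the whole ligand within the binding site contributes fully to the value, which is usually the desired behavior when judging a docked pose against a known one.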