Proteinogenic amino acid
Proteinogenic amino acids are amino acids that are incorporated biosynthetically into proteins during translation. The word "proteinogenic" means "protein creating". Throughout known life, there are 22 genetically encoded amino acids, 20 in the standard genetic code and an additional 2 that can be incorporated by special translation mechanisms.
In contrast, non-proteinogenic amino acids are amino acids that are either not incorporated into proteins, misincorporated in place of a genetically encoded amino acid, or not produced directly and in isolation by standard cellular machinery. Amino acids in the last group often result from post-translational modification of proteins. Some non-proteinogenic amino acids are incorporated into nonribosomal peptides, which are synthesized by nonribosomal peptide synthetases.
Both eukaryotes and prokaryotes can incorporate selenocysteine into their proteins via a nucleotide sequence known as a SECIS element, which directs the cell to translate a nearby UGA codon as selenocysteine. In some methanogenic prokaryotes, the UAG codon can also be translated to pyrrolysine.
In eukaryotes, there are only 21 proteinogenic amino acids, the 20 of the standard genetic code, plus selenocysteine. Humans can synthesize 12 of these from each other or from other molecules of intermediary metabolism. The other nine must be consumed, and so they are called essential amino acids. The essential amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine.
The proteinogenic amino acids have been found to be related to the set of amino acids that can be recognized by ribozyme autoaminoacylation systems. Thus, non-proteinogenic amino acids would have been excluded by the contingent evolutionary success of nucleotide-based life forms. Other reasons have been offered to explain why certain specific non-proteinogenic amino acids are not generally incorporated into proteins; for example, ornithine and homoserine cyclize against the peptide backbone and fragment the protein with relatively short half-lives, while others are toxic because they can be mistakenly incorporated into proteins, such as the arginine analog canavanine.
Structures
The following illustrates the structures and abbreviations of the 21 amino acids that are directly encoded for protein synthesis by the genetic code of eukaryotes. The structures given below are standard chemical structures, not the typical zwitterion forms that exist in aqueous solutions.
IUPAC/IUBMB now also recommends standard abbreviations for the following two amino acids:
Chemical properties
The following table lists the one-letter symbols, the three-letter symbols, and the chemical properties of the side chains of the standard amino acids. The masses listed are based on weighted averages of the elemental isotopes at their natural abundances. Forming a peptide bond eliminates one molecule of water, so a protein's mass equals the sum of the masses of its constituent amino acids minus 18.01524 Da per peptide bond.
General chemical properties
Side-chain properties
Amino acid | 1-letter | 3-letter | Side chain | pKa§ | Acid/base | Aromatic or aliphatic | van der Waals volume (Å³) |
Alanine | A | Ala | -CH3 | - | - | Aliphatic | 67 |
Cysteine | C | Cys | -CH2SH | 8.55 | acidic | - | 86 |
Aspartic acid | D | Asp | -CH2COOH | 3.67 | acidic | - | 91 |
Glutamic acid | E | Glu | -CH2CH2COOH | 4.25 | acidic | - | 109 |
Phenylalanine | F | Phe | -CH2C6H5 | - | - | Aromatic | 135 |
Glycine | G | Gly | -H | - | - | - | 48 |
Histidine | H | His | -CH2-C3H3N2 | 6.54 | weak basic | Aromatic | 118 |
Isoleucine | I | Ile | -CH(CH3)CH2CH3 | - | - | Aliphatic | 124 |
Lysine | K | Lys | -(CH2)4NH2 | 10.40 | basic | - | 135 |
Leucine | L | Leu | -CH2CH(CH3)2 | - | - | Aliphatic | 124 |
Methionine | M | Met | -CH2CH2SCH3 | - | - | Aliphatic | 124 |
Asparagine | N | Asn | -CH2CONH2 | - | - | - | 96 |
Pyrrolysine | O | Pyl | -(CH2)4NHCOC4H5NCH3 | N.D. | weak basic | - | |
Proline | P | Pro | -CH2CH2CH2- | - | - | - | 90 |
Glutamine | Q | Gln | -CH2CH2CONH2 | - | - | - | 114 |
Arginine | R | Arg | -(CH2)3NH-C(NH)NH2 | 12.3 | strongly basic | - | 148 |
Serine | S | Ser | -CH2OH | - | - | - | 73 |
Threonine | T | Thr | -CH(OH)CH3 | - | - | - | 93 |
Selenocysteine | U | Sec | -CH2SeH | 5.43 | acidic | - | |
Valine | V | Val | -CH(CH3)2 | - | - | Aliphatic | 105 |
Tryptophan | W | Trp | -CH2C8H6N | - | - | Aromatic | 163 |
Tyrosine | Y | Tyr | -CH2-C6H4OH | 9.84 | weak acidic | Aromatic | 141 |
§: Values for Asp, Cys, Glu, His, Lys and Tyr were determined using the amino acid residue placed centrally in an alanine pentapeptide. The value for Arg is from Pace et al. The value for Sec is from Byun & Kang.
N.D.: The pKa value of pyrrolysine has not been reported.
Note: The pKa value of an amino-acid residue measured in a small peptide typically differs slightly from its value inside a protein. Protein pKa calculations are sometimes used to estimate this change.
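To illustrate how the tabulated pKa values relate to the "acidic" and "basic" labels, the following minimal sketch estimates the deprotonated fraction of a side chain at a given pH. It assumes the Henderson-Hasselbalch relation, which is not stated in the text above, and the names `SIDE_CHAIN_PKA` and `fraction_deprotonated` are hypothetical. The pKa values are the peptide-model values from the table, so, as the note above says, results for a residue buried inside a protein may differ.

```python
# Side-chain pKa values copied from the table above (one-letter codes).
# Pyrrolysine is omitted because its pKa has not been reported.
SIDE_CHAIN_PKA = {
    "C": 8.55, "D": 3.67, "E": 4.25, "H": 6.54,
    "K": 10.40, "R": 12.3, "U": 5.43, "Y": 9.84,
}


def fraction_deprotonated(pka, ph):
    """Fraction of a titratable group that has lost its proton at this pH.

    Assumption: ideal Henderson-Hasselbalch behaviour,
    [A-]/[HA] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Print the estimated deprotonated fraction of each side chain at pH 7.
    for code, pka in sorted(SIDE_CHAIN_PKA.items()):
        print(code, round(fraction_deprotonated(pka, 7.0), 4))
```

Under these assumptions, a group is half deprotonated when the pH equals its pKa. At pH 7 the aspartic acid side chain is almost fully deprotonated, while the lysine side chain remains almost fully protonated.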
Gene expression and biochemistry
* UAG is normally the amber stop codon, but in organisms containing the biological machinery encoded by the pylTSBCD cluster of genes the amino acid pyrrolysine will be incorporated.
** UGA is normally the opal stop codon, but encodes selenocysteine if a SECIS element is present.
† The stop codon is not an amino acid, but is included for completeness.
†† UAG and UGA do not always act as stop codons.
‡ An essential amino acid cannot be synthesized in humans and must, therefore, be supplied in the diet. Conditionally essential amino acids are not normally required in the diet, but must be supplied exogenously to specific populations that do not synthesize them in adequate amounts.
& Occurrence of amino acids is based on 135 Archaea, 3775 Bacteria, 614 Eukaryota proteomes and human proteome respectively.
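The context-dependent reading of UAG and UGA described in the footnotes above can be sketched as a toy translator. This is a deliberate simplification, not a model of the real mechanism: the function name `translate` and the flags `secis_present` and `pyl_machinery` are hypothetical, and a single boolean stands in for the SECIS element, which in a real mRNA is a sequence feature that affects a nearby UGA codon rather than every UGA in the message.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna, secis_present=False, pyl_machinery=False):
    """Translate an mRNA string read in frame from its first base.

    secis_present: simplifying assumption that every UGA is read as
        selenocysteine (U) instead of stop.
    pyl_machinery: simplifying assumption that every UAG is read as
        pyrrolysine (O) instead of stop.
    """
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        if codon == "UGA" and secis_present:
            residue = "U"
        elif codon == "UAG" and pyl_machinery:
            residue = "O"
        else:
            residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary stop codon: translation ends here
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUGAUAA"))                      # UGA read as stop
    print(translate("AUGUGAUAA", secis_present=True))  # UGA read as Sec
```

With both flags off, the function follows the standard genetic code and stops at UAA, UAG or UGA. Turning a flag on changes only how the corresponding codon is read.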
Mass spectrometry
In mass spectrometry of peptides and proteins, knowledge of the masses of the residues is useful. The mass of the peptide or protein is the sum of the residue masses plus the mass of one water molecule. The residue masses are calculated from the tabulated chemical formulas and atomic weights. In mass spectrometry, ions may also include one or more protons.
Amino acid | 1-letter | 3-letter | Formula | Mon. mass§ (Da) | Avg. mass (Da) |
Alanine | A | Ala | C3H5NO | 71.03711 | 71.0779 |
Cysteine | C | Cys | C3H5NOS | 103.00919 | 103.1429 |
Aspartic acid | D | Asp | C4H5NO3 | 115.02694 | 115.0874 |
Glutamic acid | E | Glu | C5H7NO3 | 129.04259 | 129.1140 |
Phenylalanine | F | Phe | C9H9NO | 147.06841 | 147.1739 |
Glycine | G | Gly | C2H3NO | 57.02146 | 57.0513 |
Histidine | H | His | C6H7N3O | 137.05891 | 137.1393 |
Isoleucine | I | Ile | C6H11NO | 113.08406 | 113.1576 |
Lysine | K | Lys | C6H12N2O | 128.09496 | 128.1723 |
Leucine | L | Leu | C6H11NO | 113.08406 | 113.1576 |
Methionine | M | Met | C5H9NOS | 131.04049 | 131.1961 |
Asparagine | N | Asn | C4H6N2O2 | 114.04293 | 114.1026 |
Pyrrolysine | O | Pyl | C12H19N3O2 | 237.14773 | 237.2982 |
Proline | P | Pro | C5H7NO | 97.05276 | 97.1152 |
Glutamine | Q | Gln | C5H8N2O2 | 128.05858 | 128.1292 |
Arginine | R | Arg | C6H12N4O | 156.10111 | 156.1857 |
Serine | S | Ser | C3H5NO2 | 87.03203 | 87.0773 |
Threonine | T | Thr | C4H7NO2 | 101.04768 | 101.1039 |
Selenocysteine | U | Sec | C3H5NOSe | 150.95364 | 150.0489 |
Valine | V | Val | C5H9NO | 99.06841 | 99.1311 |
Tryptophan | W | Trp | C11H10N2O | 186.07931 | 186.2099 |
Tyrosine | Y | Tyr | C9H9NO2 | 163.06333 | 163.1733 |
§ Monoisotopic mass
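As a worked illustration of the rule that a peptide's mass is the sum of its residue masses plus one water, the sketch below computes the monoisotopic mass of a peptide and the m/z of its protonated ions. The residue masses are copied from the table above. The constants `WATER_MASS` and `PROTON_MASS` and the function names are assumptions introduced for this example and do not come from the table.

```python
# Monoisotopic residue masses in Da, copied from the table above.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "O": 237.14773, "P": 97.05276, "Q": 128.05858, "R": 156.10111,
    "S": 87.03203, "T": 101.04768, "U": 150.95364, "V": 99.06841,
    "W": 186.07931, "Y": 163.06333,
}
WATER_MASS = 18.01056   # assumption: monoisotopic mass of H2O in Da
PROTON_MASS = 1.00728   # assumption: mass of a proton in Da


def peptide_mass(sequence):
    """Monoisotopic mass of a linear peptide given as one-letter codes."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def mz(sequence, charge):
    """m/z of the peptide ion carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    print(round(peptide_mass("GAG"), 4))
    print(round(mz("GAG", 2), 4))
```

For the tripeptide Gly-Ala-Gly, the calculation adds two glycine residues, one alanine residue and one water, giving about 203.09 Da. Average masses could be handled the same way by substituting the average-mass column and the average mass of water.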
Stoichiometry and metabolic cost in cell
The table below lists the abundance of amino acids in E. coli cells and the metabolic cost of synthesizing them. Negative numbers indicate that the metabolic process is energetically favorable and does not cost the cell net ATP. The abundance figures include amino acids in both free and polymerized form.
Remarks
Catabolism
Amino acids can be classified according to the properties of their main products:
- Glucogenic, with the products having the ability to form glucose by gluconeogenesis
- Ketogenic, with the products not having the ability to form glucose: These products may still be used for ketogenesis or lipid synthesis.
- Amino acids catabolized into both glucogenic and ketogenic products