Mutational signatures are characteristic combinations of mutation types arising from specific mutagenesis processes such as DNA replication infidelity, exposure to exogenous and endogenous genotoxins, defective DNA repair pathways and DNA enzymatic editing. Deciphering mutational signatures in cancer provides insight into the biological mechanisms involved in carcinogenesis and normal somatic mutagenesis. Mutational signatures have shown their applicability in cancer treatment and cancer prevention. Advances in the field of oncogenomics have enabled the development and use of molecularly targeted therapy, but such therapies historically focused on inhibition of oncogenic drivers. More recently, mutational signature profiling has proven successful in guiding oncological management and the use of targeted therapies.
General concepts
Mechanisms – overview
Genomic data
Cancer mutational signature analyses require genomic data from cancer genome sequencing, with paired-normal DNA sequencing, in order to create the mutation catalog of a specific tumor. Different types of mutations can be used individually or in combination to model mutational signatures in cancer.
Types of mutations: base substitutions
There are six classes of base substitution: C>A, C>G, C>T, T>A, T>C and T>G. The G>T substitution is considered equivalent to the C>A substitution because it is not possible to determine on which DNA strand the substitution initially occurred. Both the C>A and G>T substitutions are therefore counted as part of the "C>A" class. For the same reason, the G>C, G>A, A>T, A>G and A>C mutations are counted as part of the "C>G", "C>T", "T>A", "T>C" and "T>G" classes respectively. Taking into account the 5' and 3' adjacent bases leads to 96 possible mutation types (6 substitution classes × 4 possible 5' bases × 4 possible 3' bases). The mutation catalog of a tumor is created by assigning each single nucleotide variant to one of the 96 mutation types and counting the total number of substitutions for each type.
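The strand folding and 96-type counting described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`mutation_type`, `build_catalog`), the `5'[ref>alt]3'` label format and the input shape (a trinucleotide context plus the alternate base) are assumptions made for this example, not part of any particular tool.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTION_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 labels: 6 substitution classes x 4 five-prime bases x 4 three-prime bases.
ALL_TYPES = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTION_CLASSES
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_type(trinucleotide, alt):
    """Label an SNV at the middle base of `trinucleotide` (hypothetical helper).

    If the reference base is a purine (A or G), the variant is folded onto
    the opposite strand so that the reference is always C or T.
    """
    ref = trinucleotide[1]
    if ref == alt:
        raise ValueError("alternate base must differ from the reference base")
    if ref in "AG":
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
    five, ref, three = trinucleotide
    return f"{five}[{ref}>{alt}]{three}"


def build_catalog(snvs):
    """Count SNVs, given as (trinucleotide, alt) pairs, per mutation type."""
    counts = Counter(mutation_type(context, alt) for context, alt in snvs)
    return {label: counts.get(label, 0) for label in ALL_TYPES}


# Toy input: the first two variants fold into the same type, A[C>T]G.
example_snvs = [("ACG", "T"), ("CGT", "A"), ("TGA", "T")]
example_catalog = build_catalog(example_snvs)
```

In this toy input, the G>A variant in the CGT context is reverse-complemented to a C>T variant in the ACG context, so it is counted together with the first variant.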
Tumor mutation catalog
Once the mutation catalog of a tumor is obtained, there are two approaches to decipher the contributions of different mutational signatures to the tumor's genomic landscape:
* The mutation catalog of the tumor is compared to a reference mutation catalog, or mutational signatures reference dataset, such as the 30 Signatures of Mutational Processes in Human Cancer from the Catalogue Of Somatic Mutations In Cancer (COSMIC) database.
* De novo mutational signatures modelling can be accomplished using statistical methods such as non-negative matrix factorization to identify potential novel mutational processes.
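The first approach, comparing a tumor's catalog to reference signatures, can be illustrated as a non-negative fitting problem: find non-negative exposures such that the weighted sum of the reference signatures approximates the observed counts. The sketch below is a minimal illustration under stated assumptions. The function name `fit_exposures` is hypothetical, the solver is a simple multiplicative-update rule chosen for brevity (not necessarily what any published tool uses), and the two four-channel "signatures" are invented toy values rather than COSMIC data, which would have 96 channels.

```python
def fit_exposures(catalog, signatures, iterations=2000, eps=1e-12):
    """Estimate non-negative exposures of fixed signatures in one catalog.

    catalog:    list of mutation counts, one per mutation type
    signatures: list of signatures, each a list of probabilities with the
                same length as `catalog`
    Minimises the squared reconstruction error with multiplicative updates,
    which keep every exposure non-negative.
    """
    n_types = len(catalog)
    n_sigs = len(signatures)
    exposures = [sum(catalog) / n_sigs] * n_sigs  # uniform starting point
    # Projection of the catalog onto each signature (constant across iterations).
    projected = [
        sum(sig[i] * catalog[i] for i in range(n_types)) for sig in signatures
    ]
    for _ in range(iterations):
        reconstruction = [
            sum(exposures[j] * signatures[j][i] for j in range(n_sigs))
            for i in range(n_types)
        ]
        for j in range(n_sigs):
            denom = sum(
                signatures[j][i] * reconstruction[i] for i in range(n_types)
            ) + eps
            exposures[j] *= projected[j] / denom
    return exposures


# Toy reference signatures (invented values, 4 channels instead of 96).
toy_sig_a = [0.7, 0.1, 0.1, 0.1]
toy_sig_b = [0.1, 0.1, 0.1, 0.7]
# Catalog built as 30 mutations from toy_sig_a plus 10 from toy_sig_b.
toy_catalog = [22, 4, 4, 10]
toy_exposures = fit_exposures(toy_catalog, [toy_sig_a, toy_sig_b])
```

Because the toy catalog was constructed as an exact mixture, the fitted exposures should approach 30 and 10. Real catalogs are noisy, so a fit is only approximate and is usually assessed by its reconstruction error.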
Identifying the contributions of diverse mutational signatures to carcinogenesis provides insight into tumor biology and can offer opportunities for targeted therapy.
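For the de novo approach, non-negative matrix factorization approximates a matrix V of mutation counts (mutation types × tumors) as a product W × H, where the columns of W are candidate signatures and the rows of H are their exposures in each tumor. The following is a minimal pure-Python sketch using the classic multiplicative update rules for the squared error. The function names, the fixed iteration count and the random initialisation are assumptions for illustration; production analyses typically add model selection for the number of signatures and repeated runs for stability.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Sum of squared differences between V and the product W x H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, k, iterations=500, seed=0, eps=1e-12):
    """Factorise non-negative V (types x tumors) into W (types x k), H (k x tumors)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # Update exposures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # Update signatures: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    # Rescale so each signature (column of W) sums to 1; H absorbs the scale.
    for a in range(k):
        scale = sum(w[i][a] for i in range(m)) or 1.0
        for i in range(m):
            w[i][a] /= scale
        for j in range(n):
            h[a][j] *= scale
    return w, h
```

Rescaling each column of W to sum to 1 lets a signature be read as a probability distribution over mutation types, while the corresponding row of H can be read as the number of mutations attributed to that signature in each tumor.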
Homologous recombination deficiency leads to the Signature 3 substitution pattern, but also to an increased burden of structural variants. In the absence of homologous recombination, non-homologous end joining leads to large structural variants such as chromosomal translocations, chromosomal inversions and copy number variants.
Mutational signatures
Age-related mutagenesis
Signature 1 features a predominance of C>T transitions in the NpCpG trinucleotide context and correlates with the age of the patient at the time of cancer diagnosis. The proposed underlying biological mechanism is the spontaneous deamination of 5-methylcytosine. Signature 5 has a predominance of T>C substitutions in the ApTpN trinucleotide context, with transcriptional strand bias.
Signature 3 displays high mutation counts across multiple mutation classes and is associated with germline and somatic BRCA1 and BRCA2 mutations in several cancer types. This signature results from DNA double-strand break repair deficiency. Signature 3 is also associated with a high burden of indels with microhomology at the breakpoints.
The APOBEC3 family of cytidine deaminase enzymes responds to viral infections by editing the viral genome, but the enzymatic activity of APOBEC3A and APOBEC3B has also been found to cause unwanted host genome editing and may even participate in oncogenesis in human papillomavirus-related cancers. Signature 2 and Signature 13 are enriched for C>T and C>G substitutions and are thought to arise from the cytidine deaminase activity of the AID/APOBEC enzyme family. A germline deletion polymorphism involving APOBEC3A and APOBEC3B is associated with a high burden of Signature 2 and Signature 13 mutations. This polymorphism is considered to be of moderate penetrance for breast cancer risk. The exact roles and mechanisms underlying APOBEC-mediated genome editing are not yet fully delineated, but the activation-induced cytidine deaminase (AID)/APOBEC complex is thought to be involved in the host immune response to viral infections and in lipid metabolism. Both Signature 2 and Signature 13 feature cytosine to uracil substitutions due to cytidine deaminases. Signature 2 has a higher proportion of CN substitutions and Signature 13 a higher proportion of TN substitutions. APOBEC3A- and APOBEC3B-mediated mutagenesis preferentially involves the lagging DNA strand during replication.
Four COSMIC mutational signatures have been associated with DNA mismatch repair deficiency and found in tumors with microsatellite instability: Signatures 6, 15, 20 and 26. Loss of function of the MLH1, MSH2, MSH6 or PMS2 genes causes defective DNA mismatch repair.
Signature 10 has a transcriptional bias and is enriched for C>A substitutions in the TpCpT context as well as T>G substitutions in the TpTpT context. Signature 10 is associated with altered function of DNA polymerase epsilon, which results in deficient DNA proofreading activity. Both germline and somatic POLE exonuclease domain mutations are associated with Signature 10.
Base excision repair
Somatic enrichment for transversion mutations has been associated with base excision repair deficiency and linked to defective MUTYH, a DNA glycosylase, in colorectal cancer. Direct oxidative DNA damage creates 8-oxoguanine, which, if left unrepaired, leads to incorporation of adenine instead of cytosine during DNA replication. MUTYH encodes the adenine glycosylase enzyme, which excises the mismatched adenine from the 8-oxoguanine:adenine base pair, thereby enabling DNA repair mechanisms involving OGG1 and NUDT1 to remove the damaged 8-oxoguanine.
Exposures to exogenous genotoxins
Selected exogenous genotoxins/carcinogens and their mutagen-induced DNA damage and repair mechanisms have been linked to specific molecular signatures.
Signature 9 has been identified in chronic lymphocytic leukemia and malignant B-cell lymphoma and features enrichment for T>G transversion events. It is thought to result from error-prone polymerase η-associated mutagenesis. More recently, the polymerase η error-prone synthesis signature has been linked to non-hematological cancers; it was hypothesized to contribute to YCG motif mutagenesis and could partly explain the increase in TC dinucleotide substitutions.
History
During the 1980s, Curtis Harris at the US National Cancer Institute and Bert Vogelstein at the Johns Hopkins Oncology Center in Baltimore showed that different types of cancer had their own unique suite of mutations in p53, which were likely to have been caused by different agents, such as the chemicals in tobacco smoke or ultraviolet light from the sun. With the advent of next-generation sequencing, Michael Stratton saw the potential for the technology to revolutionize our understanding of the genetic changes inside individual tumors, setting the Wellcome Sanger Institute's huge banks of DNA-sequencing machines in motion to read every single letter of DNA in a tumor. By 2009, Stratton and his team had produced the first whole cancer genome sequences. These were detailed maps showing all the genetic changes and mutations that had occurred within two individual cancers: a melanoma from the skin and a lung tumor.
The melanoma and lung cancer genomes were powerful proof that the fingerprints of specific culprits could be seen in cancers with one major cause. However, these tumors still contained many mutations that could not be explained by ultraviolet light or tobacco smoking. The detective work became a lot more complicated for cancers with complex, multiple or even completely unknown origins.
By way of analogy, imagine a forensic scientist dusting for fingerprints at a murder scene. The forensic scientist might strike it lucky and find a set of perfect prints on a windowpane or door handle that match a known killer. However, they are much more likely to uncover a mish-mash of fingerprints belonging to a whole range of people (from the victim and potential suspects to innocent parties and police investigators), all laid on top of each other on all sorts of surfaces. This is very similar to cancer genomes, where multiple mutational patterns are commonly overlaid on one another, which makes the data difficult to interpret.
Fortunately, Ludmil Alexandrov, a PhD student of Stratton's, came up with a way of solving the problem mathematically. Alexandrov demonstrated that mutational patterns from individual mutagens found in a tumor can be distinguished from one another using a mathematical approach called blind source separation. The newly disentangled patterns of mutations were termed mutational signatures. In 2013, Alexandrov and Stratton published the first computational framework for deciphering mutational signatures from cancer genomics data. Subsequently, they applied this framework to more than seven thousand cancer genomes, creating the first comprehensive map of mutational signatures in human cancer. Currently, more than one hundred mutational signatures have been identified across the repertoire of human cancer.