Pathway analysis


In bioinformatics research, pathway analysis software is used to identify related proteins within a pathway or to build a pathway de novo from proteins of interest. This is helpful when studying differential gene expression in a disease or analyzing any omics dataset that contains a large number of proteins. By examining changes in the expression of genes within a pathway, the biological causes of the observed changes can be explored.
In molecular biology, a pathway is an artificial, simplified model of a process within a cell or tissue. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, triggering a chain of protein-protein or protein-small molecule interactions.
Pathway analysis helps to interpret omics data in light of canonical prior knowledge structured as pathway diagrams. It finds distinct cell processes, diseases or signaling pathways that are statistically associated with a selection of genes differentially expressed between two samples. Pathway analysis is often, but erroneously, used as a synonym for network analysis.

Uses

The data for pathway analysis come from high-throughput biology, including high-throughput sequencing data and microarray data. Before pathway analysis can be done, the omics data should be normalized and the genes ranked by differential expression, usually with Student's t-test, ANOVA or other statistics. In general, any statistically ranked list of genes can be analyzed by pathway analysis. For example, the functional activity of proteins can often be inferred by network enrichment analysis of genes differentially expressed in the experiment; such functional activity scores can then be used in pathway analysis to find the pathways responsible for the observed differential expression. When no ranking is available, a simple list of genes can be analyzed. It is also possible to integrate multiple microarray data sets from different research groups by meta-analysis and cross-platform normalization. Using pathway analysis software, researchers can determine which gene groups, such as pathways, cell processes or diseases, are enriched in genes that are over- or under-expressed in the experimental data. They can also infer associated upstream and downstream regulators, proteins, small molecules, drugs, etc. For example, pathway analysis of several independent microarray experiments helped to discover potential biomarkers in a single pathway important for the fast-to-slow fiber-type transition in Duchenne muscular dystrophy. In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease that may be useful for monitoring the disease.
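The ranking step can be sketched in a few lines of Python. This minimal sketch assumes an already normalized expression table in which the first samples are cases and the rest are controls; the gene names and values are made up. It uses Welch's unequal-variance variant of Student's t-test as the ranking statistic, and the function names are hypothetical rather than taken from any pathway analysis package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch's t-statistic for the difference in means of two sample groups."""
    standard_error = sqrt(variance(case) / len(case) + variance(control) / len(control))
    # If both groups have zero variance, standard_error is 0 and this raises ZeroDivisionError.
    return (mean(case) - mean(control)) / standard_error


def rank_genes(expression, n_case):
    """Return (gene, t) pairs sorted from most up- to most down-regulated.

    expression maps each gene to its normalized values; the first n_case
    values are assumed to be case samples and the rest controls.
    """
    scores = {
        gene: welch_t(values[:n_case], values[n_case:])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up, already normalized values: 3 case samples followed by 3 controls.
expression = {
    "GENE1": [5.1, 5.3, 5.2, 3.0, 3.2, 3.1],
    "GENE2": [2.0, 2.2, 2.1, 2.1, 2.0, 2.2],
    "GENE3": [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],
}
ranked = rank_genes(expression, n_case=3)
```

The resulting ranked list is the kind of input that the methods described below take; a real analysis would normally also adjust the per-gene p-values for multiple testing.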

Pathway databases

Pathway analysis needs a knowledge base containing a pathway collection and interaction networks. The content, structure and functionality of pathway collections usually vary between sources. Examples of pathway collections include KEGG, WikiPathways and Reactome. There are also commercial pathway collections, such as Pathway Studio pathways and IPA pathways.
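For set-based methods such as ORA and FCS, a pathway collection is typically reduced to named gene sets. As a sketch, assuming the collection has been exported to the tab-separated GMT gene-set format used by the GSEA software (on each line: set name, a description field, then member genes), it can be loaded into a dictionary of sets. The pathway names and genes in the usage lines are illustrative placeholders, not entries from any real database.

```python
def parse_gmt(lines):
    """Parse gene sets from GMT-formatted lines into {set name: set of genes}.

    Each line is tab-separated: set name, a description field, then the
    member genes. Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Hypothetical two-pathway collection in GMT layout.
gmt_lines = [
    "PATHWAY_A\tillustrative set\tTP53\tMDM2\tCDKN1A\n",
    "PATHWAY_B\tillustrative set\tEGFR\tKRAS\n",
]
gene_sets = parse_gmt(gmt_lines)
```

Pathway-topology methods need more than this, since a plain gene set discards the interactions between its members.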

Methods and software

Pathway analysis software can generally be divided into web-based applications, desktop programs and programming packages. Programming packages are mostly written in R and Python and are shared openly through the Bioconductor and GitHub projects. Pathway analysis methods evolve quickly, so their classification is still debated. There are three main groups of pathway analysis methods that can be applied to any high-throughput data: ORA, FCS and PT.

Over-Representation Analysis or Enrichment Analysis (ORA)

This method measures the percentage of genes in a pathway, or any other gene group, that are differentially expressed. The aim of ORA is to obtain a list of the most relevant pathways, ranked by p-value. The basic assumption of ORA is that relevant pathways can be identified by the number of differentially expressed genes they contain. The statistical significance of the overlap between the genes of a pathway and the list of differentially expressed genes is assessed with tests such as Fisher's exact test or the hypergeometric test; overlap measures such as the Jaccard index are also used.
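The core ORA calculation can be sketched with the hypergeometric upper tail, which gives the same p-value as a one-sided Fisher's exact test on the corresponding 2x2 table. In this minimal sketch the "universe" is assumed to be all genes measured in the experiment, and the function names and example numbers are hypothetical. A Jaccard index function is included for comparison; it measures overlap but is not itself a significance test.

```python
from math import comb


def hypergeometric_pvalue(n_universe, n_pathway, n_de, n_overlap):
    """One-sided ORA p-value P(X >= n_overlap).

    X counts pathway genes among n_de genes drawn without replacement from
    n_universe genes, of which n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(n_universe, n_de)


def jaccard_index(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical experiment: 20,000 measured genes, a 100-gene pathway,
# 500 differentially expressed genes, 15 of them in the pathway.
p_value = hypergeometric_pvalue(20000, 100, 500, 15)
```

In the usage line, only 500 × 100 / 20,000 = 2.5 pathway genes would be expected among the differentially expressed genes by chance, so an overlap of 15 yields a very small p-value. Running this test for every pathway and sorting by p-value produces the ranked pathway list described above.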

Functional Class Scoring (FCS)

This method considers the expression changes of all genes in the experiment, not only those that pass a differential-expression cut-off, and so removes ORA's dependence on a threshold. The aim of FCS is to compute an enrichment score for each pathway, treated as a gene set, from gene-level statistics. One of the first and most popular FCS methods is Gene Set Enrichment Analysis (GSEA).
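The idea of an FCS enrichment score can be sketched with a GSEA-style running sum: walk down the full ranked gene list, step up at genes in the pathway (weighted by the absolute value of their score raised to a power p) and step down at all other genes; the enrichment score is the running sum's maximum deviation from zero. This sketch computes only the score. The phenotype-permutation significance test and the normalization across gene sets performed by GSEA are omitted, and the function name and parameters are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    ranked: (gene, score) pairs sorted from highest to lowest score.
    gene_set: genes in the pathway.
    weight: exponent p applied to |score| at hits; p = 0 gives the
    unweighted, Kolmogorov-Smirnov-like walk.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    if n_miss == len(ranked) or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        if hit:
            running += abs(score) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):  # keep the largest deviation from zero
            best = running
    return best


# Hypothetical ranked list, for example t-statistics sorted high to low.
ranked = [("A", 2.0), ("B", 1.0), ("C", -1.0), ("D", -2.0)]
es_top = enrichment_score(ranked, {"A", "B"}, weight=0)
es_bottom = enrichment_score(ranked, {"C", "D"}, weight=0)
```

A pathway whose genes cluster at the top of the list gets a positive score (here 1.0) and one whose genes cluster at the bottom gets a negative score (here -1.0), without any cut-off on which genes count as differentially expressed.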

Pathway Topology (PT)

PT methods are essentially the same as FCS in that they use gene-level statistics, but they also integrate information from pathway databases. The critical difference is that, by using the role, position and direction of each interaction recorded in the pathway database, PT can re-score the significance of a pathway when its linkages change, whereas FCS always gives the same score. Examples of PT approaches include Signaling Pathway Impact Analysis (SPIA), EnrichNet, GGEA and TopoGSA.
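How topology changes a score can be illustrated with a simplified perturbation-factor calculation in the spirit of SPIA: each gene's perturbation is its own expression change plus the signed perturbations of its direct regulators, each divided by the number of genes that regulator acts on. This sketch assumes an acyclic pathway (it visits genes in topological order) and reports only the accumulated perturbation; the published SPIA method is more involved, for example handling cycles and combining this evidence with an over-representation p-value. All names are hypothetical.

```python
from graphlib import TopologicalSorter


def perturbation_factors(delta_e, edges):
    """Propagate expression changes along a signed, acyclic pathway graph.

    delta_e: gene -> expression change (for example a log fold change).
    edges: (upstream, downstream, sign) tuples, sign +1 for activation and
    -1 for inhibition. Raises graphlib.CycleError if the graph has a cycle.
    """
    predecessors = {gene: set() for gene in delta_e}
    incoming = {gene: [] for gene in delta_e}
    out_degree = {gene: 0 for gene in delta_e}
    for src, dst, sign in edges:
        predecessors[dst].add(src)
        incoming[dst].append((src, sign))
        out_degree[src] += 1
    pf = {}
    # Topological order guarantees each regulator is scored before its targets.
    for gene in TopologicalSorter(predecessors).static_order():
        propagated = sum(sign * pf[src] / out_degree[src] for src, sign in incoming[gene])
        pf[gene] = delta_e[gene] + propagated
    return pf


def accumulated_perturbation(delta_e, edges):
    """Total perturbation contributed by the topology (PF minus own change)."""
    pf = perturbation_factors(delta_e, edges)
    return sum(pf[gene] - delta_e[gene] for gene in delta_e)


# Same gene-level changes, two hypothetical wirings of a three-gene chain.
delta_e = {"A": 1.0, "B": 0.0, "C": 0.0}
activating = [("A", "B", 1), ("B", "C", 1)]
inhibiting = [("A", "B", -1), ("B", "C", 1)]
score_activating = accumulated_perturbation(delta_e, activating)  # 2.0
score_inhibiting = accumulated_perturbation(delta_e, inhibiting)  # -2.0
```

The two wirings share identical gene-level statistics, so any score that treats the pathway as an unordered gene set is the same for both. The topology-aware score flips sign once the first interaction changes from activation to inhibition.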

Notable companies

Several companies offer licensed software that performs a range of analytic methods on gene sets. Most free software solutions provide only links to online pathway collections, whereas commercial ones have their own collections. The best choice of software depends on the user's skills, on cost and on the time available for pathway analysis. Ingenuity, for example, charges a fee for the use of its software, whereas some software, such as STRING or Cytoscape, is open source. However, Ingenuity maintains a knowledge base against which gene expression data can be compared. Pathway Studio is commercial software for searching biologically relevant facts, analyzing experiments and creating pathways; Pathway Studio Viewer is a free resource from the same company for exploring the Pathway Studio interactive pathway collection and database. Only two commercial applications are known to offer pathway topology based analyses: PathwayGuide and MetaCore, the latter from Thomson Reuters. Advaita uses the peer-reviewed Signaling Pathway Impact Analysis method, while the MetaCore method is unpublished.

Limits

Missing annotations on cell types and conditions

Many current methods for pathway analysis depend on existing databases, but the data they use are not always completely annotated. Many gene interactions in these databases are relatively speculative, because they are based on findings obtained in a specific cell type or disease. In addition, most canonical pathways are built from knowledge obtained in a limited number of experiments with narrow cell models. Therefore, results of pathway analysis of omics data obtained from different tissues should be interpreted with caution.