MG-RAST

MG-RAST is an open-source web application server that provides automated phylogenetic and functional analysis of metagenomes. It is also one of the largest repositories of metagenomic data. The name is an abbreviation of Metagenomic Rapid Annotations using Subsystems Technology.
The pipeline automatically assigns functions to the sequences in a metagenome by comparing them against databases at both the nucleotide and amino-acid levels. The application supplies phylogenetic and functional assignments for the metagenome being analysed, as well as tools for comparing different metagenomes. It also provides an API for programmatic access.
The server was created and is maintained by Argonne National Laboratory and the University of Chicago. As of December 29, 2016, the system had analyzed 60 terabase-pairs of data from more than 150,000 data sets, of which more than 23,000 are available to the public.
Currently, the computational resources are provided by the DOE Magellan cloud at Argonne National Laboratory, Amazon EC2 web services, and a number of traditional clusters.

Background

MG-RAST was developed to provide a free, public resource for the analysis and storage of metagenome sequence data. The service removes one of the primary bottlenecks in metagenome analysis: the availability of high-performance computing for annotating data.
Metagenomic and metatranscriptomic studies involve processing large datasets and can therefore require computationally expensive analysis. Scientists are now able to generate such volumes of data because sequencing costs have fallen dramatically in recent years. This has shifted the limiting factor to computing costs: for instance, a study from the University of Maryland estimated a cost of more than $5 million per terabase using its metagenome analysis pipeline. As the size and number of sequence datasets continue to increase, the costs of analyzing them will continue to rise.
MG-RAST also serves as a repository for metagenomic data. Metadata collection and interpretation are vital for genomic and metagenomic studies, and challenges in this regard include the exchange, curation, and distribution of this information. The MG-RAST system was an early adopter of the minimal checklist standards and the expanded biome-specific environmental packages devised by the Genomic Standards Consortium, and it provides an easy-to-use uploader for capturing metadata at the time of data submission.

Pipeline for metagenomic data analysis

The MG-RAST application offers automated quality control, annotation, comparative analysis and archiving of metagenomic and amplicon sequences using a combination of several bioinformatics tools. The application was built to analyze metagenomic data, but it also supports the processing of amplicon and metatranscriptome sequences. At present, MG-RAST cannot predict coding regions from eukaryotes and is therefore of limited use for analyzing eukaryotic metagenomes.
The pipeline of MG-RAST can be divided into five stages:

Data hygiene

This stage includes steps for quality control and artifact removal. First, low-quality regions are trimmed using SolexaQA, and reads of inappropriate length are removed. A dereplication step is included when processing metagenome and metatranscriptome datasets. Subsequently, DRISEE is used to assess the sequencing error of the sample by measuring artificial duplicate reads. Finally, the pipeline offers the option of screening the reads with the Bowtie aligner and removing those that closely match the genomes of model organisms.
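The dereplication step can be illustrated with a minimal sketch. It assumes that two reads count as duplicates when their first k bases are identical; the function name `dereplicate`, the default `k=50` and the rule of keeping the first read seen are illustrative assumptions, not the criteria MG-RAST itself applies.

```python
def dereplicate(reads, k=50):
    """Keep one read per distinct k-base prefix (assumed duplicate rule).

    `reads` is a list of (read_id, sequence) pairs. Reads shorter than k
    are keyed on their full sequence.
    """
    seen = set()
    kept = []
    for read_id, seq in reads:
        key = seq[:k].upper()
        if key not in seen:
            # First read with this prefix: keep it as the representative.
            seen.add(key)
            kept.append((read_id, seq))
    return kept


reads = [
    ("r1", "ACGTACGTAA"),
    ("r2", "ACGTACGTCC"),  # shares its first 8 bases with r1
    ("r3", "TTGTACGTAA"),
]
unique = dereplicate(reads, k=8)
print([read_id for read_id, _ in unique])  # ['r1', 'r3']
```

With `k=8`, read `r2` is dropped because its prefix matches that of `r1`; with a prefix of 10 bases all three reads would be kept.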

Feature extraction

MG-RAST identifies gene sequences using FragGeneScan, a machine learning approach. Ribosomal RNA sequences are identified through an initial BLAT search against a reduced version of the SILVA database.

Feature annotation

To identify the putative functions and annotations of the genes, MG-RAST builds clusters of proteins at the 90% identity level using the UCLUST implementation in QIIME. The longest sequence of each cluster is selected for a similarity analysis, which is computed with sBLAT. The search is run against a protein database derived from the M5nr, which provides a nonredundant integration of sequences from the GenBank, SEED, IMG, UniProt, KEGG and eggNOG databases.
The reads associated with rRNA sequences are clustered at 97% identity. The longest sequence of each cluster is picked as its representative and used for a BLAT search against the M5rna database, which integrates SILVA, Greengenes and RDP.
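The representative-selection rule described above (the longest sequence of each cluster) can be sketched as follows. The clustering itself is taken as given: the input dictionary, the cluster identifiers and the function name `pick_representatives` are hypothetical and do not reflect the output format of UCLUST or any MG-RAST file.

```python
def pick_representatives(clusters):
    """Return the longest member of each cluster.

    `clusters` maps a cluster id to a list of (seq_id, sequence) pairs.
    On ties, the first longest member in the list is returned.
    """
    return {
        cluster_id: max(members, key=lambda member: len(member[1]))
        for cluster_id, members in clusters.items()
    }


clusters = {
    "c1": [("p1", "MKV"), ("p2", "MKVLAA"), ("p3", "MKVL")],
    "c2": [("p4", "MSTNPK")],
}
reps = pick_representatives(clusters)
print(reps["c1"][0], reps["c2"][0])  # p2 p4
```

Only the representatives then need to be searched against the reference database, which is what makes clustering before the similarity search cheaper than searching every sequence.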

Profile generation

The data is integrated into a number of data products. The most important ones are the abundance profiles, which represent a pivoted and aggregated version of the similarity files.
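As an illustration of what "pivoted and aggregated" means here, the sketch below turns a flat list of similarity hits into a table of counts per annotation and sample. The tuple layout, the sample identifiers and the function name `abundance_profile` are assumptions made for the example; they are not the layout of MG-RAST's similarity files.

```python
from collections import defaultdict


def abundance_profile(hits):
    """Aggregate (sample, annotation) hits into annotation -> sample -> count."""
    profile = defaultdict(lambda: defaultdict(int))
    for sample, annotation in hits:
        profile[annotation][sample] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {annotation: dict(counts) for annotation, counts in profile.items()}


hits = [
    ("sampleA", "DNA polymerase"),
    ("sampleA", "DNA polymerase"),
    ("sampleB", "DNA polymerase"),
    ("sampleB", "ABC transporter"),
]
profile = abundance_profile(hits)
print(profile["DNA polymerase"])   # {'sampleA': 2, 'sampleB': 1}
print(profile["ABC transporter"])  # {'sampleB': 1}
```

Each row of the result corresponds to one annotation and each column to one sample, which is the shape that comparative views such as heatmaps typically start from.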

Data loading

Finally, the obtained abundance profiles are loaded into the respective databases.

Detailed steps of the MG-RAST pipeline

MG-RAST Pipeline	Description
qc_stats	Generate quality control statistics
preprocess	Preprocessing, to trim low-quality regions from FASTQ data
dereplication	Dereplication for shotgun metagenome data by using k-mer approach
screen	Removing reads that are near-exact matches to the genomes of model organisms
rna detection	BLAT search against a reduced RNA database, to identify ribosomal RNA
rna clustering	rRNA-similar reads are then clustered at 97% identity
rna sims blat	BLAT similarity search for the longest cluster representative against the M5rna database
genecalling	A machine learning approach, FragGeneScan, to predict coding regions in DNA sequences
aa filtering	Filter proteins
aa clustering	Cluster proteins at 90% identity level using uclust
aa sims blat	BLAT similarity analysis to identify proteins
aa sims annotation	Sequence similarity against protein database from the M5nr
rna sims annotation	Sequence similarity against RNA database from the M5rna
index sim seq	Index sequence similarity to data sources
md5 annotation summary	Generate summary reports for md5 annotation, function annotation, organism annotation, LCA annotation, ontology annotation and source annotation
function annotation summary	Generate summary reports for md5 annotation, function annotation, organism annotation, LCA annotation, ontology annotation and source annotation
organism annotation summary	Generate summary reports for md5 annotation, function annotation, organism annotation, LCA annotation, ontology annotation and source annotation
lca annotation summary	Generate summary reports for md5 annotation, function annotation, organism annotation, LCA annotation, ontology annotation and source annotation
ontology annotation summary	Generate summary reports for md5 annotation, function annotation, organism annotation, LCA annotation, ontology annotation and source annotation
source annotation summary	Generate summary reports for md5 annotation, function annotation, organism annotation, LCA annotation, ontology annotation and source annotation
md5 summary load	Load summary report to the project
function summary load	Load summary report to the project
organism summary load	Load summary report to the project
lca summary load	Load summary report to the project
ontology summary load	Load summary report to the project
done stage
notify job completion	Send notification to user via email
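
The table lists an LCA annotation step without describing it. Assuming LCA stands for lowest common ancestor, as is usual in metagenomics, the idea can be sketched as finding the deepest taxonomic rank shared by all database hits of a read. The lineage lists and the function name `lowest_common_ancestor` are hypothetical; how MG-RAST selects the hits that enter this calculation is not covered here.

```python
def lowest_common_ancestor(lineages):
    """Return the longest leading run of ranks shared by all lineages.

    Each lineage is a list of taxon names ordered from the broadest rank
    to the most specific one.
    """
    shared = []
    # zip stops at the shortest lineage, so the ranks compared stay aligned.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
]
print(lowest_common_ancestor(hits))  # ['Bacteria', 'Proteobacteria']
```

A read whose hits disagree at the class level is thus assigned to the phylum they share rather than to either class.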

MG-RAST utilities

Besides metagenome analysis, MG-RAST can also be used for data discovery. Metagenome profiles and data sets can be visualized and compared in a wide variety of modes: the web interface allows users to select data based on criteria such as composition, sequence quality, functionality or sample type, and offers several ways to compute statistical inferences and ecological analyses. The metagenome profiles can be visualized and compared using bar charts, trees, spreadsheet-like tables, heatmaps, PCoA, rarefaction plots, circular recruitment plots, and KEGG maps.
