Ensembl Genomes

Ensembl Genomes is a scientific project to provide genome-scale data from non-vertebrate species.
The project is run by the European Bioinformatics Institute, and was launched in 2009 using the Ensembl technology. The main objective of the Ensembl Genomes database is to complement the main Ensembl
database by introducing five additional web pages to include genome data for bacteria, fungi, invertebrate metazoa, plants, and protists. For each of the domains, the Ensembl tools are available for manipulation, analysis and visualization of genome data. Most Ensembl Genomes data is stored in MySQL relational databases and can be accessed by the Ensembl REST interface, the Perl API, Biomart or online.
Ensembl Genomes is an open project, and most of the code, tools, and data are available to the public. Ensembl and Ensembl Genomes software uses an Apache 2.0 license license.

Displaying genomic data

The key feature of Ensembl Genomes is its graphical interface, which allows users to scroll through a genome and observe the relative location of features such as conceptual annotation, sequence patterns and experimental data. Graphical views are available for varying levels of resolution from an entire karyotype, down to the sequence of a single exon. Information for a genome is spread over four tabs, a species page, a ‘Location’ tab, a ‘Gene’ tab and a ‘Transcript’ tab, each providing information at a higher resolution.
Searching for a particular species using Ensembl Genomes redirects to the species page. Often, a brief description of the species is provided, as well as links to further information and statistics about the genome, the graphical interface and some of the tools available.
A karyotype is available for some species in Ensembl Genomes. If the karyotype is available there will be a link to it in the Gene Assembly section of the species page. Alternatively if users are in the ‘Location’ tab they can also view the karyotype by selecting ‘Whole genome’ in the left-hand menu. Users can click on a location within the karyotype to zoom in to one specific chromosome or a genomic region. This will open the ‘Location’ Tab.
In the 'Location' tab, users can browse genes, variations, sequence conservation, and other types of annotation along the genome. The 'Region in detail' is highly configurable and scalable, and users can choose what they want to see by clicking on the 'Configure this page' button at the bottom of the left-hand menu. By adding and removing tracks users will be able to select the type of data they want to have included in the displays. Data from the following categories can be easily added or removed from this 'Location' tab view: 'Sequence and assembly', 'Genes and transcripts', 'mRNA and protein alignments', 'Other DNA alignments', 'Germline variation', 'Comparative genomics', among others. Users can also change the display options such as the width. A further option allows users to reset the configuration back to the default settings.
More specific information about a select gene can be found in the ‘Gene’ tab. Users can get to this page by searching for desired gene in the search bar and clicking on the gene ID or by clicking on one of the genes shown in the ‘Location’ tab view. The ‘Gene’ tab contains gene-specific information such as gene structure, number of transcripts, position on the chromosome and homology information in the form of gene trees. This information can be accessed via the menu on the left-hand side.
A 'Transcript' tab will also appear when a user chooses to view a gene. The 'Transcript' tab contains much of the same information as the 'Gene' tab, however it is focused on only one transcript.

Tools

Adding Custom tracks to Ensembl Genomes

Ensembl Genomes allows comparing and visualising user data while browsing karyotypes and genes. Most Ensembl Genomes views include an ‘Add your data’ or ‘Manage your data’ button that will allow the user to upload new tracks containing reads or sequences to Ensembl Genomes or to modify data that has been previously uploaded. The uploaded data can be visualised in region views or over the whole karyotype. The uploaded data can be localised using Chromosome Coordinates or BAC Clone Coordinates.
The following methods can be used to upload a data file to any Ensembl Genomes page:

Files smaller than 5 MB can be either uploaded directly from any computer or from a web location to the Ensembl servers.
Lager files can only be uploaded from web locations.
BAM files can only be uploaded using the URL-based approach. The index file should be located in the same webserver.
A Distributed Annotation System source can be attached from web locations.

The following file types are supported by Ensembl Genomes:

BED
BedGraph
Generic
GFF/GTF
PSL
WIG
BAM
BigBed
BigWig
VCF

The data is uploaded temporarily into the servers. Registered users can log in and save their data for future reference. It is possible to share and access the uploaded data using and an assigned URL. Users are also allowed to delete their custom tracks from Ensembl Genomes.

BioMart

is a programming free search engine incorporated in Ensembl and Ensembl Genomes for the purpose of mining and extracting genomic data from the Ensembl databases in table formats like HTML, TSV, CSV or XLS. Release 45 of Ensembl Genomes has the following data available at the BioMarts:

Ensembl Protists BioMart: includes 33 species and variations for Phytophthora infestans and Phaeodactylum tricornutum
Ensembl Fungi BioMart: includes 56 species and variations for Fusarium graminearum, Fusarium oxysporum, Schizosaccharomyces pombe, Puccinia graminis, Verticillium dahliae, Zymoseptoria tritici, and Saccharomyces cerevisiae
Ensembl Metazoa BioMart: includes 78 species and variations for Aedes aegypti, Anopheles gambiae and Ixodes scapularis
Ensembl Plants: includes 67 species and variations for Arabidopsis thaliana, Brachypodium distachyon, Hordeum vulgare, Oryza glaberrima, Oryza glumipatula, Oryoza sativa indica, Oryza sativa japonica, Solanum lycopersicum, Sorghum bicolor, Triticum aestivum, Vitis vinifera, and Zea mays

The purpose of the BioMarts in Ensembl Genomes is to allow the user to mine and download tables containing all the genes for a single species, genes in a specific region of a chromosome or genes on one region of a chromosome associated with an InterPro domain. The BioMarts also include filters to refine the data to be extracted and the attributes that will appear in the final table file can be selected by the user.
The BioMarts can be accessed online in each corresponding domain of Ensembl Genomes or the source code can be installed in UNIX environment from the BioMart git repository

BLAST

A BLAST interface is provided to allow users to search for DNA or protein sequences against the Ensembl Genomes. It can be accessed by the header, located on top of all Ensembl Genome pages, titled BLAST. The BLAST search can be configured to search against individual species or collections of species. There is a taxonomic browser to allow the selection of taxonomically related species.

Sequence Search

Ensembl Genomes provides a second sequence search tool, that uses an algorithm based on Exonerate, that is provided by European Nucleotide Archive. This tool can be accessed by the header, located on top of all Ensembl Genome pages, titled Sequence Search. Users can then choose whether they would like Exonerate to search against all species in the Ensembl Genomes division or against all species in Ensembl Genomes. They can also choose the 'Maximum E-value', which will limit the results that appear to those with E-values below the maximum. Finally users can choose to use an alternative search mode by selecting 'Use spliced query'.

Variant Effect Predictor

The Variant Effect Predictor is one of the most used tools in Ensembl and Ensembl Genomes. It allows to explore and analyse what is the effect that the variants have on a particular gene, sequence, protein, transcript or transcription factor. To use VEP, the users must input the location of their variants and the nucleotide variations to generate the following results:

Genes and transcripts affected by the variant
Location of the variants
How the variant affects the protein synthesis
Comparison with other databases to find equal known variants

There are two ways in which the users can access the VEP. The first form is online-based. In this page, the user generates an input by selection the following parameters:

Species to be compared. The default database for comparison is Ensembl Transcripts, but for some species, other sources can be selected.
Name for the uploaded data
Selection of the input format for the data. If an incorrect file format is selected, VEP will throw an error when running.
Fields for data upload. Users can upload data from their computers, from an URL-based location or by copying directly their contents into a text box.

Data upload to VEP supports VCF, pileup, HGVS notations and a default format. The default format is a whitespace-separated file that contains the data in columns. The first five columns indicate the chromosome, start location, end location, allele and the strand. The sixth column is a variation identifier and it is optional. If it is left in blank, VEP will assign an identifier to in output file.
VEP also provides additional identifier options to the users, extra options to complement the output and filtering. The filtering options allow features like removal of known variants from results, returning variants in exons only, and restriction of results to specific consequences of the variants.
VEP users also have the possibility of viewing and manipulating all the jobs associated with their session by browsing the "Recent Tickets" tab. I this tab the users can view the status of their search and save, delete or resubmit jobs.
The second option to use VEP is by downloading the source code for its use in UNIX environments. All the features are equal between the online and script versions. VEP can also be used with online instances like Galaxy.
When a VEP job is completed the output is a tabular file that contains the following columns:

Uploaded variation - as chromosome_start_alleles
Location - in standard coordinate format
Allele - the variant allele used to calculate the consequence
Gene - Ensembl stable ID of affected gene
Feature - Ensembl stable ID of feature
Feature type - type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature.
Consequence - consequence type of this variation
Position in cDNA - relative position of base pair in cDNA sequence
Position in CDS - relative position of base pair in coding sequence
Position in protein - relative position of amino acid in protein
Amino acid change - only given if the variation affects the protein-coding sequence
Codon change - the alternative codons with the variant base in upper case
Co-located variation - known identifier of existing variation
Extra - this column contains extra information as key=value pairs separated by ";". Displays extra identifiers.

Other common output formats for VEP include JSON and VDF formats.

Programmatic data access

The Ensembl Genomes interface allows access to the data using your favourite programming language.
You can also access data using the Perl API and Biomart.

Current species

Ensembl Genomes makes no attempt to include all possible genomes, rather the genomes that are included on the site are those that are deemed to be scientifically important. Each site contains the following number of species:

The bacterial division of Ensembl now contains all bacterial genomes that have been completely sequenced, annotated and submitted to the International Nucleotide Sequence Database Collaboration. The current dataset contains 44,048 genomes.
Ensembl Fungi contains 1014 genomes
Ensembl Metazoa contains 78 genomes for invertebrate species. The main Ensembl site contains 236 genomes for vertebrate species.
Ensembl Plants contains 67 genomes
Ensembl Protists contains 237 genomes
Collaborations

Ensembl Genomes continuously expands the annotation data through collaboration with other organisations involved in genome annotation projects and research. The following organisations are collaborators of Ensembl Genomes:

AllBio
Barley
Culicoides sonorensis
Gramene
INFRAVEC
Microme
PomBase
PhytoPath
transPLANT
Triticeae Genomics for Sustainable Agriculture
VectorBase
Wheat Rust Genomic Improvement
WormBase

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...