Phrap

Phrap is a widely used program for DNA sequence assembly. It is part of the Phred-Phrap-Consed package.

History

Phrap was originally developed by Prof. Phil Green for the assembly of cosmids in large-scale cosmid shotgun sequencing within the Human Genome Project. Phrap has been widely used for many different sequence assembly projects, including bacterial genome assemblies and EST assemblies.
Phrap was written as a command line program for easy integration into automated data workflows in genome sequencing centers. For users who want to use Phrap from a graphical interface, the commercial programs MacVector and CodonCode Aligner are available.

Methods

A detailed description of the Phrap algorithms can be found in the Phrap documentation. A recurring thread within the Phrap algorithms is the use of Phred quality scores. Phrap used quality scores to mitigate a problem that other assembly programs had struggled with at the beginning of the Human Genome Project: correctly assembling frequent imperfect repeats, in particular Alu sequences. Phrap uses quality scores to tell whether observed differences in repeated regions are likely to be due to random ambiguities in the sequencing process, or more likely to be due to the sequences coming from different copies of the Alu repeat. Typically, Phrap had no problem differentiating between the Alu copies in a cosmid and assembling the cosmid correctly. The logic is simple: a base call with a high probability of being correct should never be aligned with another high-quality but different base.
However, Phrap does not rule out such alignments entirely, and the cross_match gap and alignment penalties used while looking for local alignments are not always optimal for typical sequencing errors or for a search for overlapping sequences. Phrap attempts to classify chimeras, vector sequences and low-quality end regions all in a single alignment and will sometimes make mistakes. Furthermore, Phrap runs more than one round of assembly building internally, and later rounds are less stringent (a greedy approach).
These design choices were helpful in the 1990s when the program was originally written but are less so now. Phrap appears error prone in comparison with newer assemblers like Euler, and it cannot use mate-pair information directly to guide assembly or to assemble past perfect repeats. Phrap is not free software, so it has not been extended and enhanced in the way that less restricted open-source sequence assembly software has.
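The quality-based repeat discrimination described in this section can be illustrated with a minimal Python sketch. It is not Phrap's actual implementation: the function name, the ungapped pre-aligned input and the threshold `min_q=30` are all assumptions made for illustration. The idea shown is only that a mismatch where both base calls are high quality is evidence that two reads come from different repeat copies, while a mismatch involving a low-quality base is more likely a sequencing error.

```python
def high_quality_discrepancies(read_a, qual_a, read_b, qual_b, min_q=30):
    """Return positions where two aligned reads disagree and BOTH calls are high quality.

    Assumptions (hypothetical, not taken from Phrap): the reads are already
    aligned without gaps, and min_q is an arbitrary illustrative threshold.
    """
    if not (len(read_a) == len(qual_a) == len(read_b) == len(qual_b)):
        raise ValueError("reads and quality lists must all have the same length")
    positions = []
    for i, (a, b, qa, qb) in enumerate(zip(read_a, read_b, qual_a, qual_b)):
        # A mismatch backed by two confident base calls is unlikely to be
        # a sequencing error in either read.
        if a != b and qa >= min_q and qb >= min_q:
            positions.append(i)
    return positions


read_a = "ACGTACGT"
read_b = "ACGAACGA"
qual_a = [40] * 8
qual_b = [40, 40, 40, 40, 40, 40, 40, 10]

# Position 3 is a confident mismatch; position 7 involves a low-quality call.
suspicious = high_quality_discrepancies(read_a, qual_a, read_b, qual_b)
```

In this toy input, only the first mismatch would count against overlapping the two reads; the second is consistent with a base-calling error.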

Quality based consensus sequences

Another use of Phred quality scores by Phrap that contributed to the program's success was the determination of consensus sequences using sequence qualities. In effect, Phrap automated a step that was a major bottleneck in the early phases of the Human Genome Project: to determine the correct consensus sequence at all positions where the assembled sequences had discrepant bases. This approach had been suggested by Bonfield and Staden in 1995, and was implemented and further optimized in Phrap. Basically, at any consensus position with discrepant bases, Phrap examines the quality scores of the aligned sequences to find the highest quality sequence. In the process, Phrap takes confirmation of local sequence by other reads into account, after considering direction and sequencing chemistry.
The mathematics of this approach is rather simple, since Phred quality scores are logarithmically linked to error probabilities. This means that the quality scores of confirming reads can simply be added, as long as the error distributions are sufficiently independent. To satisfy this independence criterion, reads must typically be in different directions, since peak patterns that cause base calling errors are often identical when a region is sequenced several times in the same direction.
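The additivity follows from the standard Phred definition Q = -10 log10(p), where p is the error probability: adding two scores corresponds to multiplying two independent error probabilities. The Python sketch below illustrates this. The `confirmed_quality` helper is a hypothetical simplification, not Phrap's actual rule: it assumes that only the best read in each direction counts as independent evidence, and it ignores sequencing chemistry.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def confirmed_quality(calls):
    """Sum the best quality score seen in each read direction.

    `calls` is a list of (quality, strand) pairs for reads that agree on a base,
    with strand given as '+' or '-'. Taking one read per direction is an
    illustrative assumption standing in for the independence criterion.
    """
    best = {}
    for quality, strand in calls:
        best[strand] = max(quality, best.get(strand, 0))
    return sum(best.values())


# Two forward reads and one reverse read confirm the same base.
calls = [(30, "+"), (25, "+"), (20, "-")]
combined = confirmed_quality(calls)

# Adding scores is the same as multiplying the independent error probabilities.
product = phred_to_error_prob(30) * phred_to_error_prob(20)
```

Here the second forward read adds nothing, because its errors are assumed to be correlated with those of the first forward read.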
If a consensus base is covered by both high-quality sequence and low-quality sequence, Phrap's selection of the higher quality sequence will in most cases be correct. Phrap then assigns the confirmed base quality to the consensus sequence base. This makes it easy to find consensus regions that are not covered by high quality sequence, and to quickly calculate a reasonably accurate estimate of the error rate of the consensus sequence. This information can then be used to direct finishing efforts, for example re-sequencing of problem regions.
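As a rough illustration of how consensus qualities can direct finishing, the following Python sketch estimates the expected number of errors in a consensus and lists low-quality stretches. The function names and the threshold `min_q=20` are assumptions for this example, not part of Phrap or Consed.

```python
def expected_errors(consensus_quals):
    """Expected number of wrong bases: the sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in consensus_quals)


def low_quality_regions(consensus_quals, min_q=20):
    """Return half-open (start, end) index ranges where quality is below min_q.

    min_q is a hypothetical cutoff chosen for illustration.
    """
    regions = []
    start = None
    for i, q in enumerate(consensus_quals):
        if q < min_q:
            if start is None:
                start = i  # a low-quality stretch begins here
        elif start is not None:
            regions.append((start, i))  # the stretch ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(consensus_quals)))
    return regions


quals = [40, 40, 10, 10, 40, 15]
errors = expected_errors(quals)
regions = low_quality_regions(quals)
```

Dividing the expected error count by the consensus length gives an estimated error rate, and the reported ranges are candidates for re-sequencing.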
The combination of accurate, base-specific quality scores and a quality-based consensus sequence was a critical element in the success of the Human Genome Project. Phred and Phrap, and similar programs that picked up on the ideas pioneered by these two programs, enabled the assembly of large parts of the human genome at an accuracy substantially higher than that of the carefully hand-edited sequences previously submitted to the GenBank database.

Other Software

Phred
Consed
DNA Baser Command Line Tool
