The Moses machine translation decoder is an open-source project that was created by, and is maintained under the guidance of, Philipp Koehn. Moses is a platform for developing statistical machine translation systems for any language pair, given a parallel corpus. The decoder was mainly developed by Hieu Hoang and Philipp Koehn at the University of Edinburgh. It was extended during a Johns Hopkins University Summer Workshop and further developed with funding from the Euromatrix and GALE projects. It is the de facto benchmark for research in the field. Koehn continues to play a major role in the development of Moses. The decoder has been supported by the European Framework 6 projects Euromatrix and TC-Star, the European Framework 7 projects EuroMatrixPlus, Let's MT, META-NET and MosesCore, and the DARPA GALE project. It has also been supported by several universities and research institutes, including the University of Edinburgh, the University of Maryland, ITC-irst, the Massachusetts Institute of Technology, and others. Substantial contributors to the Moses decoder also include Hieu Hoang, Chris Dyer, Josh Schroeder, Marcello Federico, Richard Zens, and Wade Shen.
Europarl Corpus
The Europarl corpus is a collection of documents consisting of the proceedings of the European Parliament from 1996 to the present. It has been compiled and expanded by a group of researchers led by Philipp Koehn at the University of Edinburgh. The data that make up the corpus were extracted from the European Parliament's website and then prepared for linguistic research. The latest release comprised up to 60 million words per language in 21 European languages. These are drawn from the Romance, Germanic, Slavic, Finno-Ugric and Baltic language families, as well as Greek.
Koehn is a professor at Johns Hopkins University, where he continues his research into machine translation through his affiliation with the Center for Language and Speech Processing.
Koehn is also a professor and Chair of Machine Translation at the University of Edinburgh's School of Informatics. There he contributes to the Statistical Machine Translation Group, which organises workshops, seminars and projects related to the subject.
Koehn consulted for SYSTRAN periodically between 2006 and 2011. In April 2014, SYSTRAN was acquired by CLSI, a Korean machine translation company.
Koehn is Chief Scientist of Omniscien Technologies and has been a shareholder in the company since 2007. Omniscien Technologies is a private company that develops and commercialises machine translation technologies.
Koehn is the author of the book Statistical Machine Translation, published in 2009.
Awards and recognition
2013: One of three finalists in the Research category of the European Patent Office's 2013 European Inventor Award. Koehn was recognised for patent EP 1488338 B, Phrase-Based Joint Probability Model for Statistical Machine Translations. The patent covers a translation model that uses probabilities to determine the most likely translation of chunks of text from one language into another.
2015: Koehn received the Award of Honor of the International Association for Machine Translation.