Moby Project

The Moby Project is a collection of public-domain lexical resources created by Grady Ward. The resources were dedicated to the public domain and are now mirrored at Project Gutenberg. The collection contains the largest free phonetic database, with 177,267 words and corresponding pronunciations.

Hyphenator

The Moby Hyphenator II contains hyphenations of 187,175 words and phrases. The character encoding appears to be MacRoman, and hyphenation points are indicated by a bullet (character 165 in that encoding). Some entries, however, combine actual hyphens with character 165.
There is little to no documentation of the hyphenation choices made.
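The encoding described above can be handled with a short Python sketch. It assumes the file really is MacRoman, as the text suggests, and the function name and sample bytes are invented for illustration rather than taken from the Hyphenator file.

```python
# Byte 0xA5 (decimal 165) decodes to the bullet character in Mac OS Roman.
BULLET = "\u2022"

def split_hyphenation(raw):
    """Decode one raw entry (bytes) and split it at the bullet markers."""
    text = raw.decode("mac_roman").strip()
    return text.split(BULLET)

# Hypothetical entry, not quoted from the real file.
print(split_hyphenation(b"ex\xa5am\xa5ple\r\n"))
```

Literal hyphens in an entry are left untouched by this approach and stay inside the fragments they belong to.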

Language

Moby Language II contains wordlists for five languages: French, German, Italian, Japanese, and Spanish.

Language	Words	Size (bytes)
French	138,257	1,524,757
German	159,809	2,055,986
Italian	60,453	561,981
Japanese	115,523	934,783
Spanish	86,059	850,523
Total	560,101	5,928,030

However, some of the lists are contaminated: for example, the Japanese list contains English words such as abnormal, as well as non-words. The lists are also sorted inconsistently. The French list is a straight alphabetical listing, while the German list gives the traditionally capitalized words in alphabetical order, followed by the traditionally lower-cased words in alphabetical order. The Italian list contains no capitalized words whatsoever.
The foreign-language lists do not use accented characters, so "etre" is how a user would look up the French word être.
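Because the lists omit accents, a lookup key has to be stripped of diacritics first. The following is a minimal sketch of one way to do that with Python's standard library. It is not how the lists themselves were produced, and the function name is an assumption made for illustration.

```python
import unicodedata

def strip_accents(word):
    """Return the word with combining diacritical marks removed."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("être"))  # etre
```

Letters that have no canonical decomposition are passed through unchanged, so this covers only accents of the kind found in être.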

Part-of-Speech

Moby Part-of-Speech contains 233,356 words fully described by part of speech, listed in priority order. The format of the file is word\parts-of-speech, with the following parts of speech being identified:

Part-of-speech	Code
Noun	N
Plural	p
Noun phrase	h
Verb	V
Transitive verb	t
Intransitive verb	i
Adjective	A
Adverb	v
Conjunction	C
Preposition	P
Interjection	!
Pronoun	r
Definite article	D
Indefinite article	I
Nominative	o
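The word\parts-of-speech format and the code table above can be decoded along these lines. This is a sketch only: it assumes the delimiter is a literal backslash, as the format description states, and the sample entry is invented rather than quoted from the file.

```python
# Codes copied from the table above.
POS_CODES = {
    "N": "Noun", "p": "Plural", "h": "Noun phrase", "V": "Verb",
    "t": "Transitive verb", "i": "Intransitive verb", "A": "Adjective",
    "v": "Adverb", "C": "Conjunction", "P": "Preposition",
    "!": "Interjection", "r": "Pronoun", "D": "Definite article",
    "I": "Indefinite article", "o": "Nominative",
}

def parse_pos_line(line):
    """Split 'word\\codes' into the word and its parts of speech, in file order."""
    word, _, codes = line.rstrip("\r\n").partition("\\")
    return word, [POS_CODES.get(code, "Unknown") for code in codes]

# Hypothetical entry for illustration.
print(parse_pos_line("example\\NV"))
```

The returned list keeps the order of the codes, which the text describes as priority order.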

Pronunciator

The Moby Pronunciator II contains 177,267 entries with corresponding pronunciations. Most of the entries describe a single word, but approximately 79,000 contain hyphenated or multiple-word phrases, names, or lexemes. The Project Gutenberg distribution also contains a copy of the cmudict v0.3. The file contains lines of the format word pronunciation. Each line is terminated with the ASCII Carriage Return character.
The word field can include apostrophes, hyphens, and multiple words separated by underscores. Non-English words are generally rendered, as stated in the documentation, without accents or other diacritical marks. However, in 36 entries, some non-ASCII accented characters remain, represented using Mac OS Roman encoding.
A part-of-speech field is used to disambiguate 770 words whose pronunciation depends on their part of speech. For example, for the words spelled close, the verb is pronounced /kloʊz/, whereas the adjective is pronounced /kloʊs/. The parts of speech have been assigned the following codes:

Part-of-speech	Code
Noun	n
Verb	v
Adjective	aj
Adverb	av
Interjection	interj

Following this is the pronunciation. Several special symbols are present:

Symbol	Meaning
_	Used to separate words
'	Primary stress on the following syllable
,	Secondary stress on the following syllable

The remaining symbols represent IPA characters. The pronunciations are generally consistent with a General American dialect of English that exhibits the father-bother merger, the hurry-furry merger and the lot-cloth split, but not the cot-caught merger or the wine-whine merger. Each phoneme is represented by a sequence of one or more characters. Some of the sequences are delimited with a slash character "/", as shown in the following table; note that the sequence for ɔɪ is delimited by two slash characters at either end:

Symbol	IPA
/&/	æ
/-/	ə
/@/	ʌ, ə
//r	ɜr, ər
/A/	ɑ, ɑː
/aI/	aɪ
/AU/	aʊ
b	b
d	d
/D/	ð
/dZ/	dʒ
/E/	ɛ
/eI/	eɪ
f	f
g	ɡ
h	h
hw	hw
/i/	iː
/I/	ɪ
/j/	j
/ju/	juː
k	k
l	l
m	m
n	n
/N/	ŋ
/O/	ɔ, ɔː
//Oi//	ɔɪ
/oU/	oʊ
p	p
r	r
s	s
/S/	ʃ
t	t
/T/	θ
/tS/	tʃ
/u/	uː
/U/	ʊ
v	v
w	w
z	z
/Z/	ʒ
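The table above can be applied mechanically to turn a pronunciation string into IPA. The sketch below makes several assumptions: it covers only the slash-delimited sequences it lists, it uses the first IPA value wherever the table gives two, and it passes any unlisted symbol through unchanged. The function name and the sample strings are constructed from the table for illustration, not quoted from the file.

```python
import re

# Slash-delimited sequences from the table (first IPA value where two are listed).
SLASHED = {
    "/&/": "æ", "/-/": "ə", "/@/": "ʌ", "/A/": "ɑ", "/aI/": "aɪ",
    "/AU/": "aʊ", "/D/": "ð", "/dZ/": "dʒ", "/E/": "ɛ", "/eI/": "eɪ",
    "/i/": "iː", "/I/": "ɪ", "/j/": "j", "/ju/": "juː", "/N/": "ŋ",
    "/O/": "ɔ", "//Oi//": "ɔɪ", "/oU/": "oʊ", "/S/": "ʃ", "/T/": "θ",
    "/tS/": "tʃ", "/u/": "uː", "/U/": "ʊ", "/Z/": "ʒ",
}
# Stress marks, the word separator, and the one plain letter whose IPA glyph differs.
OTHER = {"'": "ˈ", ",": "ˌ", "_": " ", "g": "ɡ"}

# Try the double-slash sequence first, then single-slash sequences, then one character.
TOKEN = re.compile(r"//Oi//|/[^/]+/|.")

def moby_to_ipa(pron):
    """Translate a Moby-style pronunciation string to IPA, token by token."""
    return "".join(SLASHED.get(tok, OTHER.get(tok, tok)) for tok in TOKEN.findall(pron))

print(moby_to_ipa("k/&/t"))
```

Matching the double-slash form before the single-slash form keeps //Oi// from being split into smaller pieces.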

To this collection are added a number of extra sequences representing phonemes found in several other languages. These are used to encode the non-English words, phrases and names that are included in the database. The following table contains these extra phonemes, but note that the extent to which some of these may exist due to encoding errors is not clear.

Symbol	IPA or meaning
A	a
e	e, ɛ
i	i, ɪ
N	Nasalisation of preceding vowel
o	o
O
R	ʁ
S	s
u	u
V	v, β, ʋ
W	w
/x/	x
/y/	ø
Y	y
/z/	ts
Z	z

Shakespeare

Moby Shakespeare contains the complete unabridged works of Shakespeare. This specific resource is not available from Project Gutenberg.

Thesaurus

The Moby Thesaurus II contains 30,260 root words, with 2,520,264 synonyms and related terms – an average of 83.3 per root word. Each line consists of a list of comma-separated values, with the first term being the root word, and all following words being related terms.
Grady Ward placed this thesaurus in the public domain in 1996. It is also available as a Debian package.
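The line format described above (a root word followed by comma-separated related terms) can be read with a few lines of Python. This is a sketch under that description only. The function name and the sample line are hypothetical.

```python
def parse_thesaurus_line(line):
    """Return (root_word, related_terms) for one comma-separated line."""
    terms = line.rstrip("\r\n").split(",")
    return terms[0], terms[1:]

# Hypothetical entry, not quoted from the thesaurus.
root, related = parse_thesaurus_line("happy,glad,joyful\n")
print(root, related)

# The quoted average follows from the stated totals.
average = 2_520_264 / 30_260
print(f"{average:.1f}")  # 83.3
```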

Words

Moby Words II is described as the largest wordlist in the world. The distribution consists of the following 16 files:

Filename	Words	Description
ACRONYMS.TXT	6,213	Common acronyms and abbreviations
COMMON.TXT	74,550	Common words present in two or more published dictionaries
COMPOUND.TXT	256,772	Phrases, proper nouns, and acronyms not included in the common words file
CROSSWD.TXT	113,809	Words included in the first edition of the Official Scrabble Players Dictionary
CRSWD-D.TXT	4,160	Additions to the Official Scrabble Players Dictionary in the second edition
FICTION.TXT	467	A list of the most commonly occurring substrings in the book The Joy Luck Club
FREQ.TXT	1,000	Most frequently occurring words in the English language, listed in descending order
FREQ-INT.TXT	1,000	Most frequently occurring words on Usenet in 1992, listed with corresponding percentage in decreasing order
KJVFREQ.TXT	1,185	Most frequently occurring substrings in the King James Version of the Bible, listed in descending order
NAMES.TXT	21,986	Most common names used in the United States and Great Britain
NAMES-F.TXT	4,946	Common English female names
NAMES-M.TXT	3,897	Common English male names
OFTENMIS.TXT	366	Most common misspelled English words
PLACES.TXT	10,196	Place names in the United States
SINGLE.TXT	354,984	Single words excluding proper nouns, acronyms, compound words and phrases, but including archaic words and significant variant spellings
USACONST.TXT	7,618	United States Constitution including all amendments current to 1993
Total	863,149	Sum of all files; not the total of unique words
Total unique	639,995	Total of single words, proper nouns, acronyms, and compound words and phrases
