Binary-to-text encoding

A binary-to-text encoding is an encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

Description

The ASCII text-encoding standard uses 128 unique values to represent the alphabetic, numeric, and punctuation characters commonly used in English, plus a selection of control codes which do not represent printable characters. For example, the capital letter A is ASCII character 65, the numeral 2 is ASCII 50, the character ''

Encoding plain text

Binary-to-text encoding methods are also used as a mechanism for encoding plain text. For example:

Some systems have a more limited character set they can handle; not only are they not 8-bit clean, some cannot even handle every printable ASCII character.
Other systems have limits on the number of characters that may appear between line breaks, such as the "1000 characters per line" limit of some SMTP software, as allowed by.
Still others add headers or trailers to the text.
A few poorly-regarded but still-used protocols use in-band signaling, causing confusion if specific patterns appear in the message. The best-known is the string "From " at the beginning of a line used to separate mail messages in the mbox file format.

By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely transparent.
This is sometimes referred to as 'ASCII armoring'. For example, the ViewState component of ASP.NET uses base64 encoding to safely transmit text via HTTP POST, in order to avoid delimiter collision.
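
As a minimal sketch of this round trip, the Python example below armors a plain-text message with the standard library's base64 module. The message text is a made-up illustration; it contains a line starting with "From " and a line longer than 1000 characters, two of the problem cases listed above.

```python
import base64

# Hypothetical message: the second line starts with "From ", which mbox-style
# storage could mistake for a message separator, and the third line is far
# longer than a 1000-character line limit.
message = "Hello,\nFrom now on the build runs nightly.\n" + "x" * 1200 + "\n"

# encodebytes() emits Base64 text in lines of at most 76 characters.
armored = base64.encodebytes(message.encode("utf-8")).decode("ascii")
lines = armored.splitlines()

# Base64 output contains no spaces, so no line can start with "From ".
longest_line = max(len(line) for line in lines)
has_from_line = any(line.startswith("From ") for line in lines)

# Decoding on the other end restores the original text exactly.
restored = base64.decodebytes(armored.encode("ascii")).decode("utf-8")
```

The intermediate system only ever sees short lines of letters, digits, "+", "/", and "=", so it has nothing to misinterpret.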

Encoding standards

The table below compares the most commonly used binary-to-text encodings. The efficiency listed is the ratio of the number of bits in the input to the number of bits in the encoded output.

Encoding	Data type	Efficiency	Programming language implementations	Comments
Ascii85	Arbitrary	80%		Several variants of this encoding exist, including Base85 and btoa.
Base32	Arbitrary	62.5%	
Base36	Arbitrary	~64%	bash, C, C++, C#, Java, Perl, PHP, Python, Visual Basic, Swift, many others	Uses the Arabic numerals 0–9 and the Latin letters A–Z. Commonly used by URL redirection systems like TinyURL or SnipURL/Snipr as compact alphanumeric identifiers.
Base58	Integer	~73%		Similar to Base64, but modified to avoid both non-alphanumeric characters and letters which might look ambiguous when printed.
Base64	Arbitrary	75%	many others
Base85	Arbitrary	80%		Revised version of Ascii85.
BinHex	Arbitrary	75%		MacOS Classic
Decimal	Integer	~42%	Most languages	Usually the default representation for input/output from/to humans.
Hexadecimal	Arbitrary	50%	Most languages	Exists in uppercase and lowercase variants
Intel HEX	Arbitrary	~<50%		Typically used to program EPROM, NOR-Flash memory chips
MIME	Arbitrary	See Quoted-printable and Base64	See Quoted-printable and Base64	Encoding container for e-mail-like formatting
MOS Technology file format	Arbitrary			Typically used to program EPROM, NOR-Flash memory chips.
Percent encoding	Text, Arbitrary	~40%	probably many others
Quoted-printable	Text	~33–100%	Probably many	Preserves line breaks; cuts lines at 76 characters
S-record	Arbitrary	49.6%		Typically used to program EPROM, NOR-Flash memory chips. 49.6% assumes 255 binary bytes per record.
Tektronix hex	Arbitrary			Typically used to program EPROM, NOR-Flash memory chips.
Uuencoding	Arbitrary	~60%	Perl, probably many others	Largely replaced by MIME and yEnc
Xxencoding	Arbitrary	~75%		Proposed as replacement for Uuencoding to avoid character set translation problems between ASCII and the EBCDIC systems that could corrupt Uuencoded data
yEnc	Arbitrary, mostly non-text	~98%		Includes a CRC checksum
	Arbitrary	33%	C, ,...	"A Convention for Human-readable 128-bit Keys". A series of small English words is easier for humans to read, remember, and type in than decimal or other binary-to-text encoding systems. Each 64-bit number is mapped to six short words, of one to four characters each, from a public 2048-word dictionary.
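
The efficiency figures for the fixed-radix encodings can be checked empirically. The sketch below assumes each output character occupies 8 bits and uses an arbitrary 60-byte sample; the helper name `efficiency` is illustrative, not part of any library.

```python
import base64

def efficiency(encode, data):
    """Return input bits divided by output bits, at 8 bits per output character."""
    return len(data) / len(encode(data))

# 60 bytes is a multiple of 3, 4, and 5, so none of the encoders below needs
# padding. The sample avoids all-zero groups, which Ascii85 would abbreviate.
sample = bytes(range(1, 61))

results = {
    "Hexadecimal": efficiency(base64.b16encode, sample),
    "Base32": efficiency(base64.b32encode, sample),
    "Base64": efficiency(base64.b64encode, sample),
    "Ascii85": efficiency(base64.a85encode, sample),
}
for name, value in results.items():
    print(f"{name}: {value:.1%}")
```

For inputs whose length is not a multiple of the group size, padding makes the measured ratio slightly lower than the nominal figure.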

The 95 isprint codes 32 to 126 are known as the ASCII printable characters.
Some older, now uncommon formats include BOO, BTOA, and USR encoding.
Most of these encodings generate text containing only a subset of the ASCII printable characters: for example, the base64 encoding generates text that contains only uppercase and lowercase letters, numerals, and the "+", "/", and "=" symbols.
Some of these encodings are based on a set of allowed characters and a single escape character. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text.
These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
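
Percent encoding is one escape-character scheme of this kind, with "%" as the escape character. The following sketch uses Python's urllib.parse; the sample string is an arbitrary illustration.

```python
from urllib.parse import quote, unquote_to_bytes

raw = "naïve café, 100% done".encode("utf-8")

# quote() leaves letters, digits, and a few unreserved characters untouched
# and replaces every other byte with "%" followed by two hexadecimal digits.
# safe="" means "/" is escaped as well.
encoded = quote(raw, safe="")

# Decoding reverses each escape sequence back into the original byte.
decoded = unquote_to_bytes(encoded)

print(encoded)
```

The ASCII letters and digits pass through unchanged, which is why the encoded text remains largely readable.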
Some other encodings map every possible sequence of six bits to a different printable character. This is possible because there are more than 2⁶ = 64 printable characters. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits, and generating the sequence of corresponding characters. The encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.
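
The six-bit chunking can be sketched directly. The function below is an illustrative, unoptimized implementation using the standard Base64 alphabet and "=" padding; the name `encode_six_bit` is hypothetical, and the result is compared against Python's own base64 module.

```python
import base64

# Standard Base64 alphabet: index 0-63 maps to one printable character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_six_bit(data: bytes) -> str:
    """Encode bytes by splitting the bit stream into six-bit chunks."""
    # View the input as one long string of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # Map each six-bit chunk to its character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Base64 pads the text with "=" to a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

print(encode_six_bit(b"Man"))
print(encode_six_bit(b"Ma") == base64.b64encode(b"Ma").decode("ascii"))
```

Swapping in a different 64-character alphabet or a different line-formatting rule yields the other six-bit encodings; the chunking step stays the same.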
Some encodings use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard hexadecimal digits.
Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding—expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.
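
A short sketch of this trade-off, using an arbitrary 9-byte sample so that base64 needs no padding:

```python
import base64

data = b"binary\x00\xff\x10"  # 9 bytes, a multiple of 3

# Hexadecimal: every byte expands independently into two characters.
hex_text = "".join(f"{byte:02x}" for byte in data)

# Base64 works on groups of three bytes, so neighbouring bytes share characters.
b64_text = base64.b64encode(data).decode("ascii")

# Decoding hex is a lookup on each pair of characters, with no grouping state.
decoded = bytes(int(hex_text[i:i + 2], 16) for i in range(0, len(hex_text), 2))

print(len(hex_text), len(b64_text))
```

Here the hexadecimal text is 18 characters against 12 for base64, which is the 50% increase described above.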
Out of PETSCII's first 192 codes, 164 have visible representations when quoted: 5, 17–20 and 28–31, 32–90, 91–127, 129, 133–140, 144–159, and 160–192. This theoretically permits encodings, such as base128, between PETSCII-speaking machines.
