Snappy (compression)

Snappy is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.
Snappy is widely used in Google projects like Bigtable, MapReduce and in compressing data for Google's internal RPC systems. It can be used in open-source projects like MariaDB ColumnStore, Cassandra, Couchbase, Hadoop, LevelDB, MongoDB, RocksDB, Lucene, Spark, and InfluxDB. Decompression is tested to detect any errors in the compressed stream. Snappy does not use inline assembler and is portable.

Stream format

Snappy encoding is not bit-oriented, but byte-oriented. The format uses no entropy encoder, like Huffman tree or arithmetic encoder.
The first bytes of the stream are the length of uncompressed data, stored as a little-endian varint, which allows for variable-length encoding. The lower seven bits of each byte are used for data and the high bit is a flag to indicate the end of the length field.
The remaining bytes in the stream are encoded using one of four element types. The element type is encoded in the lower two bits of the first byte of the element:

00 – Literal – uncompressed data; upper 6 bits are used to store length of data. Lengths larger than 60 are stored in a 1-4 byte integer indicated by a 6 bit length of 60 to 63.
01 – Copy with length stored as 3 bits and offset stored as 11 bits; one byte after tag byte is used for part of offset;
10 – Copy with length stored as 6 bits of tag byte and offset stored as two-byte integer after the tag byte;
11 – Copy with length stored as 6 bits of tag byte and offset stored as four-byte little-endian integer after the tag byte;

The copy refers to the dictionary. The offset is the shift from the current position back to the already decompressed stream. The length is the number of bytes to copy from the dictionary. The size of the dictionary was limited by the 1.0 Snappy compressor to 32,768 bytes, and updated to 65,536 in version 1.1.

Example of a compressed stream

The text
may be compressed to this, shown as hex data with explanations:
0000000: ca02 f042 5769 6b69 7065 6469 6120 6973 ...BWikipedia is
The first 2 bytes, ca02 are the length, as a little-endian varint. Thus the most-significant byte is '02'. 0x02ca = 0x014a = 330 bytes.
The next two bytes, 0xf042, indicate that a literal of 66+1 bytes follows
0000010: 2061 2066 7265 652c 2077 6562 2d62 6173 a free, web-bas
0000020: 6564 2c20 636f 6c6c 6162 6f72 6174 6976 ed, collaborativ
0000030: 652c 206d 756c 7469 6c69 6e67 7561 6c20 e, multilingual
0000040: 656e 6379 636c 6f09 3ff0 8170 726f 6a65 encyclo.?..proje
0x09 is tag-byte of type 01 with length - 4 = 010₂ = 2₁₀ and offset = 0x03f = 63 or "pedia ";
0xf081 is a literal with length of 129+1 bytes
0000050: 6374 2e00 0000 0000 0000 0000 0000 0000 ct.
In this example, all common substrings with four or more characters were eliminated by the compression process. More common compressors can compress this better. Unlike compression methods such as gzip and bzip2, there is no entropy encoding used to pack alphabet into the bit stream.

Interfaces

Snappy distributions include C++ and C bindings. Third party-provided bindings and ports include C#, Common Lisp, Erlang, Go, Haskell, Lua, Java, Node.js, Perl, PHP, Python, R, Ruby, Rust, Smalltalk, and OpenCL.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...