One-way compression function
In cryptography, a one-way compression function is a function that transforms two fixed-length inputs into a fixed-length output. The transformation is "one-way", meaning that it is difficult given a particular output to compute inputs which compress to that output. One-way compression functions are not related to conventional data compression algorithms, which instead can be inverted exactly or approximately to the original data.
One-way compression functions are for instance used in the Merkle–Damgård construction inside cryptographic hash functions.
One-way compression functions are often built from block ciphers.
Some methods to turn any normal block cipher into a one-way compression function are Davies–Meyer, Matyas–Meyer–Oseas, Miyaguchi–Preneel and MDC-2/Meyer–Schilling, MDC-4, Hirose. These methods are described in detail further down.
Compression
A compression function mixes two fixed length inputs and produces a single fixed length output of the same size as one of the inputs. This can also be seen as that the compression function transforms one large fixed-length input into a shorter, fixed-length output.For instance, input A might be 128 bits, input B 128 bits and they are compressed together to a single output of 128 bits. This is equivalent to having a single 256-bit input compressed to a single output of 128 bits.
Some compression functions do not compress by half, but instead by some other factor. For example, input A might be 256 bits, and input B 128 bits, which are compressed to a single output of 128 bits. That is, a total of 384 input bits are compressed together to 128 output bits.
The mixing is done in such a way that full avalanche effect is achieved. That is, every output bit depends on every input bit.
One-way
A one-way function is a function that is easy to compute but hard to invert. A one-way compression function should have the following properties:- Easy to compute: If you have some input, it is easy to calculate the output.
- Preimage-resistance: If an attacker only knows the output it should be unfeasible to calculate an input. In other words, given an output h, it should be unfeasible to calculate an input m such that hash=h.
- Second preimage-resistance: Given an input m1 whose output is h, it should be unfeasible to find another input m2 that has the same output h i.e..
- Collision-resistance: It should be hard to find any two different inputs that compress to the same output i.e. an attacker should not be able to find a pair of messages m1 ≠ m2 such that hash = hash. Due to the birthday paradox there is a 50% chance a collision can be found in time of about 2n/2 where n is the number of bits in the hash function's output. An attack on the hash function thus should not be able to find a collision with less than about 2n/2 work.
The Merkle–Damgård construction
A common use of one-way compression functions is in the Merkle–Damgård construction inside cryptographic hash functions. Most widely used hash functions, including MD5, SHA-1 and SHA-2 use this construction.A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks, and operating on them in sequence using a one-way compression function. The compression function can either be specially designed for hashing or be built from a block cipher.
The last block processed should also be length padded, this is crucial to the security of this construction. This construction is called the Merkle–Damgård construction. Most widely used hash functions, including SHA-1 and MD5, take this form.
When length padding is applied, attacks cannot find collisions faster than the birthday paradox if the used f-function is collision-resistant. Hence, the Merkle–Damgård hash construction reduces the problem of finding a proper hash function to finding a proper compression function.
A second preimage attack = hash) can be done according to Kelsey and Schneier for a 2k-message-block message in time k × 2n/2+1 + 2n−k+1. Note that the complexity of this attack reaches a minimum of 23n/4+2 for long messages when k = 2n/4 and approaches 2n when messages are short.
Construction from block ciphers
One-way compression functions are often built from block ciphers.Block ciphers take two fixed size inputs and return one single output which is the same size as the input plaintext.
However, modern block ciphers are only partially one-way. That is, given a plaintext and a ciphertext it is infeasible to find a key that encrypts the plaintext to the ciphertext. But, given a ciphertext and a key a matching plaintext can be found simply by using the block cipher's decryption function. Thus, to turn a block cipher into a one-way compression function some extra operations have to be added.
Some methods to turn any normal block cipher into a one-way compression function are Davies–Meyer, Matyas–Meyer–Oseas, Miyaguchi–Preneel and MDC-2, MDC-4, Hirose.
Single-block-length compression functions output the same number of bits as processed by the underlying block cipher. Consequently, double-block-length compression functions output twice the number of bits.
If a block cipher has a block size of say 128 bits single-block-length methods create a hash function that has the block size of 128 bits and produces a hash of 128 bits. Double-block-length methods make hashes with double the hash size compared to the block size of the block cipher used. So a 128-bit block cipher can be turned into a 256-bit hash function.
These methods are then used inside the Merkle–Damgård construction to build the actual hash function. These methods are described in detail further down.
Using a block cipher to build the one-way compression function for a hash function is usually somewhat slower than using a specially designed one-way compression function in the hash function. This is because all known secure constructions do the key scheduling for each block of the message. Black, Cochran and Shrimpton have shown that it is impossible to construct a one-way compression function that makes only one call to a block cipher with a fixed key. In practice reasonable speeds are achieved provided the key scheduling of the selected block cipher is not a too heavy operation.
But, in some cases it is easier because a single implementation of a block cipher can be used for both block cipher and a hash function. It can also save code space in very tiny embedded systems like for instance smart cards or nodes in cars or other machines.
Therefore, the hash-rate or rate gives a glimpse of the efficiency of a hash function based on a certain compression function. The rate of an iterated hash function outlines the ratio between the number of block cipher operations and the output. More precisely, if n denotes the output bit-length of the block cipher the rate represents the ratio between the number of processed bits of input m, n output bits and the necessary block cipher operations s to produce these n output bits. Generally, the usage of fewer block cipher operations could result in a better overall performance of the entire hash function but it also leads to a smaller hash-value which could be undesirable. The rate is expressed by the formula.
The hash function can only be considered secure if at least the following conditions are met:
- The block cipher has no special properties that distinguish it from ideal ciphers, such as for example weak keys or keys that lead to identical or related encryptions.
- The resulting hash size is big enough. According to the birthday attack a security level of 280 is desirable thus the hash size should be at least 160 bits.
- The last block is properly length padded prior to the hashing. Length padding is normally implemented and handled internally in specialised hash functions like SHA-1 etc.
Davies–Meyer
The Davies–Meyer single-block-length compression function feeds each block of the message as the key to a block cipher. It feeds the previous hash value as the plaintext to be encrypted. The output ciphertext is then also XORed with the previous hash value to produce the next hash value. In the first round when there is no previous hash value it uses a constant pre-specified initial value.In mathematical notation Davies–Meyer can be described as:
The scheme has the rate :
If the block cipher uses for instance 256-bit keys then each message block is a 256-bit chunk of the message. If the same block cipher uses a block size of 128 bits then the input and output hash values in each round is 128 bits.
Variations of this method replace XOR with any other group operation, such as addition on 32-bit unsigned integers.
A notable property of the Davies–Meyer construction is that even if the underlying block cipher is totally secure, it is possible to compute fixed points for the construction: for any, one can find a value of such that : one just has to set. This is a property that random functions certainly do not have. So far, no practical attack has been based on this property, but one should be aware of this "feature". The fixed-points can be used in a second preimage attack = hash) of Kelsey and Schneier for a 2k-message-block message in time 3 × 2n/2+1+2n-k+1. If the construction does not allow easy creation of fixed points then this attack can be done in k × 2n/2+1+2n-k+1 time. Note that in both cases the complexity is above 2n/2 but below 2n when messages are long and that when messages get shorter the complexity of the attack approaches 2n.
The security of the Davies–Meyer construction in the Ideal Cipher Model was first proved by R. Winternitz.
Matyas–Meyer–Oseas
The Matyas–Meyer–Oseas single-block-length one-way compression function can be considered the dual of Davies–Meyer.It feeds each block of the message as the plaintext to be encrypted. The output ciphertext is then also XORed with the same message block to produce the next hash value. The previous hash value is fed as the key to the block cipher. In the first round when there is no previous hash value it uses a constant pre-specified initial value.
If the block cipher has different block and key sizes the hash value will have the wrong size for use as the key. The cipher might also have other special requirements on the key. Then the hash value is first fed through the function g to be converted/padded to fit as key for the cipher.
In mathematical notation Matyas–Meyer–Oseas can be described as:
The scheme has the rate:
A second preimage attack = hash) can be done according to Kelsey and Schneier for a 2k-message-block message in time k × 2n/2+1+2n-k+1. Note that the complexity is above 2n/2 but below 2n when messages are long and that when messages get shorter the complexity of the attack approaches 2n.
Miyaguchi–Preneel
The Miyaguchi–Preneel single-block-length one-way compression function is an extended variant of Matyas–Meyer–Oseas. It was independently proposed by Shoji Miyaguchi and Bart Preneel.It feeds each block of the message as the plaintext to be encrypted. The output ciphertext is then XORed with the same message block and then also XORed with the previous hash value to produce the next hash value. The previous hash value is fed as the key to the block cipher. In the first round when there is no previous hash value it uses a constant pre-specified initial value.
If the block cipher has different block and key sizes the hash value will have the wrong size for use as the key. The cipher might also have other special requirements on the key. Then the hash value is first fed through the function g to be converted/padded to fit as key for the cipher.
In mathematical notation Miyaguchi–Preneel can be described as:
The scheme has the rate:
The roles of mi and Hi-1 may be switched, so that Hi-1 is encrypted under the key mi. Thus making this method an extension of Davies–Meyer instead.
A second preimage attack = hash) can be done according to Kelsey and Schneier for a 2k-message-block message in time k × 2n/2+1+2n-k+1. Note that the complexity is above 2n/2 but below 2n when messages are long and that when messages get shorter the complexity of the attack approaches 2n.
Hirose
The Hirose double-block-length one-way compression function consists of a block cipher plus a permutation p. It was proposed by Shoichi Hirose in 2006 and is based on a work by Mridul Nandi.It uses a block cipher whose key length k is larger than the block length n, and produces a hash of size 2n. For example, any of the AES candidates with a 192- or 256-bit key.
Each round accepts a portion of the message mi that is k−n bits long, and uses it to update two n-bit state values G and H.
First, mi is concatenated with Hi−1 to produce a key Ki. Then the two feedback values are updated according to:
- Gi = EKi ⊕ Gi−1
- Hi = EKi ⊕ p.
- p = x ⊕ c
Each encryption resembles the standard Davies–Meyer construction. The advantage of this scheme over other proposed double-block-length schemes is that both encryptions use the same key, and thus key scheduling effort may be shared.
The final output is Ht||Gt. The scheme has the rate RHirose = /2n relative to encrypting the message with the cipher.
Hirose also provides a proof in the Ideal Cipher Model.