get by simple manipulation of this cipher was eliminating a subkey by taking the difference of the

ciphertexts, arriving at equation 4. Now notice that if the cipher was linear, taking the “difference” of

a pair of ciphertexts would have cancelled out both subkeys, leaving the equation:

C •C = P •P .

This simply tells us that the difference between the ciphertexts is the same as the difference between

their plaintexts; and as this holds for all messages, taking the difference does not lead to any informa-

tion about the key.

For a nonlinear cipher, the difference between ciphertexts is not the same as the difference be-

tween their plaintexts for all possible plaintexts. Furthermore, for a speci¬c difference in the plain-

texts, the difference in the corresponding ciphertexts may depend on the key and hence reveal in-

formation about the key. This idea is the principle behind differential cryptanalysis, a general block

cipher analysis method developed by Eli Biham and Adi Shamir [2]. A successful application of differ-

ential cryptanalysis leads to a chosen-plaintext attack.

We denote the difference between a pair of messages (M, M ) by ∆M = M —(M )’1 , where — is an

appropriate operation for de¬ning “difference”. The exclusive-or operation, by its nature, is the most

versatile choice for de¬ning difference. However, it may be preferable or necessary to use another

group operation for this purpose; for example, when applying differential cryptanalysis to an iterated

19

cipher, it is preferable to use the difference operation which will cause the cipher to be a “Markov

cipher” [8, 9].

The example block cipher of equation 3 is not an iterated block cipher, but it is suitable for demon-

strating the basic ideas of differential cryptanalysis. We assume the message block length m = 32 and

the key length k = 64 (i.e., half the block lengths of the original description).

Letting difference be de¬ned by subtraction modulo 232 , the following equation gives the appro-

priate ciphertext difference:

∆C =C C

= (P • K0 ) (P • K0 ).

Hence, an attacker who knows the values P , P and ∆C may be able to recover K0 if they can ef¬-

ciently solve equations of the form

C = (A • X) (B • X),

where A, B, C are known constants. So let us examine this type of equation in more detail, to deter-

mine precisely how much information we may be able to recover. First we express the equation in a

simpler form:

= (A • X) (B • X)

C

= (A • X) (A • X • D)

(Y • D),

=Y

where D = A • B and Y = A • X. The goal is to recover Y . The subtraction modulo 232 can be

replaced by addition modulo 232 with a little more manipulation:

(Y • D)

C =Y

’(Y • D)

=Y

(Y • D • FFFFFFFFH )

=Y 1,

(Y • F ),

E =Y

where F = D • FFFFFFFFH (the bitwise complement of D) and E = C 1. Once again, Y is unknown

(and its value would lead to the subkey), and E and F are known. In a chosen-plaintext attack, the

attacker would be able to ¬x values of F by careful selection of plaintexts A and B. If F = 0 then

we would have E = Y Y = Y < 1, effectively revealing 31 of the 32 bits of Y . To set F = 0, the

<

exclusive-or difference between plaintexts must be (FFFFFFFFH ); it does not matter what the actual

plaintext values are.

Consider a chosen-plaintext attack on this block cipher. The attacker chooses a random plaintext

P , then calculates a second plaintext P = P • ∆P (so that the exclusive-or difference between plain-

texts is ∆P ). Both chosen plaintexts are encrypted under the unknown key. The least-signi¬cant 31

bits of subkey K0 are then recovered from the values of P and ∆C = C C by

1) > 1) • P.

K0 = ((∆C >

The most-signi¬cant bit of the recovered K0 will be ˜0™. The subkey K1 is then evaluated, using either

one of the two chosen plaintexts, by K1 = C (P • K0 ). Hence, the 64-bit key (K0 , K1 ) is easily

recovered with one known plaintext and one adaptive chosen plaintext (or two chosen plaintexts)5 .

5 Ifthe most-signi¬cant bit of the original subkey K0 was equal to ˜1™ (and hence incorrect in the recovered K0 ) then the

recovered K1 will also be incorrect in the most-signi¬cant bit. The result is that the 64-bit key (K0 , K1 ) recovered by this

method may not be equal to the original key, but it will certainly be an equivalent key.

20

In this example, the operation for de¬ning difference was different for the plaintext (exclusive-or)

and ciphertext (subtraction modulo 232 ). The approach to analysing iterated ciphers requires that the

difference operation be the same for input and output. In the remainder of this report, we will de¬ne

the difference operation by exclusive-or.

Now we will attempt differential cryptanalysis on the block cipher de¬ned by

K0 ) • K1 .

C = (P

This cipher differs from the block cipher of equation 3 in that the addition and exclusive-or opera-

tions have swapped places. The difference operation is de¬ned by exclusive-or, and so we have the

following equations for the plaintext and ciphertext differences:

= P •P ,

∆P

= C •C

∆C

K0 ) • (P

= (P K0 ).

The subkey K1 is eliminated from the ciphertext difference ∆C. We need to choose values of P and

P , with a speci¬c difference ∆P , that will reveal some information about K0 by the difference ∆C.

To start, note that if ∆P = 0 then ∆C = 0; we have no useful information. Also note that this

can be generalised to the case that a sequence of ˜0™ least-signi¬cant bits in the input difference will

cause the same least-signi¬cant bits to be ˜0™ in the output difference, e.g., if ∆P = xxxx0000H then

∆C = xxxx0000H . This is because the nonlinear carry only propagates upwards and, once again, we

will exploit this fact.

The Hamming weight of an integer is the number of ones in its binary representation. We now

look at what happens if the least-signi¬cant bit of the input difference is non-zero. Note that since

lower bits of input difference zero do not affect higher output difference bits, the analysis holds for

any input difference where precisely one bit is set (i.e., any input difference of Hamming weight 1). If

the input difference ∆P = 1 then we would have

K0 ) • ((P • 1)

∆C = (P K0 ).

From this we see that the least-signi¬cant bit of K0 is added to each of ˜0™ and ˜1™ and then the

exclusive-or difference is taken; the value of the least-signi¬cant bit of P has no effect on this cal-

culation. The LSB of ∆C will have value ˜1™ since it is linear. If the LSB of K0 = 0 then no carry will

be generated by either of the additions and so none of the higher bits of the output difference will be

affected (they will have value zero). In this case, ∆P = ∆C. If the LSB of K1 = 1 then the calculation

of (1 0) • (1 1) will generate a carry, which will affect higher bits. We expect that, in this case,

∆P = ∆C. So we can identify the value of the least-signi¬cant bit of K0 by using an input difference

of 1 and checking if the output difference is equal to the input difference. If the output difference is

the same as the input difference then assume that the LSB of K0 is ˜0™ otherwise assume it is ˜1™.

We can recover an arbitrary bit of K0 by the same technique; use an input difference of Hamming

weight 1 (set the bit in the i-th position) and check the Hamming weight of the output difference;

if it has Hamming weight 1 then assume that the i-th bit of K0 is ˜0™ otherwise assume it is ˜1™. Pro-

ceeding like this, we can recover the (m ’ 1) least-signi¬cant bits of K0 . We cannot recover the value

of the most-signi¬cant bit of K0 in this analysis since we lose the carry anyway. However, this loss

is compensated by the fact that we automatically recover this bit when calculating the value of K1

from a known plaintext (we will either recover the original key or an equivalent key). Hence, for an

m-bit message block size (and 2m-bit key), this simple block cipher cipher is broken by differential

cryptanalysis with one known plaintext and m chosen plaintexts.

21

It is possible (but much more complicated) to extend this attack to the block cipher of equation 3

de¬ning input difference and output difference by exclusive-or. We would have output difference

= ((P • K0 ) K1 ) • ((P • K0 )

∆C K1 ).

Imagine that we supply an input difference of Hamming weight 1, with the i-th bit set. The compli-

cation arises because we cannot guarantee that the input bits to the addition that are lower than the

i-th bit have value zero, because some of them may have been inverted by the exclusive-or with K0 . In

particular, if the (i’1)-th bit of K0 is ˜1™ and the (i’1)-th bit of K1 is also ˜1™ then the subsequent carry

that would be generated by their addition would disturb the higher bits, and hence the analysis. This

problem can be dealt with by some elaborate analysis, but there is little point in doing so since the

result cannot be generalised (which would be the principal reason for using exclusive-or for de¬ning

input and output difference in this case).

There are two stages to applying differential cryptanalysis to an iterated cipher (cf. linear crypt-

analysis): (1) ¬nding suitable differentials of the cipher, and (2) applying the chosen-plaintext attack

algorithm.

We will now apply differential cryptanalysis to a 6-round block cipher with message block length

m = 8 and key length k = 48. Let the nonlinear and invertible function F : F8 ’ F8 be de¬ned by

2 2

= ((x • 5CH )

F (x) 89H ) > 6.

>>

We know that for a linear function the difference between any pair of inputs will be equal to the differ-

ence between their corresponding outputs. As the function F is nonlinear, the output difference will

not equal the input difference for all possible inputs. In fact, for many speci¬c input differences there

are a number of possible output differences that could occur. For example, let the difference of inputs

X and X be denoted by ∆X = X • X and the difference of their corresponding outputs Y and Y be

∆Y = F (X) • F (X ). We ¬nd that there are 7 possible output differences that may occur if the input

difference to F is 01H , as follows (the differences are given in binary for ease of interpretation):

• There are 64 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 00001100.

• There are 32 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 00011100.

• There are 16 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 00111100.

• There are 8 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 01111100.

• There are 4 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 11111100.

• There are 2 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 11111101.

• There are 2 pairs (X, X ) such that ∆X = 00000001 and ∆Y = 11111111.

A pair of input and output differences to a function form a characteristic of the function. Thus, the

function F has a characteristic (01H , 1CH ) that holds with probability 32/128 = 0.250. There are two

characteristics of F that hold with probability 1.000:

• There are 128 pairs (X, X ) such that ∆X = 00000000 and ∆Y = 00000000.

• There are 128 pairs (X, X ) such that ∆X = 10000000 and ∆Y = 00000010.

2n’1 over Z2n

Of these, only the latter characteristic is useful and it is due to the equivalence of X

and X • 2n’1 over Fn .

2

22

The block cipher iterates the round function de¬ned by

= F (Ti’1 • Ki ), (for 1 ¤ i ¤ 6),

Ti

where T0 is the plaintext and T6 is the ciphertext (this is not a Feistel cipher). The subkeys (K1 , . . . , K6 )

are independent, and since each subkey is of length 8 bits, the key length k is 48 bits.

When applying differential cryptanalysis to iterated ciphers, the attacker analyses the propagation

of differences between plaintexts through the block cipher and attempts to ¬nd a series of differences

between rounds which hold with high probability. This is done by compounding one-round charac-

teristics so that the output difference to one round is the input difference to the next round. The chain

of differences for i rounds of a cipher forms an i-round characteristic. For an n-round block cipher, the

attacker hopes to ¬nd an n-round characteristic that holds with high probability. The characteristic

that has the optimal probability is called the best characteristic.

Note that for inputs (X, X ) and corresponding outputs (Y, Y ) the one-round input difference is

∆X = X • X and the output difference is ∆Y = Y • Y . Also,

= F (X • K) • F (X • K)

∆Y

= F (X • K) • F (X • K • ∆X)

= F (Z) • F (Z • ∆X),

where Z = X • K is unknown. This shows that we are not interested in the actual value of the inputs;

only in their difference. It also puts the earlier analysis of the function F into context of the cipher,

since the subkey is “absorbed” into the input of the function F and we need not concern ourselves

with its value for the analysis. We are implicitly assuming that the value of the plaintext and subkeys

will not signi¬cantly affect the propagation of differences through the cipher (this is the hypothesis of

stochastic equivalence [8, 9]).

We know that the best one-round characteristic is (80H , 02H ), which holds with probability 1.000.

We shall use this as a starting point to explore the possible differences that may be reached after 6

rounds. We will choose to follow the characteristic that has the highest immediate probability. This

may not necessarily lead to the best 6-round characteristic; we would have to follow each possible

path of difference (including the low-probability differences) or else use some heuristic technique to

¬nd the best characteristic.

(1) “ There are 128 pairs (X, X ) such that ∆X = 10000000 and ∆Y = 00000010.

(2) “ There are 64 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 00001000.

“ There are 32 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 00011000.

“ There are 16 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 00111000.

“ There are 8 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 01111000.

“ There are 4 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 11111000.

“ There are 2 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 11111001.

“ There are 2 pairs (X, X ) such that ∆X = 00000010 and ∆Y = 11111011.

(3) “ There are 16 pairs (X, X ) such that ∆X = 00001000 and ∆Y = 00100000.

“ There are 56 pairs (X, X ) such that ∆X = 00001000 and ∆Y = 01100000.

“ There are 28 pairs (X, X ) such that ∆X = 00001000 and ∆Y = 11100000.

“ There are 14 pairs (X, X ) such that ∆X = 00001000 and ∆Y = 11100001.

23

“ There are 14 pairs (X, X ) such that ∆X = 00001000 and ∆Y = 11100011.

(4) “ There are 18 pairs (X, X ) such that ∆X = 01100000 and ∆Y = 10000000.

“ There are 92 pairs (X, X ) such that ∆X = 01100000 and ∆Y = 10000001.

“ There are 18 pairs (X, X ) such that ∆X = 01100000 and ∆Y = 10000010.

(5) “ There are 64 pairs (X, X ) such that ∆X = 10000001 and ∆Y = 00001110.

“ There are 32 pairs (X, X ) such that ∆X = 10000001 and ∆Y = 00011110.

“ There are 16 pairs (X, X ) such that ∆X = 10000001 and ∆Y = 00111110.