sage bit is encrypted independently (there is no diffusion). Applying a bitwise permutation after the

cipher function would repair the lack of diffusion, but would not defend against linear cryptanalysis

in this case.

Step 1. After one round of encryption, the unknown message block is (L1 , R1 ), where L1 is known

plaintext and R1 = L0 • R0 Y1 • R0 Y1 Z1 • Z1 . In particular, R1 is a four-input nonlinear function of

the form A • B • BCD • CD. In fact, this nonlinear function is 0-1 balanced. The Walsh transform

reveals eight linear approximations:

Prob=10/16 : A • 1,

Prob=10/16 : A • D,

Prob=10/16 : A • C,

Prob=10/16 : A • C • D • 1,

Prob=14/16 : A • B,

Prob=10/16 : A • B • D,

Prob=10/16 : A • B • C,

Prob=10/16 : A • B • C • D • 1.

14

Thus, the best linear approximation is simply

A • B • BCD • CD ≈ A • B,

which holds with probability 14/16 = 0.875.

Step 2. By matching the terms of the round function (equation 18) to the expression A • B •

BCD • CD and substituting in the suggested linear approximation A • B, we obtain the round linear

approximation for the -th output bit,

Li [ ] = Ri’1 [ ],

Ri [ ] = Li’1 [ ] • Zi [ ],

This one-round linear equation matches the output of the actual nonlinear round function with a

probability of 0.875 (i.e., the same as the linear approximation).

It is important to note that the probability of this linear approximation applies only to one bit,

not the entire block. For this reason the position of the bits in a linear approximation are usually also

recorded in the equation. This is necessary if different nonlinear functions are computed on sublocks,

as in the DES S-boxes, or if bitwise permutations are used in the cipher. We will omit the notation,

since the following analysis only applies to bits in the same position within the block.

Step 3. If we had more than one linear equation for the round, this step would involve adding one-

round linear equations so that the output message bits of one equation become the input message

bits of some later round, so that they will be cancelled out eventually. The resulting equation would

represent the Feistel cipher (holding probabilistically, of course) and express a linear relation between

some plaintext, key and ciphertext bits. As we only have one linear equation this step simply involves

expanding the Feistel cipher in terms of the one-round equation. Once again, the plaintext is P =

(L0 , R0 ) and the ciphertext is C = (R4 , L4 ). By compounding one round linear approximations we

obtain:

L0 • R0 • L4 • R4 = Z1 • Z2 • Z3 • Z4 . (19)

The probability that this equation holds for a random known plaintext depends on the probabilities

of each linear approximation from which it is composed. The cipher linear approximation will hold

if and only if either all of the one-round linear approximations hold, or if none of them hold. In this

case the probability of equation 19 is calculated by

(0.875)(0.875) + (1 ’ 0.875)(1 ’ 0.875) = 0.781

’ (0.781)(0.875) + (1 ’ 0.781)(1 ’ 0.875) = 0.711

’ (0.711)(0.875) + (1 ’ 0.711)(1 ’ 0.875) = 0.658.

The method of obtaining the probability of the cipher linear approximation from the probabilities

of the one-round linear approximations in this manner is due to the piling-up lemma described by

Matsui [10]. Equivalently, the probability of the cipher linear approximation may be calculated by

n

1 1

pi ’

+ 2n’1 ,

2 2

i=1

which evaluates the probability that the exclusive-or of n independent random variables will equal

zero. The piling-up lemma cannot be used blindly; it fails to give an accurate probability estimate on

some block ciphers, particularly if keyed permutations or data-dependent permutations are used.

Step 4. The success of an attack depends on the probability of the cipher linear approximation

and also on the number of known plaintexts. Matsui estimates the success rate of the algorithm and

15

the number of plaintexts required to acheive a particular degree of success. The success rate is based

on a maximum likelihood method and is derived by approximating binary distribution with normal

distribution [10]. For a 97.7% success rate, the equation N = |p’ 1 |’2 is suggested. Thus, for a success

2

’2

rate of 97.7%, we would require |0.658 ’ 0.500| = 40 known plaintexts. If the number of rounds is

increased from four to eight then the probability for the cipher linear approximation is 0.550 and

around 400 known plaintexts are required for a 97.7% success rate.

To perform the known-plaintext attack on this 4-round Feistel cipher, obtain N = 40 known plain-

texts (encrypted under an unknown key). Let X = 0 then, for each known plaintext, evaluate the

one-bit value of the left-hand side (LHS) of the cipher linear approximation (equation 19) and incre-

ment the counter X by one if the LHS has value zero. If X > N/2 then guess that the value of the

right-hand side of equation 19 is zero, otherwise guess that it is one. This gives us the one-bit solution

to Z1 • Z2 • Z3 • Z4 , which would be used to ¬x the value of a key bit in an exhastive key-search (thus

halving the complexity of exhaustive key-search).

This is the basic known-plaintext attack; there are improvements over this, which attempt recover

a larger portion of the key at a time and also require less known plaintext [11].

The nonlinear component in STEA (and TEA) is the integer addition operation ( ). The non-

linearity in integer addition is due to the carry function, as shown in equation 14. The function

S[i] = (A B)[i] (i.e., the algebraic normal form of S[i] where S = A B) becomes more com-

plex as i increases, because of the fact that carry propagates upwards. The following table shows the

operation of the carry function:

A[i] B[i] C[i] C[i + 1]

0 0 0 0

0 0 1 0

0 1 0 0

0 1 1 1

1 0 0 0

1 0 1 1

1 1 0 1

1 1 1 1

The Walsh transform reveals four linear approximations to the carry function, each holding with

probability 6/8 = 0.750:

Prob=6/8 : C[i],

Prob=6/8 : B[i],

Prob=6/8 : A[i],

Prob=6/8 : A[i] • B[i] • C[i] • 1.

This gives us two useful linear approximations for one carry operation, C[i] = A[i ’ 1] and C[i] =

B[i ’ 1], both holding with probability 0.750. The corresponding one-round linear approximations

are

Ri [ ] = Li’1 [ ’ 1], (20)

Ri [ ] = Ri’1 [ ’ 1] • Ki [ ’ 1], (21)

To ¬nd the linear approximation for the 64-round cipher, these equations must be extended to a

suitable linear path from the plaintext to the ciphertext. The optimal construction for a linear path in

16

STEA would be to use equation 20 for all the odd rounds and equation 21 for all the even rounds. The

result is the STEA linear approximation

L64 [32] • R64 [32] = K1 [31],

which is invalid since bit 32 does not exist for 32-bit blocks. Linear cryptanalysis is ineffective on the

STEA cipher, because there is no linear approximation which spans 64 rounds. In general, the best

linear approximation in n-round STEA is given by

Ln [n/2] • Rn [n/2] = K1 [(n/2) ’ 1].

Hence, for 16-round STEA, we have the cipher linear approximation

L16 [8] • R16 [8] = K1 [7].

which holds with probability 1 + 2’17 . To recover one key bit (bit 7 of K1 ) of 16-round STEA using

2

linear cryptanalysis would require around 234 known plaintexts.

9 Weak Key Schedule And Weak Cipher Function

In this section, we show the importance of having a key schedule that generates distinct round sub-

keys. First we will follow on from the analysis in section 5 and describe a curious property of the STEA

cipher.

If the least-signi¬cant bits of the subkeys and plaintext halves are all equal, then these bits will

“¬lter through” to the ciphertext. For example,

P = (xxxxxyyy xxyyyyyyH ),

K = (xxxxyyyy xxxxxxyyH ),

C = (xxxxxyyy xxxxxxyyH ),

where ˜x™ is any digit, and ˜y™ is either the bit-sequence of all ones (FH ) or zero (0H ). This should be

obvious in the case where y=0, since addition and exclusive-or also results in 0, and no carry.

Also, if K1 equals the right-half of the plaintext and K2 equals the left-half of the plaintext, then

the plaintext is not encrypted. So if an attacker has a large number of ciphertext blocks, and one block

looks suspiciously like plaintext, then it could well be the key.

This problem is caused by the output of the cipher function (equation 10) being zero for all rounds.

The output of the cipher function is zero if and only if the input message block is equal to the subkey.

Since the message block cannot be encrypted if the cipher function output is zero it is crucial that the

next subkey that affects the message block be different to ensure that the cipher function output is

non-zero (and cause the message block to be partially-encrypted, as required).

In an IBM research report, Edna Grossman and Bryant Tuckerman described a number of attacks

on a weak Feistel cipher called NDS (New Data Seal), which appears to be a simpli¬ed version of the

Lucifer block cipher [1, 4]. The NDS cipher had a weak key schedule and a weak cipher function.

The key schedule was practically non-existant; all the round subkeys were the same, which effectively

meant that in each round the entire key is input in the same form (i.e., unmodi¬ed between rounds).

The cipher function was weak for two reasons: it was usually possible to derive key bits from knowl-

edge of an input block and (encrypted) output block of the cipher function; furthermore, it would be

possible to predict the output of the cipher function for certain inputs, regardless of the key. Conse-

quently, NDS appeared highly susceptible to an attack which, in its most advanced form, could break

17

NDS with 556 chosen plaintexts. However, the principle of their attack is fairly general and would be

applicable to any Feistel cipher that does not use distinct round subkeys and whose cipher function

may allow recovery of key bits from known input and output blocks. An interesting point about this

attack is that its effectiveness does not depend on the number of rounds.

The basic principle of their attack hinges on the fact that the encryption process of an n-round

Feistel cipher with no distinct round subkeys can be simpli¬ed to E(P, K) = π(ρn (P, K)). In the

following explanation, we will omit the output transformation (π) for simplicity, although the analysis

is still applicable to Feistel ciphers with the ¬nal swap. It is fairly easy to extend the proofs to take

account of the ¬nal swap.

We obtain a known plaintext (P, C), where C = ρn (P, K), and predict the result of a one-round

encryption of P ; we denote this by P so that P = ρ(P, K). As we are dealing with a Feistel cipher,

we know that P = (L0 , R0 ) and P = (L1 , R1 ), where only the right half of P is unknown and has to

be predicted. Predicting the result of a one-round encryption of NDS was easy due to its weak cipher

function. In fact, the NDS cipher function is considerably weaker than the STEA cipher function (it

is impossible to predict the output of the STEA cipher function if the round subkey is unknown). If

it is not possible to predict the output of the cipher function and the message block length is small

enough (e.g., m = 64) then we could try each possible value of R1 until we ¬nd the right value; the

attack would require at most 2m/2 chosen-plaintext encryptions.

We obtain the chosen-plaintext encryption of P to get C = ρn (P , K). Notice that C is the (n+1)-

round encryption of P , i.e., C = ρ(C, K):

= ρn (P , K)

C

= ρn (ρ(P, K), K)

= ρn+1 (P, K)

= ρ(ρn (P, K), K)

= ρ(C, K).

In particular, for a correct guess of P = ρ(P, K), the right half of C should equal the left half of C

since C = (Ln , Rn ) and C = (Ln+1 , Rn+1 ). Once we have found the right value of P = ρ(P, K) then

we can attempt to recover the key K from the equations:

L0 • R1 = F (R0 , K),

Ln • Rn+1 = F (Rn , K).

Now the problem is to recover the key given the input and output of the cipher function. As we have

two such equations, we can use one equation to determine the key and then verify the key that we

recovered by testing it in the second equation. If the cipher function is weak then recovering the key

given the cipher function input and output should be relatively simple (as in the case of NDS or STEA).

A variant of this attack can be applied to STEA with a higher computational cost. STEA does not

use the same subkey in every round, but it does use the same subkey in alternate rounds. Let K1 and

K2 be the 32-bit halves of the 64-bit key K. We can express the repetitive nature of STEA by letting R

denote two rounds:

R(M, K) = ρ(ρ(M, K1 ), K2 ).

Hence 64-round STEA can be represented as 32 successive applications of the R function. In particu-

lar, the key does not change between applications of R so that E(P, K) = R32 (P, K). We now have to

be able to predict the output of R for a chosen plaintext. The STEA cipher function does not allow us

to do this; we must know the value of the subkey to predict the output of one round. The only way to

18

deal with this situation is to obtain enough well-distributed known plaintexts that we have a reason-

able chance of ¬nding a pair of plaintexts (P, P ) such that P = R(P, K). By the birthday paradox,

this would require around 232 known plaintexts. Since we cannot identify a correct pair (P, P ), we

must attempt the following analysis on every possible pair until we recover the key. Therefore, the

worst-case time complexity of this attack would be O(264 ). This is comparable to the complexity of

an exhaustive key-search.

For each pair of known plaintexts (P, P ) we solve the following equations

= R0 • (L2

K1 L0 ),

= L2 • (R2

K2 R0 ),

= R64 • (L66

K1 L64 ),

= L66 • (R66

K2 R64 ),

where P = (L0 , R0 ), P = (L2 , R2 ), C = (L64 , R64 ) and C = (L66 , R66 ). If the two values of K1 match

and the two values of K2 match then we have found the correct key. Once again, this attack will work

regardless of the number of rounds and may still be effective if the cipher function was slightly more

complex (e.g., if the output of the cipher function was subjected to a bitwise rotation, which would

render the divide-and-conquer attack ineffective).

A similar attack can be attempted on a version of TEA with ¬xed subkey constants δi (e.g., δi = δ,

for all i); this version of TEA would have two distinct subkeys (like STEA). The attack on this TEA

variant was described by Roger Fleming [3]. It has a complexity of O(298 ) and requires around 232

known plaintexts. Here the complexity of attack is better than that of exhaustive key-search (which

was not the case with STEA). An improvement of the attack, which reduces the complexity to O(264 )

at the cost of requiring an additional 232 chosen plaintexts, was outlined by David Wagner.

10 Differential Cryptanalysis

In section 2 it was shown that the dif¬culty of solving nonlinear equations can be useful for designing