. 4
( 6 .)


divide-and-conquer approach since each ciphertext bit depends on one plaintext bit and each mes-
sage bit is encrypted independently (there is no diffusion). Applying a bitwise permutation after the
cipher function would repair the lack of diffusion, but would not defend against linear cryptanalysis
in this case.
Step 1. After one round of encryption, the unknown message block is (L1 , R1 ), where L1 is known
plaintext and R1 = L0 • R0 Y1 • R0 Y1 Z1 • Z1 . In particular, R1 is a four-input nonlinear function of
the form A • B • BCD • CD. In fact, this nonlinear function is 0-1 balanced. The Walsh transform
reveals eight linear approximations:

Prob=10/16 : A • 1,
Prob=10/16 : A • D,
Prob=10/16 : A • C,
Prob=10/16 : A • C • D • 1,
Prob=14/16 : A • B,
Prob=10/16 : A • B • D,
Prob=10/16 : A • B • C,
Prob=10/16 : A • B • C • D • 1.

Thus, the best linear approximation is simply

A • B • BCD • CD ≈ A • B,

which holds with probability 14/16 = 0.875.
Step 2. By matching the terms of the round function (equation 18) to the expression A • B •
BCD • CD and substituting in the suggested linear approximation A • B, we obtain the round linear
approximation for the -th output bit,

Li [ ] = Ri’1 [ ],
Ri [ ] = Li’1 [ ] • Zi [ ],

This one-round linear equation matches the output of the actual nonlinear round function with a
probability of 0.875 (i.e., the same as the linear approximation).
It is important to note that the probability of this linear approximation applies only to one bit,
not the entire block. For this reason the position of the bits in a linear approximation are usually also
recorded in the equation. This is necessary if different nonlinear functions are computed on sublocks,
as in the DES S-boxes, or if bitwise permutations are used in the cipher. We will omit the notation,
since the following analysis only applies to bits in the same position within the block.
Step 3. If we had more than one linear equation for the round, this step would involve adding one-
round linear equations so that the output message bits of one equation become the input message
bits of some later round, so that they will be cancelled out eventually. The resulting equation would
represent the Feistel cipher (holding probabilistically, of course) and express a linear relation between
some plaintext, key and ciphertext bits. As we only have one linear equation this step simply involves
expanding the Feistel cipher in terms of the one-round equation. Once again, the plaintext is P =
(L0 , R0 ) and the ciphertext is C = (R4 , L4 ). By compounding one round linear approximations we

L0 • R0 • L4 • R4 = Z1 • Z2 • Z3 • Z4 . (19)

The probability that this equation holds for a random known plaintext depends on the probabilities
of each linear approximation from which it is composed. The cipher linear approximation will hold
if and only if either all of the one-round linear approximations hold, or if none of them hold. In this
case the probability of equation 19 is calculated by

(0.875)(0.875) + (1 ’ 0.875)(1 ’ 0.875) = 0.781
’ (0.781)(0.875) + (1 ’ 0.781)(1 ’ 0.875) = 0.711
’ (0.711)(0.875) + (1 ’ 0.711)(1 ’ 0.875) = 0.658.

The method of obtaining the probability of the cipher linear approximation from the probabilities
of the one-round linear approximations in this manner is due to the piling-up lemma described by
Matsui [10]. Equivalently, the probability of the cipher linear approximation may be calculated by
1 1
pi ’
+ 2n’1 ,
2 2

which evaluates the probability that the exclusive-or of n independent random variables will equal
zero. The piling-up lemma cannot be used blindly; it fails to give an accurate probability estimate on
some block ciphers, particularly if keyed permutations or data-dependent permutations are used.
Step 4. The success of an attack depends on the probability of the cipher linear approximation
and also on the number of known plaintexts. Matsui estimates the success rate of the algorithm and

the number of plaintexts required to acheive a particular degree of success. The success rate is based
on a maximum likelihood method and is derived by approximating binary distribution with normal
distribution [10]. For a 97.7% success rate, the equation N = |p’ 1 |’2 is suggested. Thus, for a success
rate of 97.7%, we would require |0.658 ’ 0.500| = 40 known plaintexts. If the number of rounds is
increased from four to eight then the probability for the cipher linear approximation is 0.550 and
around 400 known plaintexts are required for a 97.7% success rate.
To perform the known-plaintext attack on this 4-round Feistel cipher, obtain N = 40 known plain-
texts (encrypted under an unknown key). Let X = 0 then, for each known plaintext, evaluate the
one-bit value of the left-hand side (LHS) of the cipher linear approximation (equation 19) and incre-
ment the counter X by one if the LHS has value zero. If X > N/2 then guess that the value of the
right-hand side of equation 19 is zero, otherwise guess that it is one. This gives us the one-bit solution
to Z1 • Z2 • Z3 • Z4 , which would be used to ¬x the value of a key bit in an exhastive key-search (thus
halving the complexity of exhaustive key-search).
This is the basic known-plaintext attack; there are improvements over this, which attempt recover
a larger portion of the key at a time and also require less known plaintext [11].
The nonlinear component in STEA (and TEA) is the integer addition operation ( ). The non-
linearity in integer addition is due to the carry function, as shown in equation 14. The function
S[i] = (A B)[i] (i.e., the algebraic normal form of S[i] where S = A B) becomes more com-
plex as i increases, because of the fact that carry propagates upwards. The following table shows the
operation of the carry function:

A[i] B[i] C[i] C[i + 1]
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1

The Walsh transform reveals four linear approximations to the carry function, each holding with
probability 6/8 = 0.750:

Prob=6/8 : C[i],
Prob=6/8 : B[i],
Prob=6/8 : A[i],
Prob=6/8 : A[i] • B[i] • C[i] • 1.

This gives us two useful linear approximations for one carry operation, C[i] = A[i ’ 1] and C[i] =
B[i ’ 1], both holding with probability 0.750. The corresponding one-round linear approximations

Ri [ ] = Li’1 [ ’ 1], (20)
Ri [ ] = Ri’1 [ ’ 1] • Ki [ ’ 1], (21)

To ¬nd the linear approximation for the 64-round cipher, these equations must be extended to a
suitable linear path from the plaintext to the ciphertext. The optimal construction for a linear path in

STEA would be to use equation 20 for all the odd rounds and equation 21 for all the even rounds. The
result is the STEA linear approximation

L64 [32] • R64 [32] = K1 [31],

which is invalid since bit 32 does not exist for 32-bit blocks. Linear cryptanalysis is ineffective on the
STEA cipher, because there is no linear approximation which spans 64 rounds. In general, the best
linear approximation in n-round STEA is given by

Ln [n/2] • Rn [n/2] = K1 [(n/2) ’ 1].

Hence, for 16-round STEA, we have the cipher linear approximation

L16 [8] • R16 [8] = K1 [7].

which holds with probability 1 + 2’17 . To recover one key bit (bit 7 of K1 ) of 16-round STEA using
linear cryptanalysis would require around 234 known plaintexts.

9 Weak Key Schedule And Weak Cipher Function
In this section, we show the importance of having a key schedule that generates distinct round sub-
keys. First we will follow on from the analysis in section 5 and describe a curious property of the STEA
If the least-signi¬cant bits of the subkeys and plaintext halves are all equal, then these bits will
“¬lter through” to the ciphertext. For example,

P = (xxxxxyyy xxyyyyyyH ),
K = (xxxxyyyy xxxxxxyyH ),
C = (xxxxxyyy xxxxxxyyH ),

where ˜x™ is any digit, and ˜y™ is either the bit-sequence of all ones (FH ) or zero (0H ). This should be
obvious in the case where y=0, since addition and exclusive-or also results in 0, and no carry.
Also, if K1 equals the right-half of the plaintext and K2 equals the left-half of the plaintext, then
the plaintext is not encrypted. So if an attacker has a large number of ciphertext blocks, and one block
looks suspiciously like plaintext, then it could well be the key.
This problem is caused by the output of the cipher function (equation 10) being zero for all rounds.
The output of the cipher function is zero if and only if the input message block is equal to the subkey.
Since the message block cannot be encrypted if the cipher function output is zero it is crucial that the
next subkey that affects the message block be different to ensure that the cipher function output is
non-zero (and cause the message block to be partially-encrypted, as required).
In an IBM research report, Edna Grossman and Bryant Tuckerman described a number of attacks
on a weak Feistel cipher called NDS (New Data Seal), which appears to be a simpli¬ed version of the
Lucifer block cipher [1, 4]. The NDS cipher had a weak key schedule and a weak cipher function.
The key schedule was practically non-existant; all the round subkeys were the same, which effectively
meant that in each round the entire key is input in the same form (i.e., unmodi¬ed between rounds).
The cipher function was weak for two reasons: it was usually possible to derive key bits from knowl-
edge of an input block and (encrypted) output block of the cipher function; furthermore, it would be
possible to predict the output of the cipher function for certain inputs, regardless of the key. Conse-
quently, NDS appeared highly susceptible to an attack which, in its most advanced form, could break

NDS with 556 chosen plaintexts. However, the principle of their attack is fairly general and would be
applicable to any Feistel cipher that does not use distinct round subkeys and whose cipher function
may allow recovery of key bits from known input and output blocks. An interesting point about this
attack is that its effectiveness does not depend on the number of rounds.
The basic principle of their attack hinges on the fact that the encryption process of an n-round
Feistel cipher with no distinct round subkeys can be simpli¬ed to E(P, K) = π(ρn (P, K)). In the
following explanation, we will omit the output transformation (π) for simplicity, although the analysis
is still applicable to Feistel ciphers with the ¬nal swap. It is fairly easy to extend the proofs to take
account of the ¬nal swap.
We obtain a known plaintext (P, C), where C = ρn (P, K), and predict the result of a one-round
encryption of P ; we denote this by P so that P = ρ(P, K). As we are dealing with a Feistel cipher,
we know that P = (L0 , R0 ) and P = (L1 , R1 ), where only the right half of P is unknown and has to
be predicted. Predicting the result of a one-round encryption of NDS was easy due to its weak cipher
function. In fact, the NDS cipher function is considerably weaker than the STEA cipher function (it
is impossible to predict the output of the STEA cipher function if the round subkey is unknown). If
it is not possible to predict the output of the cipher function and the message block length is small
enough (e.g., m = 64) then we could try each possible value of R1 until we ¬nd the right value; the
attack would require at most 2m/2 chosen-plaintext encryptions.
We obtain the chosen-plaintext encryption of P to get C = ρn (P , K). Notice that C is the (n+1)-
round encryption of P , i.e., C = ρ(C, K):

= ρn (P , K)
= ρn (ρ(P, K), K)
= ρn+1 (P, K)
= ρ(ρn (P, K), K)
= ρ(C, K).

In particular, for a correct guess of P = ρ(P, K), the right half of C should equal the left half of C
since C = (Ln , Rn ) and C = (Ln+1 , Rn+1 ). Once we have found the right value of P = ρ(P, K) then
we can attempt to recover the key K from the equations:

L0 • R1 = F (R0 , K),
Ln • Rn+1 = F (Rn , K).

Now the problem is to recover the key given the input and output of the cipher function. As we have
two such equations, we can use one equation to determine the key and then verify the key that we
recovered by testing it in the second equation. If the cipher function is weak then recovering the key
given the cipher function input and output should be relatively simple (as in the case of NDS or STEA).
A variant of this attack can be applied to STEA with a higher computational cost. STEA does not
use the same subkey in every round, but it does use the same subkey in alternate rounds. Let K1 and
K2 be the 32-bit halves of the 64-bit key K. We can express the repetitive nature of STEA by letting R
denote two rounds:

R(M, K) = ρ(ρ(M, K1 ), K2 ).

Hence 64-round STEA can be represented as 32 successive applications of the R function. In particu-
lar, the key does not change between applications of R so that E(P, K) = R32 (P, K). We now have to
be able to predict the output of R for a chosen plaintext. The STEA cipher function does not allow us
to do this; we must know the value of the subkey to predict the output of one round. The only way to

deal with this situation is to obtain enough well-distributed known plaintexts that we have a reason-
able chance of ¬nding a pair of plaintexts (P, P ) such that P = R(P, K). By the birthday paradox,
this would require around 232 known plaintexts. Since we cannot identify a correct pair (P, P ), we
must attempt the following analysis on every possible pair until we recover the key. Therefore, the
worst-case time complexity of this attack would be O(264 ). This is comparable to the complexity of
an exhaustive key-search.
For each pair of known plaintexts (P, P ) we solve the following equations

= R0 • (L2
K1 L0 ),
= L2 • (R2
K2 R0 ),
= R64 • (L66
K1 L64 ),
= L66 • (R66
K2 R64 ),

where P = (L0 , R0 ), P = (L2 , R2 ), C = (L64 , R64 ) and C = (L66 , R66 ). If the two values of K1 match
and the two values of K2 match then we have found the correct key. Once again, this attack will work
regardless of the number of rounds and may still be effective if the cipher function was slightly more
complex (e.g., if the output of the cipher function was subjected to a bitwise rotation, which would
render the divide-and-conquer attack ineffective).
A similar attack can be attempted on a version of TEA with ¬xed subkey constants δi (e.g., δi = δ,
for all i); this version of TEA would have two distinct subkeys (like STEA). The attack on this TEA
variant was described by Roger Fleming [3]. It has a complexity of O(298 ) and requires around 232
known plaintexts. Here the complexity of attack is better than that of exhaustive key-search (which
was not the case with STEA). An improvement of the attack, which reduces the complexity to O(264 )
at the cost of requiring an additional 232 chosen plaintexts, was outlined by David Wagner.

10 Differential Cryptanalysis
In section 2 it was shown that the dif¬culty of solving nonlinear equations can be useful for designing


. 4
( 6 .)