If r = 2, Eq. (7.6) can be written differently with the following notation:
p_1 = p, p_2 = 1 - p. Then

H(p) = -(p \log p + (1 - p) \log(1 - p)) .   (7.7)

The graph of the entropy for this case is shown in Fig. 7.1.


Fig. 7.1 The graph of binary entropy.

Theoretical Security of Cryptosystems 121

Consider the simplest properties of entropy.

Proposition 7.1 (properties of entropy)

(1) H(ξ) ≥ 0;

(2) H(ξ) ≤ \log r, the maximum being attained if and only if all the probabilities are equal, p_1 = \dots = p_r = 1/r;

(3) H(ξ) = 0 if and only if one of the probabilities equals 1 (and all the others equal 0).

Proof. The first property is quite obvious (see (7.6)). We prove the

second property only for r = 2 since the general case is similar. Explore

the graph of entropy. We need to find the maximum of function (7.7). For

that, we find the first and second derivatives of H(p), assuming for simplicity
that the logarithms are natural:

H'(p) = -(\ln p + 1 - \ln(1 - p) - 1) = \ln \frac{1 - p}{p} .

Hence H'(p) = 0 if p = 1/2. Find the second derivative:

H''(p) = -\frac{1}{p} - \frac{1}{1 - p} = -\frac{1}{p(1 - p)} .

We can see that H''(p) < 0 when p ∈ (0, 1). It means that the function

H(p) attains the maximum when p = 1/2 and is concave in the interval

(0; 1). So the graph depicted in Fig. 7.1 is justified. It is plain that under

any other base of the logarithm the graph will be analogous.
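The maximum just derived is easy to check numerically. The sketch below (plain Python, nothing assumed beyond the standard library) evaluates the binary entropy H(p) of (7.7) on a grid, with base-2 logarithms, and confirms that the peak sits at p = 1/2 with value 1 bit.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -(p log2 p + (1-p) log2(1-p)), with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Evaluate H(p) on a grid over [0, 1] and locate the maximum.
grid = [i / 1000 for i in range(1001)]
values = [binary_entropy(p) for p in grid]
p_max = grid[values.index(max(values))]

print(p_max)                # grid point where H(p) peaks: 0.5
print(binary_entropy(0.5))  # maximum value: log2 2 = 1.0 bit
```

The symmetry H(p) = H(1 - p) visible in Fig. 7.1 also follows directly from the formula.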

The third property follows directly from the definition of entropy (7.6): if some p_i = 1 (and all the other probabilities are 0), then

H(ξ) = -(1 \cdot \log 1 + 0 \cdot \log 0 + \dots + 0 \cdot \log 0) = 0

(here we use the standard convention 0 \log 0 = 0).

The physical sense of the entropy is the quantitative measure of uncertainty. Consider, for instance, three different random variables ξ1, ξ2, ξ3 taking on values a_1 and a_2 (r = 2):

ξ1 : P(a_1) = 1,    P(a_2) = 0 ;

ξ2 : P(a_1) = 0.5,  P(a_2) = 0.5 ;

ξ3 : P(a_1) = 0.01, P(a_2) = 0.99 .

Intuition suggests that the uncertainty of ξ1 is zero. Actually,

H(ξ_1) = -(1 \cdot \log 1 + 0 \cdot \log 0) = 0 .

122 Basics of Contemporary Cryptography for IT Practitioners

Look at ξ2 and ξ3. Intuition says that the uncertainty of ξ2 is higher than that
of ξ3. Compute the entropies:

H(ξ_2) = 1 bit

(has been computed above),

H(ξ_3) = -(0.01 \log 0.01 + 0.99 \log 0.99) ≈ 0.08 bit.
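These three values can be reproduced directly from (7.6); a minimal sketch in standard-library Python (base-2 logarithms, so the results come out in bits):

```python
import math

def entropy(probs) -> float:
    """Shannon entropy in bits: -sum p log2 p, skipping zero terms."""
    # + 0.0 turns a possible -0.0 into 0.0 for cleaner display
    return -sum(p * math.log2(p) for p in probs if p > 0) + 0.0

xi1 = [1.0, 0.0]     # certain outcome
xi2 = [0.5, 0.5]     # fair coin
xi3 = [0.01, 0.99]   # heavily biased

for name, dist in [("xi1", xi1), ("xi2", xi2), ("xi3", xi3)]:
    print(name, round(entropy(dist), 4))  # 0, 1 and about 0.08 bit
```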

We can see that the entropy is indeed a reasonable measure of uncertainty. But the matter is not limited to examples of that sort. The entropy plays a

key role in many problems of telecommunications and information sciences.

In particular, the entropy characterises the maximum attainable degree of

data compression. More exactly, if a source with the (defined below) limit

entropy h generates a (sufficiently long) message of n symbols, this message

can be compressed up to nh bits on average. For example, if h = 1/2, the

message can be compressed to half of its size. Note that we speak of

so-called lossless compression when the source message can be recovered

without distortion.
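The claim that a message can be compressed to about nh bits can be illustrated empirically. The sketch below (an illustration only; the limit entropy h is defined formally later) draws biased i.i.d. bits with p = 0.1, so h = H(0.1) ≈ 0.47 bit per bit, packs them into bytes and compresses with Python's zlib. A general-purpose compressor does not reach the entropy bound exactly, but for a long memoryless sequence it lands near it, far below the raw size.

```python
import math
import random
import zlib

random.seed(7)
p = 0.1                       # probability of a 1-bit
n_bits = 80_000
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ≈ 0.469 bits per bit

# Draw n biased bits and pack them into bytes.
bits = [1 if random.random() < p else 0 for _ in range(n_bits)]
data = bytes(
    sum(bit << k for k, bit in enumerate(bits[i:i + 8]))
    for i in range(0, n_bits, 8)
)

compressed = zlib.compress(data, level=9)
print(len(data))              # 10000 raw bytes
print(round(n_bits * h / 8))  # entropy bound: about 4690 bytes
print(len(compressed))        # actual size sits near the bound
```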

Consider now a two-dimensional random variable (ξ1, ξ2) defined by the following distribution:

P(ξ_1 = a_i, ξ_2 = b_j) = p_{ij}, \quad i = 1, \dots, r, \; j = 1, \dots, s.

Introduce the following notation:

p_{i \cdot} = P(ξ_1 = a_i) = \sum_{j=1}^{s} p_{ij} ,

p_{\cdot j} = P(ξ_2 = b_j) = \sum_{i=1}^{r} p_{ij} .

By analogy with (7.6) define the entropy of a two-dimensional random
variable:

H(ξ_1, ξ_2) = -\sum_{i=1}^{r} \sum_{j=1}^{s} p_{ij} \log p_{ij} .   (7.9)

Similarly, for a three-dimensional random variable (ξ1, ξ2, ξ3) and probability distribution p_{ijk}, define

H(ξ_1, ξ_2, ξ_3) = -\sum_{i} \sum_{j} \sum_{k} p_{ijk} \log p_{ijk} .   (7.10)

The entropy of n-dimensional random variable is defined in the same way.


Suppose now that the value of ξ1 is known but the value of ξ2 is not.
Then it is natural to define the conditional entropy

H(ξ_2 \mid ξ_1) = -\sum_{i=1}^{r} \sum_{j=1}^{s} p_{ij} \log P(ξ_2 = b_j \mid ξ_1 = a_i) .   (7.11)

This is the mean conditional entropy of random variable ξ2 under the condition that the value of ξ1 is known.

Proposition 7.2 (properties of two-dimensional entropy)

H(ξ_1, ξ_2) = H(ξ_1) + H(ξ_2 \mid ξ_1) ;   (7.12)

in particular, for independent random variables ξ1 and ξ2,

H(ξ_2 \mid ξ_1) = H(ξ_2) ,

H(ξ_1, ξ_2) = H(ξ_1) + H(ξ_2) .   (7.13)

Recall from probability theory that ξ1, ξ2 are independent if p_{ij} = p_{i·} p_{·j}

for all i and j. The proof of the proposition is quite simple and can be

found in the literature. We confine ourselves to its interpretation.

In the first experiment, let ξ1 be observed, in the second experiment ξ2.

Then the total uncertainty must be equal to the uncertainty of the first

experiment summed up with the conditional uncertainty of the second. In

case of independent ξ1 and ξ2, the knowledge of one variable does not offer

any information about the value of the other, which corresponds to (7.13).
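The identities (7.11) and (7.12) are easy to verify numerically. The sketch below (plain Python; the 2×3 joint table is an arbitrary illustration, not taken from the text) computes the marginal, conditional and joint entropies from the definitions and checks the chain rule.

```python
import math

def H(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrary 2x3 joint distribution p_ij (rows: xi1, columns: xi2).
p = [[0.2, 0.1, 0.1],
     [0.1, 0.3, 0.2]]

p_i = [sum(row) for row in p]           # marginal distribution of xi1
joint = [q for row in p for q in row]   # flattened joint distribution

# H(xi2 | xi1) = -sum_ij p_ij log2 P(xi2 = b_j | xi1 = a_i)   -- Eq. (7.11)
H_cond = -sum(
    p[i][j] * math.log2(p[i][j] / p_i[i])
    for i in range(2) for j in range(3) if p[i][j] > 0
)

# Chain rule (7.12): H(xi1, xi2) = H(xi1) + H(xi2 | xi1)
assert abs(H(joint) - (H(p_i) + H_cond)) < 1e-12
print(round(H(joint), 4), round(H(p_i), 4), round(H_cond, 4))
```

Changing the table to a product distribution p_{ij} = p_{i·} p_{·j} makes H_cond equal H(ξ2), reproducing (7.13).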

Let an n-dimensional random variable (ξ1, ξ2, ..., ξn) be given. Then
the following equation holds:

H(ξ_1, \dots, ξ_n) = H(ξ_1) + H(ξ_2 \mid ξ_1) + H(ξ_3 \mid ξ_1, ξ_2) + \dots + H(ξ_n \mid ξ_1, \dots, ξ_{n-1}) .   (7.14)

For independent random variables,

H(ξ_1, \dots, ξ_n) = \sum_{i=1}^{n} H(ξ_i)   (7.15)

(observe that (7.12), (7.13) are special cases of (7.14), (7.15)).

In the general case,

H(ξ_1, \dots, ξ_n) \le \sum_{i=1}^{n} H(ξ_i) .

Consider a sequence of random variables ξ1, ξ2, ξ3, ... (the ξi take on values
in A), which may be interpreted as a random process with discrete time.
We shall assume that this process is stationary, i.e., informally, the probabilities


for (ξ_1, \dots, ξ_n) are the same as for (ξ_{Δ+1}, \dots, ξ_{Δ+n}) for all positive integers
n and Δ.

Let H(ξ1, ..., ξn) be the entropy of the n-dimensional random variable
(ξ1, ..., ξn). Denote by

h_n^* = \frac{H(ξ_1, \dots, ξ_n)}{n}   (7.16)

the entropy rate of the nth order and define

h_n' = H(ξ_n \mid ξ_1, \dots, ξ_{n-1}) .

Note the following properties:

h_n^* \le h_{n-1}^*, \quad n > 1,   (7.17)

h_n' \le h_n^*,   (7.18)

h_n' \le h_{n-1}', \quad n > 1.   (7.19)

For independent ξ1, ξ2, ..., ξn the equalities

h_n^* = h_n' = H(ξ_1)

hold. (The process generating independent random variables is said to be
a memoryless process.)

Theorem 7.3 For a stationary process, the limits \lim_{n \to \infty} h_n^* and
\lim_{n \to \infty} h_n' exist and are equal.

Denote the common value of these limits by h_∞:

h_\infty = \lim_{n \to \infty} h_n^* = \lim_{n \to \infty} h_n' .   (7.20)
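The convergence in Theorem 7.3 can be seen explicitly on a hypothetical two-state Markov source (an illustration of ours, not an example from the text). For a stationary Markov chain the chain rule (7.14) gives H(ξ1, ..., ξn) = H(ξ1) + (n-1) h∞, where h∞ = H(ξ2 | ξ1), so h*_n can be evaluated in closed form and is seen to decrease to the common limit.

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical two-state Markov chain: P(0 -> 1) = a, P(1 -> 0) = b.
a, b = 0.1, 0.3
pi0 = b / (a + b)                           # stationary probability of state 0
H1 = Hb(1 - pi0)                            # H(xi_1) under the stationary law
h_inf = pi0 * Hb(a) + (1 - pi0) * Hb(b)     # h'_n for all n >= 2, and the limit

for n in (1, 2, 5, 10, 100):
    # H(xi_1 .. xi_n) = H(xi_1) + (n - 1) * h_inf for a stationary Markov chain
    h_star = (H1 + (n - 1) * h_inf) / n
    print(n, round(h_star, 4))              # h*_n decreases with n, as in (7.17)

print(round(h_inf, 4))                      # the common limit h_infinity
```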

Let the alphabet A = {a_1, a_2, ..., a_r} be given. We know that

\max H(ξ_1) = \log r

for memoryless processes, therefore, taking into account (7.19) and (7.20),

we obtain max h_∞ = \log r, the maximum being reached for the memoryless

processes with equal probabilities of letters 1/r. It is natural to introduce

the quantity

R = \log r - h_\infty   (7.21)

called redundancy (per source letter). Informally, this is, so to speak, the
"unused" portion of the alphabet. Redundancy is a quantitative measure of

mutual dependence and non-uniformity of the letters. Note that in the


second example with the Caesar cipher (p. 5), the redundancy of the encrypted
message equals zero since all letters (decimal digits) are equiprobable and
independent, i.e. h_∞ = \log 10 and R = 0. And this caused the simple
Caesar cipher to become unbreakable.
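A quick computation of (7.21) for two memoryless sources (the skewed probabilities below are illustrative, not from the text) shows the contrast: uniform decimal digits have zero redundancy, while a skewed three-letter source leaves most of its alphabet "unused".

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(probs):
    """R = log2 r - h for a memoryless source, per (7.21)."""
    return math.log2(len(probs)) - entropy(probs)

digits = [0.1] * 10            # equiprobable decimal digits
skewed = [0.8, 0.15, 0.05]     # an illustrative skewed three-letter source

print(round(redundancy(digits), 4))  # ~0 (up to floating-point error)
print(round(redundancy(skewed), 4))  # about 0.7 bit of the log2(3) available
```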

7.5 Unicity Distance for Secret Key Cipher

Consider the secret-key cryptosystem shown in Fig. 1.1 (p. 3). Let the

source generate the message m̄ = m_1 m_2 \dots m_n. Alice and Bob have the
secret key k, known only to them, and let ē = e_1 e_2 \dots e_n be the ciphertext

produced under that key.

Example 7.3 Let a memoryless source generate letters over the alphabet A = {a, b, c} with probabilities P(a) = 0.8, P(b) = 0.15, P(c) = 0.05.
Let the encrypter substitute the letters in the message by other letters
according to some fixed permutation depending on the key k:

(a, b, c)   k = 1,
(a, c, b)   k = 2,
(b, a, c)   k = 3,
(b, c, a)   k = 4,
(c, a, b)   k = 5,
(c, b, a)   k = 6.

That is, there are 6 possible keys (from 1 to 6) and if, e.g., k = 5, the
following substitution is made in the message: a → c, b → a, c → b.
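This toy cipher transcribes directly into code; the sketch below (illustrative Python, with the key table exactly as listed above) encrypts and decrypts under a chosen key.

```python
# The six keys map the plaintext alphabet (a, b, c) to a permutation of it.
KEYS = {
    1: "abc",
    2: "acb",
    3: "bac",
    4: "bca",
    5: "cab",
    6: "cba",
}

def encrypt(message: str, k: int) -> str:
    """Substitute each letter according to the permutation chosen by key k."""
    return message.translate(str.maketrans("abc", KEYS[k]))

def decrypt(ciphertext: str, k: int) -> str:
    """Invert the substitution."""
    return ciphertext.translate(str.maketrans(KEYS[k], "abc"))

msg = "abca"
ct = encrypt(msg, 5)      # a -> c, b -> a, c -> b, hence "cabc"
print(ct)
assert decrypt(ct, 5) == msg
```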

Suppose Eve has eavesdropped the ciphertext