straints imposed on the problem. When the genome violates such constraints,

then the fitness is set to a minimum. That is, testing for constraints in the fit

ness function incorporates constraints into the solution.

For instance, the previous example had a constraint that the value be

between 0 and 31. This was made implicitly true by using 5 bits to represent

Team-Fly®

Genetic Algorithms 433

the genome. What if there were 8 bits? In this case, the fitness function would

look like:

31 “ p2 when 0 <= p <= 31

––

0 otherwise

––

The general rule here is to include a minimum fitness value for any bit pat

terns that do not make sense or that violate problem constraints. Such patterns

may not be in the original population, but they might appear because of

crossover and mutation.

T I P The fitness function is defined on the genome, a sequence of bits. It must

be able to understand any pattern of 1s and 0s in the bits. When a particular

pattern of bits does not make sense, then the fitness function should return a

very low value”so the pattern does not get passed on to succeeding

generations.

Case Study: Using Genetic Algorithms

for Resource Optimization

One area where genetic algorithms have proven quite successful is in prob

lems involving scheduling resources subject to a wide range of constraints.

These types of problems involve competition for limited resources, while

adhering to a complex set of rules that describe relationships. The key to these

problems is defining a fitness function that incorporates all the constraints into

a single fitness value. These problems are outside the range of what we have

been considering as traditional data mining problems; however, they are inter

esting and illustrate the power of genetic algorithms.

An example of such a problem is the assignment of 40 medical residents to

various duties in an outpatient clinic, as faced by Dr. Ed Ewen at the Medical

Center of Delaware. The clinic is open 7 days a week, and the residents are

assigned to one particular day of the week through an entire year, regardless

of their other duties. The best assignment balances several different goals:

The clinic must have staff at all times.

––

The clinic should have a balance of first-, second-, and third-year

––

residents.

Third-year residents see eight patients per day, second-year residents

––

see six, and first-year residents see four.

434 Chapter 13

So far, this problem is not so complicated. However, each resident spends 4

weeks on a rotation in a given part of the hospital, such as the intensive care

ward, the oncology unit, or a community hospital. These rotations impose

some other constraints:

Senior residents do not go to the clinic when they are assigned to the

––

medical intensive care rotation, but all other residents do.

Junior residents do not go to the clinic when they are assigned to the

––

cardiac care rotation, but all other residents do.

No more than two residents from the intensive care rotation can be

––

assigned to the clinic on the same day.

No more than three residents from other rotations can be assigned to

––

the clinic on the same day.

As an example of problems that may arise, consider that during one rota

tion, five residents are assigned to the clinic on a particular day. During the

next rotation, the senior is on the medical intensive care rotation and the two

juniors are on the cardiac care rotation. Now there are only two residents left

at the clinic”and this is insufficient for clinic operations.

The genetic algorithms approach recognizes that there is probably no per

fect solution to this problem, but that some assignments of residents to days

of the week are clearly better than others. Dr. Ewen recognized that he could

capture the “goodness” of a schedule using a fitness function. Actually, the

function that Dr. Ewen used was an anti-fitness function”the higher the

value, the worse the schedule. This function imposed penalties for violating

the constraints:

For each day when the clinic has fewer than three residents, an amount

––

is added”a larger amount the bigger the size of the deficit.

For each day when there are no seniors in the clinic, a small amount is

––

added.

For each day when fewer than three residents are left on a rotation, a

––

large amount is added to the fitness function.

And so on.

––

Setting up a spreadsheet with these functions, Dr. Ewen tried to minimize

the functions to get the best assignment. His initial assignments had scores in

the range of 130 to 140. After several hours of work, he was able to reduce the

score to 72. Pretty good.

However, he had available a genetic algorithms package from the Ward

Systems Group (www.wardsystems.com) that plugs into Excel spreadsheets.

He started with a population of 100 randomly generated assignments, none of

Genetic Algorithms 435

which were very good. After 80 generations, the package lowered the score to

21”considerably better than he was able to do by hand.

This example gives a good feeling for optimization problems where genetic

algorithms are applicable. They differ from most data mining problems

because they are more rule-oriented than data-oriented. The key to solving

these problems is to incorporate the constraints into a single fitness function to

be optimized (either by finding a maximum or a minimum). The resulting fit

ness function might be highly nonlinear, making it difficult to optimize using

other techniques. As we will see, the same techniques adapt to situations fea

turing larger amounts of data.

T I P Genetic algorithms are a good tool when there are more rules than data

in the problem (although they are useful in other areas as well). These types of

scheduling problems often involve competition for limited resources subject to

complex relationships that describe resources and their users.

Schemata: Why Genetic Algorithms Work

At first sight, there is nothing sacrosanct in the selection, crossover, and muta

tion operators introduced earlier in this chapter. Why, for instance, does

crossover choose only one intermediate point instead of two or more? Why do

low mutation rates produce better results? The fact that nature behaves this

way is not sufficient justification if multiple crossover points would produce

better results more quickly or if a high mutation rate would work better.

For solving problems that yield actionable results, the fact that genetic algo

rithms have worked well in practice may be sufficient justification for contin

uing to use them as they are. However, it is still comforting to know that the

technique has a theoretical foundation. Prof. Holland developed his theory of

schemata processing in the early 1970s to explain why selection, crossover, and

mutation work so well in practice. Readers interested in using genetic algo

rithms for some of their problems are particularly well advised to understand

schemata, even if the genetic algorithms are buried inside tools they are using,

since this understanding explains both the power and the limits of the

technique.

A schema, which comes from the Greek word meaning “form” or “figure,” is

simply a representation of the patterns present in a genome. Schemata (the

plural is formed from the Greek root) are represented as sequences of symbols.

The 1s and 0s (called the fixed positions) of genomes are augmented by an aster

isk, *, that matches either a 0 or a 1. The relationship between a schema and a

genome is simple. A genome matches a schema when the fixed positions in the

436 Chapter 13

schema match the corresponding positions in the genome. An example should

make this quite clear; the following schema:

10**

matches all of the following four genomes because they all have four symbols,

beginning with a 1 followed by a 0:

1 0 0 0

1 0 0 1

1 0 1 1

1 0 1 0

The order of a schema is the number of fixed positions that it contains. For

instance, the order of 1*10111 is 6, of ***1010**1 is 5, and of 0************** is 1.

The defining length of a schema is the distance between the outermost fixed

positions. So, the defining length of 1*10111 is 6 (counting from the left, 7 “ 1),

of ***1010**1 is 6 (10 “ 4) and of 0************** is 0 (1 “ 1).

Now, let us look at fitness functions in terms of schemata. If the genome 000

survives from one generation to the next, then the schema 0** has also sur

vived, as have *0*, **0, *00, 0*0, 00*, and ***. The fitness of a particular schema,

then, is the average fitness of all the genomes that match the schema in a given

population. For instance, the fitness of the schema 0** is the average fitness of

the genomes 000, 001, 010, and 011 since the schema survives when these

genomes survive, at least considering only the selection operator. Consider

two schemata from the previous example using the fitness function 31p “ p2,

10*** and 00***. One genome in the initial population matches 10***, so its fit

ness is 176. The two genomes matching 00*** have fitness values of 87 and 58.

The first schema is fitter than the second. And, in fact, in the next generation

there is only one genome matching 00*** and there are two matching 10***. The

fitter schema has survived and proliferated; the less fit is disappearing.

A geometric view of schemata is sometimes helpful for understanding

them. Consider the eight possible genomes of length three: 000, 001, 010, 011,

100, 101, 110, and 111. These lie at the corners of a unit cube, as shown in

Figure 13.3. Schemata then correspond to the edges and faces of the cube. The

edges are the schemata of order 2 and the faces of order 1. As genetic algo

rithms are processing different genomes; they are also processing schemata,

visualized by these features on a cube. The population covers pieces of the

cube trying to find the corners with the best fitness, and the schemata provide

information about large regions of the possible solutions. This geometric per

spective generalizes to higher dimensions, where the selection, crossover, and

mutation operators correspond to cuts through hypercubes in some higher-

dimension space that is a bit harder to visualize

Genetic Algorithms 437

011 111

*11

01* 11*

010 *10

110

Schema for

1*1

this face is

**0

Schema for

0*0 1*0 this face is

1**

101

10*

*00

100

000

Schema for

this edge

Figure 13.3 A cube is a useful representation of schemata on 3 bits. The corners represent

the genomes, the edges represent the schemata of order 2, the faces, the schemata of order

1, and the entire cube, the schema of order 0.

Consider the schema, 1***1. This is also quite fit in the original population,

with a fitness of 150. There is one genome that matches it in the original popu

lation and the same one in the next generation. This schema has survived only

because the genome containing it did not cross over with another genome. A

crossover would likely have destroyed it. Compare this to 10*** that survived

a crossover. The shorter the defining length of a schema, the more likely it will

be to survive from one generation to another. So, even longer schemata that are

very fit are likely to be replaced by shorter, but fit, cousins. Using more com

plicated crossover techniques, such as making two cuts, changes the behavior

entirely. With more complicated techniques, the defining length is no longer

useful, and Holland™s results on schemata do not necessarily hold.

Holland rigorously proved these two observations and summed them up in

the Schema Theorem (also called the Fundamental Theorem of Genetic Algo

rithms): short, low-order schemata with above-average fitness increase in pop

ulation from one generation to the next. In other words, short, low-order

schemata are the building blocks that genetic algorithms are working on. From

one generation to the next, the fittest building blocks survive and mix with

each other to produce fitter and fitter genomes.

438 Chapter 13

The Schema Theorem explains that genetic algorithms are really searching

through the possible schemata to find fit building blocks that survive from one

generation to the next. A natural question is how many building blocks are typ

ically being processed? We will spare the reader the details, but Holland showed

that the number of schemata being processed by a population of n genomes is

proportional to n3. This means that each generation is really evaluating n3 dif

ferent schemata, even though it is only doing work on n different genomes.