. 85
( 137 .)


to working with data in bits. However, some patterns of bits may violate con­
straints imposed on the problem. When the genome violates such constraints,
then the fitness is set to a minimum. That is, testing for constraints in the fit­
ness function incorporates constraints into the solution.
For instance, the previous example had a constraint that the value be
between 0 and 31. This was made implicitly true by using 5 bits to represent

Genetic Algorithms 433

the genome. What if there were 8 bits? In this case, the fitness function would
look like:
31 “ p2 when 0 <= p <= 31

0 otherwise

The general rule here is to include a minimum fitness value for any bit pat­
terns that do not make sense or that violate problem constraints. Such patterns
may not be in the original population, but they might appear because of
crossover and mutation.

T I P The fitness function is defined on the genome, a sequence of bits. It must
be able to understand any pattern of 1s and 0s in the bits. When a particular
pattern of bits does not make sense, then the fitness function should return a
very low value”so the pattern does not get passed on to succeeding

Case Study: Using Genetic Algorithms
for Resource Optimization
One area where genetic algorithms have proven quite successful is in prob­
lems involving scheduling resources subject to a wide range of constraints.
These types of problems involve competition for limited resources, while
adhering to a complex set of rules that describe relationships. The key to these
problems is defining a fitness function that incorporates all the constraints into
a single fitness value. These problems are outside the range of what we have
been considering as traditional data mining problems; however, they are inter­
esting and illustrate the power of genetic algorithms.
An example of such a problem is the assignment of 40 medical residents to
various duties in an outpatient clinic, as faced by Dr. Ed Ewen at the Medical
Center of Delaware. The clinic is open 7 days a week, and the residents are
assigned to one particular day of the week through an entire year, regardless
of their other duties. The best assignment balances several different goals:
The clinic must have staff at all times.

The clinic should have a balance of first-, second-, and third-year

Third-year residents see eight patients per day, second-year residents

see six, and first-year residents see four.
434 Chapter 13

So far, this problem is not so complicated. However, each resident spends 4
weeks on a rotation in a given part of the hospital, such as the intensive care
ward, the oncology unit, or a community hospital. These rotations impose
some other constraints:
Senior residents do not go to the clinic when they are assigned to the

medical intensive care rotation, but all other residents do.
Junior residents do not go to the clinic when they are assigned to the

cardiac care rotation, but all other residents do.
No more than two residents from the intensive care rotation can be

assigned to the clinic on the same day.
No more than three residents from other rotations can be assigned to

the clinic on the same day.
As an example of problems that may arise, consider that during one rota­
tion, five residents are assigned to the clinic on a particular day. During the
next rotation, the senior is on the medical intensive care rotation and the two
juniors are on the cardiac care rotation. Now there are only two residents left
at the clinic”and this is insufficient for clinic operations.
The genetic algorithms approach recognizes that there is probably no per­
fect solution to this problem, but that some assignments of residents to days
of the week are clearly better than others. Dr. Ewen recognized that he could
capture the “goodness” of a schedule using a fitness function. Actually, the
function that Dr. Ewen used was an anti-fitness function”the higher the
value, the worse the schedule. This function imposed penalties for violating
the constraints:
For each day when the clinic has fewer than three residents, an amount

is added”a larger amount the bigger the size of the deficit.
For each day when there are no seniors in the clinic, a small amount is

For each day when fewer than three residents are left on a rotation, a

large amount is added to the fitness function.
And so on.

Setting up a spreadsheet with these functions, Dr. Ewen tried to minimize
the functions to get the best assignment. His initial assignments had scores in
the range of 130 to 140. After several hours of work, he was able to reduce the
score to 72. Pretty good.
However, he had available a genetic algorithms package from the Ward
Systems Group (www.wardsystems.com) that plugs into Excel spreadsheets.
He started with a population of 100 randomly generated assignments, none of
Genetic Algorithms 435

which were very good. After 80 generations, the package lowered the score to
21”considerably better than he was able to do by hand.
This example gives a good feeling for optimization problems where genetic
algorithms are applicable. They differ from most data mining problems
because they are more rule-oriented than data-oriented. The key to solving
these problems is to incorporate the constraints into a single fitness function to
be optimized (either by finding a maximum or a minimum). The resulting fit­
ness function might be highly nonlinear, making it difficult to optimize using
other techniques. As we will see, the same techniques adapt to situations fea­
turing larger amounts of data.

T I P Genetic algorithms are a good tool when there are more rules than data
in the problem (although they are useful in other areas as well). These types of
scheduling problems often involve competition for limited resources subject to
complex relationships that describe resources and their users.

Schemata: Why Genetic Algorithms Work
At first sight, there is nothing sacrosanct in the selection, crossover, and muta­
tion operators introduced earlier in this chapter. Why, for instance, does
crossover choose only one intermediate point instead of two or more? Why do
low mutation rates produce better results? The fact that nature behaves this
way is not sufficient justification if multiple crossover points would produce
better results more quickly or if a high mutation rate would work better.
For solving problems that yield actionable results, the fact that genetic algo­
rithms have worked well in practice may be sufficient justification for contin­
uing to use them as they are. However, it is still comforting to know that the
technique has a theoretical foundation. Prof. Holland developed his theory of
schemata processing in the early 1970s to explain why selection, crossover, and
mutation work so well in practice. Readers interested in using genetic algo­
rithms for some of their problems are particularly well advised to understand
schemata, even if the genetic algorithms are buried inside tools they are using,
since this understanding explains both the power and the limits of the
A schema, which comes from the Greek word meaning “form” or “figure,” is
simply a representation of the patterns present in a genome. Schemata (the
plural is formed from the Greek root) are represented as sequences of symbols.
The 1s and 0s (called the fixed positions) of genomes are augmented by an aster­
isk, *, that matches either a 0 or a 1. The relationship between a schema and a
genome is simple. A genome matches a schema when the fixed positions in the
436 Chapter 13

schema match the corresponding positions in the genome. An example should
make this quite clear; the following schema:
matches all of the following four genomes because they all have four symbols,
beginning with a 1 followed by a 0:
1 0 0 0

1 0 0 1

1 0 1 1

1 0 1 0

The order of a schema is the number of fixed positions that it contains. For
instance, the order of 1*10111 is 6, of ***1010**1 is 5, and of 0************** is 1.
The defining length of a schema is the distance between the outermost fixed
positions. So, the defining length of 1*10111 is 6 (counting from the left, 7 “ 1),
of ***1010**1 is 6 (10 “ 4) and of 0************** is 0 (1 “ 1).
Now, let us look at fitness functions in terms of schemata. If the genome 000
survives from one generation to the next, then the schema 0** has also sur­
vived, as have *0*, **0, *00, 0*0, 00*, and ***. The fitness of a particular schema,
then, is the average fitness of all the genomes that match the schema in a given
population. For instance, the fitness of the schema 0** is the average fitness of
the genomes 000, 001, 010, and 011 since the schema survives when these
genomes survive, at least considering only the selection operator. Consider
two schemata from the previous example using the fitness function 31p “ p2,
10*** and 00***. One genome in the initial population matches 10***, so its fit­
ness is 176. The two genomes matching 00*** have fitness values of 87 and 58.
The first schema is fitter than the second. And, in fact, in the next generation
there is only one genome matching 00*** and there are two matching 10***. The
fitter schema has survived and proliferated; the less fit is disappearing.
A geometric view of schemata is sometimes helpful for understanding
them. Consider the eight possible genomes of length three: 000, 001, 010, 011,
100, 101, 110, and 111. These lie at the corners of a unit cube, as shown in
Figure 13.3. Schemata then correspond to the edges and faces of the cube. The
edges are the schemata of order 2 and the faces of order 1. As genetic algo­
rithms are processing different genomes; they are also processing schemata,
visualized by these features on a cube. The population covers pieces of the
cube trying to find the corners with the best fitness, and the schemata provide
information about large regions of the possible solutions. This geometric per­
spective generalizes to higher dimensions, where the selection, crossover, and
mutation operators correspond to cuts through hypercubes in some higher-
dimension space that is a bit harder to visualize
Genetic Algorithms 437

011 111
01* 11*
010 *10

Schema for
this face is
Schema for
0*0 1*0 this face is


Schema for
this edge
Figure 13.3 A cube is a useful representation of schemata on 3 bits. The corners represent
the genomes, the edges represent the schemata of order 2, the faces, the schemata of order
1, and the entire cube, the schema of order 0.

Consider the schema, 1***1. This is also quite fit in the original population,
with a fitness of 150. There is one genome that matches it in the original popu­
lation and the same one in the next generation. This schema has survived only
because the genome containing it did not cross over with another genome. A
crossover would likely have destroyed it. Compare this to 10*** that survived
a crossover. The shorter the defining length of a schema, the more likely it will
be to survive from one generation to another. So, even longer schemata that are
very fit are likely to be replaced by shorter, but fit, cousins. Using more com­
plicated crossover techniques, such as making two cuts, changes the behavior
entirely. With more complicated techniques, the defining length is no longer
useful, and Holland™s results on schemata do not necessarily hold.
Holland rigorously proved these two observations and summed them up in
the Schema Theorem (also called the Fundamental Theorem of Genetic Algo­
rithms): short, low-order schemata with above-average fitness increase in pop­
ulation from one generation to the next. In other words, short, low-order
schemata are the building blocks that genetic algorithms are working on. From
one generation to the next, the fittest building blocks survive and mix with
each other to produce fitter and fitter genomes.
438 Chapter 13

The Schema Theorem explains that genetic algorithms are really searching
through the possible schemata to find fit building blocks that survive from one
generation to the next. A natural question is how many building blocks are typ­
ically being processed? We will spare the reader the details, but Holland showed
that the number of schemata being processed by a population of n genomes is
proportional to n3. This means that each generation is really evaluating n3 dif­
ferent schemata, even though it is only doing work on n different genomes.


. 85
( 137 .)