

worked with Joseph Harder of the College of Business and Administration at
Southern Illinois on this project.)
Traditional expert systems consist of a large database of hundreds or
thousands of rules collected by observing and interviewing human experts who
are skilled at a particular task. Expert systems have enjoyed some success in
certain domains such as medical diagnosis and answering tax questions, but the
difficulty of collecting the rules has limited their use.
The team at Southern Illinois decided to solve these problems by generating
the rules directly from historical data. In other words, they would replace
expert interviews with data mining.

The Initial Challenge
The initial challenge that Detroit brought to Carbondale was to improve
response to a direct mail campaign for a particular model. The campaign
involved sending an invitation to a prospect to come test-drive the new model.
Anyone accepting the invitation would find a free pair of sunglasses waiting
at the dealership. The problem was that very few people were returning the
response card or calling the toll-free number for more information, and few of
those that did ended up buying the vehicle. The company knew it could save
itself a lot of money by not sending the offer to people unlikely to respond, but
it didn't know who those were.

How Data Mining Was Applied
As is often the case when the data to be mined is from several different sources,
the first challenge was to integrate data so that it could tell a consistent story.

The Data
The first file, the “mail file,” was a mailing list containing names and addresses
of about a million people who had been sent the promotional mailing. This file
contained very little information likely to be useful for selection.
The mail file was appended with data based on zip codes from the
commercially available PRIZM database. This database contains demographic and
“psychographic” characterizations of the neighborhoods associated with the
zip codes.
Two additional files contained information on people who had sent back the
response card or called the toll-free number for more information. Linking the
response cards back to the original mailing file was simple because the mail
file contained a nine-character key for each address that was printed on the
response cards. Telephone responders presented more of a problem since their
reported name and address might not exactly match their address in the
database, and there is no guarantee that the call even came from someone on the
mailing list since the recipient may have passed the offer on to someone else.
Of 1,000,003 people who were sent the mailing, 32,904 responded by sending
back a card and 16,453 responded by calling the toll-free number for a total
initial response rate of 5 percent. The automaker's primary interest, of course,
was in the much smaller number of people who both responded to the mailing
and bought the advertised car. These were to be found in a sales file, obtained
from the manufacturer, that contained the names, addresses, and model
purchased for all car buyers in the 3-month period following the mailing.
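The response rate quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and it assumes that nobody both returned a card and called, which the text does not state.

```python
# Counts taken from the text; variable names are illustrative.
mailed = 1_000_003
card_responders = 32_904
phone_responders = 16_453

# Assumption: card and phone responders do not overlap.
total_responders = card_responders + phone_responders
response_rate = total_responders / mailed

print(f"Total responders: {total_responders:,}")
print(f"Initial response rate: {response_rate:.1%}")
```

The result rounds to the 5 percent figure reported in the text.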
An automated name-matching program with loosely set matching standards
discovered around 22,000 apparent matches between people who
bought cars and people who had received the mailing. Hand editing reduced
the intersection to 4,764 people who had received the mailing and bought a
car. About half of those had purchased the advertised model. See Figure 2.5 for
a comparison of all these data sources.
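The text does not say how the name-matching program worked. As a rough illustration of why a loosely set standard produces many apparent matches that later need hand editing, the following sketch compares normalized name-and-address strings using Python's standard difflib module. The records, the thresholds, and the helper names are all invented for the example.

```python
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_matches(buyers, mailed, threshold):
    """Pair each buyer with every mailed record whose similarity meets the threshold."""
    pairs = []
    for buyer in buyers:
        for prospect in mailed:
            score = similarity(buyer, prospect)
            if score >= threshold:
                pairs.append((buyer, prospect, round(score, 2)))
    return pairs

# Made-up records for illustration only.
mailed = ["John Q. Smith, 12 Oak St", "Mary Jones, 4 Elm Ave"]
buyers = ["Jon Smith, 12 Oak Street", "Marie Johnson, 40 Elm Ave", "Pat Lee, 9 Pine Rd"]

loose = candidate_matches(buyers, mailed, threshold=0.6)   # hypothetical loose setting
strict = candidate_matches(buyers, mailed, threshold=0.9)  # hypothetical strict setting
print(len(loose), "loose matches;", len(strict), "strict matches")
```

A lower threshold can only add candidate pairs, never remove them, which is one way to picture the gap between the 22,000 apparent matches and the 4,764 that survived hand editing.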

Down the Mine Shaft
The experimental design called for the population to be divided into exactly
two classes: success and failure. This is certainly a questionable design since it
obscures interesting differences. Surely, people who come into the dealership to
test-drive one model but end up buying another should be in a different class
from nonresponders or people who respond but buy nothing. For that matter,
people who weren't considered good enough prospects to be sent a mailing,
but who nevertheless bought the car, are an even more interesting group.

Figure 2.5 Prospects in the training set have overlapping relationships.
(Diagram labels: Mass Mailing, Resp Cards, Resp Calls, Sales (270,172).)

Be that as it may, success was defined as “received a mailing and bought the
car” and failure was defined as “received the mailing, but did not buy the car.”
A series of trials was run using decision trees and neural networks. The tools
were tested on various kinds of training sets. Some of the training sets
reflected the true proportion of successes in the database, while others were
enriched to have up to 10 percent successes, and higher concentrations might
have produced better results.
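One common way to build an enriched training set is to keep every success and sample only as many failures as the target proportion allows. The text does not say which method the researchers used, so the sketch below is an assumption. The function name, the record layout, and the synthetic population (50 successes in 10,000 records, close to the proportion implied by 4,764 buyers in about a million mailings) are all illustrative.

```python
import random

def enrich(records, target_rate, seed=0):
    """Downsample failures so successes make up about target_rate of the result.

    records: list of (features, label) pairs, where label is True for a success.
    """
    successes = [r for r in records if r[1]]
    failures = [r for r in records if not r[1]]
    # Number of failures needed so that successes / total is about target_rate.
    wanted_failures = round(len(successes) * (1 - target_rate) / target_rate)
    rng = random.Random(seed)
    sampled = rng.sample(failures, min(wanted_failures, len(failures)))
    enriched = successes + sampled
    rng.shuffle(enriched)
    return enriched

# Synthetic population: 50 successes among 10,000 records (0.5 percent).
population = [({"id": i}, i < 50) for i in range(10_000)]
training = enrich(population, target_rate=0.10)
rate = sum(label for _, label in training) / len(training)
print(len(training), f"{rate:.0%}")
```

Because the enriched set no longer reflects the true proportion of successes, scores from a model trained on it would need to be interpreted relative to that changed base rate.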
The neural network did better on the sparse training sets, while the decision
tree tool appeared to do better on the enriched sets. The researchers decided on
a two-stage process. First, a neural network determined who was likely to buy
a car, any car, from the company. Then, the decision tree was used to predict
which of the likely car buyers would choose the advertised model. This
two-step process proved quite successful. The hybrid data mining model
combining decision trees and neural networks missed very few buyers of the
targeted model while at the same time screening out many more nonbuyers than
either the neural net or the decision tree was able to do.
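The two-stage arrangement can be pictured as two filters applied in sequence. The following sketch shows only that structure. The two scoring functions are simple stand-ins for the trained neural network and decision tree, and the field names and cutoffs are invented, not taken from the project.

```python
from typing import Callable, Dict, List

Prospect = Dict[str, float]

def two_stage_select(
    prospects: List[Prospect],
    likely_buyer: Callable[[Prospect], bool],
    prefers_model: Callable[[Prospect], bool],
) -> List[Prospect]:
    """Keep prospects that pass stage 1 (any car) and then stage 2 (advertised model)."""
    stage_one = [p for p in prospects if likely_buyer(p)]
    return [p for p in stage_one if prefers_model(p)]

# Hypothetical stand-in for the neural network: a score with a cutoff.
def likely_buyer(p: Prospect) -> bool:
    return p["buy_score"] >= 0.5

# Hypothetical stand-in for the decision tree: a pair of rule-like tests.
def prefers_model(p: Prospect) -> bool:
    return p["income"] >= 60_000 and p["age"] < 45

prospects = [
    {"buy_score": 0.9, "income": 80_000, "age": 35},  # passes both stages
    {"buy_score": 0.8, "income": 40_000, "age": 30},  # passes stage 1 only
    {"buy_score": 0.2, "income": 90_000, "age": 40},  # fails stage 1
]
selected = two_stage_select(prospects, likely_buyer, prefers_model)
print(len(selected), "of", len(prospects), "prospects selected")
```

Passing the two stages in as functions keeps the pipeline independent of how each model is built, so either stage could be swapped out without changing the selection logic.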

The Resulting Actions
Armed with a model that could effectively reach responders, the company
decided to take the money saved by mailing fewer pieces and put it into
improving the lure offered to get likely buyers into the showroom. Instead of
sunglasses for the masses, they offered a nice pair of leather boots to the far
smaller group of likely buyers. The new approach proved much more effective
than the first.

Completing the Cycle
The university-based data mining project showed that even with only a
limited number of broad-brush variables to work with and fairly primitive data
mining tools, data mining could improve the effectiveness of a direct
marketing campaign for a big-ticket item like an automobile. The next step is to
gather more data, build better models, and try again!

Lessons Learned

This chapter started by recalling the drivers of the industrial revolution and
the creation of large mills in England and New England. These mills are now
abandoned, torn down, or converted to other uses. Water is no longer the
driving force of business. It has been replaced by data.
The virtuous cycle of data mining is about harnessing the power of data and
transforming it into actionable business results. Just as water once turned the
wheels that drove machines throughout a mill, data needs to be gathered and
disseminated throughout an organization to provide value. If data is water in
this analogy, then data mining is the wheel, and the virtuous cycle spreads the
power of the data to all the business processes.
The virtuous cycle of data mining is a learning process based on customer
data. It starts by identifying the right business opportunities for data mining.
The best business opportunities are those that will be acted upon. Without
action, there is little or no value to be gained from learning about customers.
Also very important is measuring the results of the action. This completes
the loop of the virtuous cycle, and often suggests further data mining


Data Mining Methodology and Best Practices

The preceding chapter introduced the virtuous cycle of data mining as a
business process. That discussion divided the data mining process into four stages:
1. Identifying the problem
2. Transforming data into information
3. Taking action
4. Measuring the outcome
Now it is time to start looking at data mining as a technical process. The
high-level outline remains the same, but the emphasis shifts. Instead of
identifying a business problem, we now turn our attention to translating business
problems into data mining problems. The topic of transforming data into
information is expanded into several topics including hypothesis testing,
profiling, and predictive modeling. In this chapter, taking action refers to
technical actions such as model deployment and scoring. Measurement refers to the
testing that must be done to assess a model's stability and effectiveness before
it is used to guide marketing actions.
Because the entire book is based on this methodology, the best practices
introduced here are elaborated upon elsewhere. The purpose of this chapter is
to bring them together in one place and to organize them into a methodology.
The best way to avoid breaking the virtuous cycle of data mining is to
understand the ways it is likely to fail and take preventative steps. Over the
years, the authors have encountered many ways for data mining projects to go
wrong. In response, we have developed a useful collection of habits: things
we do to smooth the path from the initial statement of a business problem to a
stable model that produces actionable and measurable results. This chapter
presents this collection of best practices as the orderly steps of a data mining
methodology. Don't be fooled: data mining is a naturally iterative process.
Some steps need to be repeated several times, but none should be skipped.
The need for a rigorous approach to data mining increases with the
complexity of the data mining approach. After establishing the need for a
methodology by describing various ways that data mining efforts can fail in the
absence of one, the chapter starts with the simplest approach to data mining
(using ad hoc queries to test hypotheses) and works up to more sophisticated
activities such as building formal profiles that can be used as scoring models
and building true predictive models. Finally, the four steps of the virtuous
cycle are translated into an 11-step data mining methodology.

Why Have a Methodology?
Data mining is a way of learning from the past so as to make better decisions
in the future. The best practices described in this chapter are designed to avoid
two undesirable outcomes of the learning process:
Learning things that aren't true

Learning things that are true, but not useful

These pitfalls are like the rocks of Scylla and the whirlpool of Charybdis that
protect the narrow straits between Sicily and the Italian mainland. Like the
ancient sailors who learned to avoid these threats, data miners need to know
how to avoid common dangers.

Learning Things That Aren't True
Learning things that aren't true is more dangerous than learning things that
are useless because important business decisions may be made based on
incorrect information. Data mining results often seem reliable because they are
based on actual data in a seemingly scientific manner. This appearance of
reliability can be deceiving. The data itself may be incorrect or not relevant to the
question at hand. The patterns discovered may reflect past business decisions
or nothing at all. Data transformations such as summarization may have
destroyed or hidden important information. The following sections discuss
some of the more common problems that can lead to false conclusions.

Patterns May Not Represent Any Underlying Rule
It is often said that figures don't lie, but liars can figure. When it comes to
finding patterns in data, figures don't have to actually lie in order to suggest
things that aren't true. There are so many ways to construct patterns that any
random set of data points will reveal one if examined long enough. Human beings
depend so heavily on patterns in our lives that we tend to see them even when
they are not there. We look up at the nighttime sky and see not a random
arrangement of stars, but the Big Dipper, or the Southern Cross, or Orion's
Belt. Some even see astrological patterns and portents that can be used to
predict the future. The widespread acceptance of outlandish conspiracy theories
is further evidence of the human need to find patterns.
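The claim that random data will reveal a pattern if examined long enough can be demonstrated with a small experiment. The sketch below generates a random target and a thousand unrelated random "features," then reports the strongest correlation found among them. The row count, feature count, and seed are arbitrary choices made for the illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)
n_rows, n_features = 20, 1000

# Every value is independent noise, so no feature truly predicts the target.
target = [rng.random() for _ in range(n_rows)]
features = [[rng.random() for _ in range(n_rows)] for _ in range(n_features)]

correlations = sorted(abs(pearson(f, target)) for f in features)
typical = correlations[len(correlations) // 2]  # median absolute correlation
best = correlations[-1]                         # strongest one found by searching
print(f"median |r| = {typical:.2f}, best |r| = {best:.2f}")
```

The strongest correlation looks far more impressive than the typical one only because it was picked as the best of a thousand tries, not because it reflects any underlying rule.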
Presumably, the reason that humans have evolved such an affinity for
patterns is that patterns often do reflect some underlying truth about the way the
world works. The phases of the moon, the progression of the seasons, the
constant alternation of night and day, even the regular appearance of a favorite TV
show at the same time on the same day of the week are useful because they are
stable and therefore predictive. We can use these patterns to decide when it is
safe to plant tomatoes and how to program the VCR. Other patterns clearly do
not have any predictive power. If a fair coin comes up heads five times in a
row, there is still a 50-50 chance that it will come up tails on the sixth toss.
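The coin example can be verified by listing every equally likely outcome of six tosses and keeping only those that begin with five heads. This is a minimal sketch that assumes a fair coin and independent tosses, as the text does.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of six tosses of a fair coin.
sequences = list(product("HT", repeat=6))

# Condition on the first five tosses all being heads.
after_five_heads = [s for s in sequences if s[:5] == ("H",) * 5]
tails_on_sixth = [s for s in after_five_heads if s[5] == "T"]

p_tails = Fraction(len(tails_on_sixth), len(after_five_heads))
print(p_tails)  # 1/2
```

Only two sequences survive the condition, one ending in heads and one in tails, so the streak carries no information about the next toss.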
The challenge for data miners is to figure out which patterns are predictive
and which are not. Consider the following patterns, all of which have been
cited in articles in the popular press as if they had predictive value:
The party that does not hold the presidency picks up seats in Congress
during off-year elections.
When the American League wins the World Series, Republicans take

