

Figure 3.5 Data mining is not a linear process.
56 Chapter 3

Step One: Translate the Business Problem into a Data Mining Problem
A favorite scene from Alice in Wonderland is the passage where Alice asks the
Cheshire cat for directions:
“Would you tell me, please, which way I ought to go from here?”

“That depends a good deal on where you want to get to,” said the Cat.

“I don’t much care where--” said Alice.

“Then it doesn’t matter which way you go,” said the Cat.

“--so long as I get somewhere,” Alice added as an explanation.

“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”

The Cheshire cat might have added that without some way of recognizing
the destination, you can never tell whether you have walked long enough! The
proper destination for a data mining project is the solution of a well-defined
business problem. Data mining goals for a particular project should not be
stated in broad, general terms, such as:
Gaining insight into customer behavior


Discovering meaningful patterns in data


Learning something interesting


These are all worthy goals, but even when they have been achieved, they are
hard to measure. Projects that are hard to measure are hard to put a value on.
Wherever possible, the broad, general goals should be broken down into more
specific ones to make it easier to monitor progress in achieving them. Gaining
insight into customer behavior might turn into concrete goals:
Identify customers who are unlikely to renew their subscriptions.

Design a calling plan that will reduce churn for home-based business

Rank order all customers based on propensity to ski.

List products whose sales are at risk if we discontinue wine and beer

Not only are these concrete goals easier to monitor, they are easier to
translate into data mining problems as well.

What Does a Data Mining Problem Look Like?
To translate a business problem into a data mining problem, it should be
reformulated as one of the six data mining tasks introduced in Chapter One:
Data Mining Methodology and Best Practices 57




Classification

Estimation

Prediction

Affinity Grouping

Clustering

Description and Profiling

These are the tasks that can be accomplished with the data mining
techniques described in this book (though no single data mining tool or
technique is equally applicable to all tasks).
The first three tasks, classification, estimation, and prediction, are examples
of directed data mining. Affinity grouping and clustering are examples of
undirected data mining. Profiling may be either directed or undirected. In
directed data mining there is always a target variable: something to be
classified, estimated, or predicted. The process of building a classifier starts
with a predefined set of classes and examples of records that have already been
correctly classified. Similarly, the process of building an estimator starts with
historical data where the values of the target variable are already known. The
modeling task is to find rules that explain the known values of the target
variable.
In undirected data mining, there is no target variable. The data mining task is
to find overall patterns that are not tied to any one variable. The most common
form of undirected data mining is clustering, which finds groups of similar
records without any instructions about which variables should be considered as
most important. Undirected data mining is descriptive by nature, so undirected
data mining techniques are often used for profiling, but directed techniques
such as decision trees are also very useful for building profiles. In the machine
learning literature, directed data mining is called supervised learning and
undirected data mining is called unsupervised learning.
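
The contrast between directed and undirected data mining can be made concrete
with a small sketch. Everything in it is hypothetical: the records, the field
meanings (monthly minutes of use and a renewed flag), and the two helper
functions are invented for illustration and are far simpler than the techniques
described in this book. The directed function uses the known values of the
target variable to choose a rule; the undirected function never sees the target
and simply groups similar records.

```python
from statistics import mean

# Hypothetical records: (monthly_minutes, renewed). "renewed" is the target variable.
records = [(20, False), (35, False), (50, False),
           (300, True), (340, True), (410, True)]

def fit_threshold_rule(rows):
    """Directed: choose the cutoff on the input that best explains the known target."""
    values = sorted(x for x, _ in rows)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(cutoff):
        # Fraction of records where the rule "x > cutoff" matches the known target.
        return sum((x > cutoff) == y for x, y in rows) / len(rows)

    return max(candidates, key=accuracy)

def two_groups(values, iterations=10):
    """Undirected: split values into two groups of similar records; no target is used.

    Assumes the values are not all identical, so neither group ends up empty.
    """
    low_center, high_center = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        low_center, high_center = mean(low), mean(high)
    return sorted(low), sorted(high)

threshold = fit_threshold_rule(records)          # uses the target
clusters = two_groups([x for x, _ in records])   # ignores the target
print(threshold, clusters)
```

On this toy data both approaches happen to find the same split, but only the
directed rule was chosen because it explains the target.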

How Will the Results Be Used?
This is one of the most important questions to ask when deciding how best to
translate a business problem into a data mining problem. Surprisingly often,
the initial answer is “we’re not sure.” An answer is important because, as the
cautionary tale in the sidebar illustrates, different intended uses dictate
different solutions.
For example, many of our data mining engagements are designed to
improve customer retention. The results of such a study could be used in any
of the following ways:
Proactively contact high risk/high value customers with an offer that
rewards them for staying.

Change the mix of acquisition channels to favor those that bring in the
most loyal customers.

Forecast customer population in future months.

Alter the product to address defects that are causing customers to

Each of these goals has implications for the data mining process. Contacting
existing customers through an outbound telemarketing or direct mail
campaign implies that in addition to identifying customers at risk, there is an
understanding of why they are at risk so an attractive offer can be constructed,
and when they are at risk so the call is not made too early or too late.
Forecasting implies that in addition to identifying which current customers are
likely to leave, it is possible to determine how many new customers will be added
and how long they are likely to stay. This latter problem of forecasting new
customer starts is typically embedded in business goals and budgets, and is
not usually a predictive modeling problem.
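
The arithmetic behind a population forecast can be sketched in a few lines. This
is a deliberately simplified illustration, not a forecasting method from this
chapter: the function name, the single monthly retention rate applied to
everyone, and the figures are all assumptions. The retention rate stands in for
what a model of current customers would supply, and the planned starts stand in
for numbers taken from business goals and budgets.

```python
def forecast_population(current, monthly_retention, planned_starts):
    """Project the customer count month by month.

    current           -- number of customers today
    monthly_retention -- assumed fraction of customers who stay each month
    planned_starts    -- new customers expected each month (from business plans)
    """
    population = float(current)
    forecast = []
    for starts in planned_starts:
        # Survivors from last month plus this month's new customers.
        # Assumption: new customers are not exposed to attrition in their first month.
        population = population * monthly_retention + starts
        forecast.append(round(population))
    return forecast

projection = forecast_population(10_000, 0.98, [300, 300, 250])
print(projection)
```

A more realistic version would let retention vary by customer tenure and
segment, which is where the question of how long new customers are likely to
stay comes in.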

How Will the Results Be Delivered?
A data mining project may result in several very different types of
deliverables. When the primary goal of the project is to gain insight, the
deliverable is often a report or presentation filled with charts and graphs. When
the project is a one-time proof-of-concept or pilot project, the deliverable may
consist of lists of customers who will receive different treatments in a marketing
experiment. When the data mining project is part of an ongoing analytic customer
relationship management effort, the deliverable is likely to be a computer
program or set of programs that can be run on a regular basis to score a defined
subset of the customer population along with additional software to manage
models and scores over time. The form of the deliverable can affect the data
mining results. Producing a list of customers for a marketing test is not
sufficient if the goal is to dazzle marketing managers.

The Role of Business Users and Information Technology
As described in Chapter 2, the only way to get good answers to the questions
posed above is to involve the owners of the business problem in figuring out
how data mining results will be used and IT staff and database administrators
in figuring out how the results should be delivered. It is often useful to get
input from a broad spectrum within the organization and, where appropriate,
outside it as well. We suggest getting representatives from the various
constituencies within the enterprise together in one place, rather than
interviewing them separately. That way, people with different areas of knowledge
and expertise have a chance to react to each other’s ideas. The goal of all this
consultation is a clear statement of the business problem to be addressed. The
final statement of the business problem should be as specific as possible.
“Identify the 10,000 gold-level customers most likely to defect within the next
60 days” is better than “provide a churn score for all customers.”

Data Miners, the consultancy started by the authors, was once called upon to
analyze supermarket loyalty card data on behalf of a large consumer packaged
goods manufacturer. To put this story in context, it helps to know a little bit
about the supermarket business. In general, a supermarket does not care
whether a customer buys Coke or Pepsi (unless one brand happens to be on a
special deal that temporarily gives it a better margin), so long as the customer
purchases soft drinks. Product manufacturers, who care very much which
brands are sold, vie for the opportunity to manage whole categories in the
stores. As category managers, they have some control over how their own
products and those of their competitors are merchandised. Our client wanted to
demonstrate its ability to utilize loyalty card data to improve category
management. The category picked for the demonstration was yogurt because
by supermarket standards, yogurt is a fairly high-margin product.
As we understood it, the business goal was to identify yogurt lovers. To
create a target variable, we divided loyalty card customers into groups of high,
medium, and low yogurt affinity based on their total yogurt purchases over
the course of a year and into groups of high, medium, and low users based
on the proportion of their shopping dollars spent on yogurt. People who
were in the high category by both measures were labeled as yogurt lovers.
The transaction data had to undergo many transformations to be turned into
a customer signature. Input variables included the proportion of trips and of
dollars spent at various times of day and in various categories, shopping
frequency, average order size, and other behavioral variables.
Using this data, we built a model that gave all customers a yogurt lover score.
Armed with such a score, it would be possible to print coupons for yogurt when
likely yogurt lovers checked out, even if they did not purchase any yogurt on
that trip. The model might even identify good prospects who had not yet gotten
in touch with their inner yogurt lover, but might if prompted with a coupon.
The model got good lift, and we were pleased with it. The client, however,
was disappointed. “But, who is the yogurt lover?” asked the client. “Someone
who gets a high score from the yogurt lover model” was not considered a good
answer. The client was looking for something like “The yogurt lover is a woman
between the ages of X and Y living in a zip code where the median home price
is between M and N.” A description like that could be used for deciding where
to buy advertising and how to shape the creative content of ads. Ours, based
on shopping behavior rather than demographics, could not.

The role of the data miner in these discussions is to ensure that the final
statement of the business problem is one that can be translated into a data
mining problem. Otherwise, the best data mining efforts in the world may be
addressing the wrong business problem.
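
A specific problem statement translates almost directly into a specification for
the deliverable. The sketch below shows what “identify the 10,000 gold-level
customers most likely to defect within the next 60 days” might look like once
every customer has a score. The field names (`id`, `tier`, `defect_score`) and
the function are hypothetical, and the score is assumed to estimate the chance
of defecting within 60 days.

```python
def most_likely_to_defect(customers, tier="gold", limit=10_000):
    """Return the ids of the `limit` customers in `tier` with the highest scores."""
    eligible = [c for c in customers if c["tier"] == tier]
    # Highest defection score first.
    eligible.sort(key=lambda c: c["defect_score"], reverse=True)
    return [c["id"] for c in eligible[:limit]]

# Tiny illustrative population; a real list would hold the full customer base.
customers = [
    {"id": "A", "tier": "gold", "defect_score": 0.12},
    {"id": "B", "tier": "silver", "defect_score": 0.91},
    {"id": "C", "tier": "gold", "defect_score": 0.77},
    {"id": "D", "tier": "gold", "defect_score": 0.40},
]
call_list = most_likely_to_defect(customers, limit=2)
print(call_list)
```

Note that customer B has the highest score overall but is left out, because the
problem statement restricts the population to gold-level customers. A bare churn
score for all customers would not have captured that restriction.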

Data mining is often presented as a technical problem of finding a model
that explains the relationship of a target variable to a group of input variables.
That technical task is indeed central to most data mining efforts, but it should
not be attempted until the target variable has been properly defined and the
appropriate input variables identified. That, in turn, depends on a good
understanding of the business problem to be addressed. As the story in the
sidebar illustrates, failure to properly translate the business problem into a
data mining problem leads to one of the dangers we are trying to avoid:
learning things that are true, but not useful.
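
To show what defining a target variable involves, here is a rough sketch of the
yogurt lover definition from the sidebar: customers are grouped into high,
medium, and low by total yogurt purchases and again by the share of their
shopping dollars spent on yogurt, and those who are high on both measures are
labeled yogurt lovers. The sidebar does not say where the cutoffs between groups
were drawn, so splitting into equal thirds by rank is an assumption, as are the
field names and the sample figures.

```python
def tercile_labels(values):
    """Label each value low, medium, or high by rank, in equal thirds (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, i in enumerate(order):
        labels[i] = ("low", "medium", "high")[rank * 3 // len(values)]
    return labels

def label_yogurt_lovers(customers):
    """Return one flag per customer: True if high by both measures."""
    by_total = tercile_labels([c["yogurt_dollars"] for c in customers])
    by_share = tercile_labels(
        [c["yogurt_dollars"] / c["total_dollars"] for c in customers]
    )
    return [a == "high" and b == "high" for a, b in zip(by_total, by_share)]

# Hypothetical yearly totals per loyalty card customer.
customers = [
    {"yogurt_dollars": 300, "total_dollars": 3000},   # big spender, big share
    {"yogurt_dollars": 280, "total_dollars": 14000},  # big spender, small share
    {"yogurt_dollars": 120, "total_dollars": 2000},
    {"yogurt_dollars": 90, "total_dollars": 3000},
    {"yogurt_dollars": 20, "total_dollars": 250},     # small spender, big share
    {"yogurt_dollars": 10, "total_dollars": 1000},
]
flags = label_yogurt_lovers(customers)
print(flags)
```

Only the first customer is high on both measures. Whether this definition is the
right one depends on how the results will be used, which is the point of the
sidebar.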
For a complete treatment of turning business problems into data mining
problems, we recommend the book Business Modeling and Data Mining by our
colleague Dorian Pyle. This book gives detailed advice on how to find the
business problems where data mining provides the most benefit and how to
formulate those problems for mining. Here, we simply remind the reader to
consider two important questions before beginning the actual data mining
process: How will the results be used? And, in what form will the results be
delivered? The answer to the first question goes a long way towards
answering the second.

Step Two: Select Appropriate Data
Data mining requires data. In the best of all possible worlds, the required data
would already be resident in a corporate data warehouse, cleansed, available,
historically accurate, and frequently updated. In fact, it is more often scattered
in a variety of operational systems in incompatible formats on computers
running different operating systems, accessed through incompatible desktop
The data sources that are useful and available vary, of course, from problem
to problem and industry to industry. Some examples of useful data:
Warranty claims data (including both fixed-format and free-text fields)

Point-of-sale data (including ring codes, coupons proffered, discounts

Credit card charge records

