
the fundamental patterns beneath seemingly random variations, is an
important role of data mining.
This book covers all the most important data mining techniques and the
strengths and weaknesses of each in the context of customer relationship
management.


The Role of the Customer Relationship Management Strategy
To be effective, data mining must occur within a context that allows an
organization to change its behavior as a result of what it learns. It is no use knowing
that wireless telephone customers who are on the wrong rate plan are likely to
cancel their subscriptions if there is no one empowered to propose that they
switch to a more appropriate plan as suggested in the sidebar. Data mining
should be embedded in a corporate customer relationship strategy that spells
out the actions to be taken as a result of what is learned through data mining.
When low-value customers are identified, how will they be treated? Are there
programs in place to stimulate their usage to increase their value? Or does it
make more sense to lower the cost of serving them? If some channels
consistently bring in more profitable customers, how can resources be shifted to
those channels?
Data mining is a tool. As with any tool, it is not sufficient to understand how
it works; it is necessary to understand how it will be used.
Why and What Is Data Mining? 7


DATA MINING SUGGESTS, BUSINESSES DECIDE

This sidebar explores the example from the main text in slightly more detail. An
analysis of attrition at a wireless telephone service provider often reveals that
people whose calling patterns do not match their rate plan are more likely to
cancel their subscriptions. People who use more than the number of minutes
included in their plan are charged for the extra minutes, often at a high rate.
People who do not use their full allotment of minutes are paying for minutes
they do not use and are likely to be attracted to a competitor's offer of a
cheaper plan.
This result suggests doing something proactive to move customers to the
right rate plan. But this is not a simple decision. As long as they don't quit,
customers on the wrong rate plan are more profitable if left alone. Further
analysis may be needed. Perhaps there is a subset of these customers who are
not price sensitive and can be safely left alone. Perhaps any intervention will
simply hand customers an opportunity to cancel. Perhaps a small “rightsizing”
test can help resolve these issues. Data mining can help make more informed
decisions. It can suggest tests to make. Ultimately, though, the business needs
to make the decision.



What Is Data Mining?

Data mining, as we use the term, is the exploration and analysis of large
quantities of data in order to discover meaningful patterns and rules. For the
purposes of this book, we assume that the goal of data mining is to allow a
corporation to improve its marketing, sales, and customer support operations
through a better understanding of its customers. Keep in mind, however, that
the data mining techniques and tools described here are equally applicable in
fields ranging from law enforcement to radio astronomy, medicine, and
industrial process control.
In fact, hardly any of the data mining algorithms were first invented with
commercial applications in mind. The commercial data miner employs a grab
bag of techniques borrowed from statistics, computer science, and machine
learning research. The choice of a particular combination of techniques to
apply in a particular situation depends on the nature of the data mining task,
the nature of the available data, and the skills and preferences of the data
miner.
Data mining comes in two flavors: directed and undirected. Directed data
mining attempts to explain or categorize some particular target field such as
income or response. Undirected data mining attempts to find patterns or
similarities among groups of records without the use of a particular target field
or collection of predefined classes. Both these flavors are discussed in later
chapters.
8 Chapter 1


Data mining is largely concerned with building models. A model is simply
an algorithm or set of rules that connects a collection of inputs (often in the
form of fields in a corporate database) to a particular target or outcome.
Regression, neural networks, decision trees, and most of the other data mining
techniques discussed in this book are techniques for creating models. Under
the right circumstances, a model can result in insight by providing an
explanation of how outcomes of particular interest, such as placing an order or
failing to pay a bill, are related to and predicted by the available facts. Models
are also used to produce scores. A score is a way of expressing the findings of a
model in a single number. Scores can be used to sort a list of customers from
most to least loyal or most to least likely to respond or most to least likely to
default on a loan.
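
As a rough illustration of a model that produces a score, the sketch below uses a hand-written set of rules to turn a few customer fields into a single number and then sorts customers by it. The field names, weights, and cutoffs are invented for the example; a real model would be derived from data using the techniques discussed in later chapters.

```python
# Hypothetical rule-based model: the inputs are fields from a customer record,
# the output is a single score between 0 and 1. All weights are made up.
def loyalty_score(customer):
    score = 0.0
    if customer["tenure_months"] >= 24:
        score += 0.5
    if customer["orders_last_year"] >= 6:
        score += 0.3
    if not customer["late_payment"]:
        score += 0.2
    return score

customers = [
    {"id": "A", "tenure_months": 30, "orders_last_year": 8, "late_payment": False},
    {"id": "B", "tenure_months": 5, "orders_last_year": 1, "late_payment": True},
    {"id": "C", "tenure_months": 26, "orders_last_year": 2, "late_payment": False},
]

# Sort the list from most to least loyal according to the score.
ranked = sorted(customers, key=loyalty_score, reverse=True)
ranked_ids = [c["id"] for c in ranked]
```

The point of the sketch is only that a single number per record is enough to order the whole list.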
The data mining process is sometimes referred to as knowledge discovery or
KDD (knowledge discovery in databases). We prefer to think of it as knowledge
creation.


What Tasks Can Be Performed with Data Mining?
Many problems of intellectual, economic, and business interest can be phrased
in terms of the following six tasks:
- Classification
- Estimation
- Prediction
- Affinity grouping
- Clustering
- Description and profiling


The first three are all examples of directed data mining, where the goal is to
find the value of a particular target variable. Affinity grouping and clustering
are undirected tasks where the goal is to uncover structure in data without
respect to a particular target variable. Profiling is a descriptive task that may
be either directed or undirected.


Classification
Classification, one of the most common data mining tasks, seems to be a
human imperative. In order to understand and communicate about the world,
we are constantly classifying, categorizing, and grading. We divide living
things into phyla, species, and genera; matter into elements; dogs into breeds;
people into races; steaks and maple syrup into USDA grades.
Classification consists of examining the features of a newly presented object
and assigning it to one of a predefined set of classes. The objects to be classified
are generally represented by records in a database table or a file, and the act of
classification consists of adding a new column with a class code of some kind.
The classification task is characterized by a well-defined set of classes
and a training set consisting of preclassified examples. The task is to
build a model of some kind that can be applied to unclassified data in order to
classify it.
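
The steps just described (a training set of preclassified examples, a model built from it, and a class code added to each unclassified record) can be sketched as follows. This is a minimal illustration using a one-nearest-neighbor rule; the two input fields, the scaling of income, and all the values are assumptions made up for the example.

```python
import math

# Preclassified training examples (hypothetical):
# (income in thousands of dollars, debt ratio) -> risk class.
training_set = [
    ((85.0, 0.10), "low"),
    ((78.0, 0.15), "low"),
    ((52.0, 0.35), "medium"),
    ((48.0, 0.40), "medium"),
    ((25.0, 0.70), "high"),
    ((30.0, 0.65), "high"),
]

def classify(record, examples):
    """Assign the class of the closest training example (1-nearest neighbor)."""
    def distance(a, b):
        # Divide income by 100 so both fields contribute on a similar scale.
        return math.hypot((a[0] - b[0]) / 100.0, a[1] - b[1])
    _, label = min(examples, key=lambda ex: distance(record, ex[0]))
    return label

# Unclassified records; the "new column" is the list of class codes.
unclassified = [(80.0, 0.12), (50.0, 0.38), (27.0, 0.68)]
class_codes = [classify(r, training_set) for r in unclassified]
```

Every record receives exactly one of the predefined classes, which is what distinguishes this task from estimation.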
Examples of classification tasks that have been addressed using the
techniques described in this book include:
- Classifying credit applicants as low, medium, or high risk
- Choosing content to be displayed on a Web page
- Determining which phone numbers correspond to fax machines
- Spotting fraudulent insurance claims
- Assigning industry codes and job designations on the basis of free-text
  job descriptions
In all of these examples, there are a limited number of classes, and we expect
to be able to assign any record into one or another of them. Decision trees
(discussed in Chapter 6) and nearest neighbor techniques (discussed in Chapter 8)
are techniques well suited to classification. Neural networks (discussed in
Chapter 7) and link analysis (discussed in Chapter 10) are also useful for
classification in certain circumstances.


Estimation
Classification deals with discrete outcomes: yes or no; measles, rubella, or
chicken pox. Estimation deals with continuously valued outcomes. Given
some input data, estimation comes up with a value for some unknown
continuous variable such as income, height, or credit card balance.
In practice, estimation is often used to perform a classification task. A credit
card company wishing to sell advertising space in its billing envelopes to a ski
boot manufacturer might build a classification model that puts all of its
cardholders into one of two classes, skier or nonskier. Another approach is to build
a model that assigns each cardholder a “propensity to ski score.” This might
be a value from 0 to 1 indicating the estimated probability that the cardholder
is a skier. The classification task now comes down to establishing a threshold
score. Anyone with a score greater than or equal to the threshold is classed as
a skier, and anyone with a lower score is considered not to be a skier.
The estimation approach has the great advantage that the individual records
can be rank ordered according to the estimate. To see the importance of this,
imagine that the ski boot company has budgeted for a mailing of 500,000
pieces. If the classification approach is used and 1.5 million skiers are
identified, then it might simply place the ad in the bills of 500,000 people selected at
random from that pool. If, on the other hand, each cardholder has a propensity
to ski score, it can send the ad to the 500,000 most likely candidates.
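
The difference between the two approaches can be sketched on a toy scale. The scores, the threshold of 0.5, and the budget of three ads below are all invented for illustration; the only point is that a threshold yields a pool to sample from, while a score allows the list to be rank ordered.

```python
import random

# Hypothetical propensity-to-ski scores for ten cardholders.
scores = {
    "c01": 0.95, "c02": 0.52, "c03": 0.10, "c04": 0.88, "c05": 0.61,
    "c06": 0.33, "c07": 0.74, "c08": 0.55, "c09": 0.91, "c10": 0.47,
}

threshold = 0.5  # assumed cutoff for being classed as a skier
budget = 3       # number of ads that can be sent

# Classification by threshold: a score at or above the cutoff means "skier".
skiers = [cid for cid, s in scores.items() if s >= threshold]

# With only the class label, the mailing is a random sample of the pool.
rng = random.Random(0)  # fixed seed so the illustration is repeatable
random_pick = rng.sample(skiers, budget)

# With the scores, rank everyone and mail the most likely candidates.
top_pick = sorted(scores, key=scores.get, reverse=True)[:budget]

def mean_score(ids):
    return sum(scores[c] for c in ids) / len(ids)
```

Here `mean_score(top_pick)` can never be lower than `mean_score(random_pick)`, since both lists have the same size and `top_pick` holds the highest scores available.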
Examples of estimation tasks include:
- Estimating the number of children in a family
- Estimating a family's total household income
- Estimating the lifetime value of a customer
- Estimating the probability that someone will respond to a balance
  transfer solicitation
Regression models (discussed in Chapter 5) and neural networks (discussed
in Chapter 7) are well suited to estimation tasks. Survival analysis (Chapter 12)
is well suited to estimation tasks where the goal is to estimate the time to an
event, such as a customer stopping.


Prediction
Prediction is the same as classification or estimation, except that the records
are classified according to some predicted future behavior or estimated future
value. In a prediction task, the only way to check the accuracy of the
classification is to wait and see. The primary reason for treating prediction as a
separate task from classification and estimation is that in predictive modeling there
are additional issues regarding the temporal relationship of the input variables
or predictors to the target variable.
Any of the techniques used for classification and estimation can be adapted
for use in prediction by using training examples where the value of the
variable to be predicted is already known, along with historical data for those
examples. The historical data is used to build a model that explains the current
observed behavior. When this model is applied to current inputs, the result is
a prediction of future behavior.
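
The temporal relationship between inputs and target can be made concrete with a small sketch. It assumes one input (customer service calls during an earlier six-month window) and one target (whether the customer left during the following six months); the field names, the data, and the simple cutoff model are all hypothetical.

```python
# Past customers: the input comes from an earlier window, the target from a
# later one, so the outcome is already known. All values are made up.
past = [
    {"calls_months_1_6": 1, "left_months_7_12": False},
    {"calls_months_1_6": 0, "left_months_7_12": False},
    {"calls_months_1_6": 7, "left_months_7_12": True},
    {"calls_months_1_6": 9, "left_months_7_12": True},
    {"calls_months_1_6": 3, "left_months_7_12": False},
    {"calls_months_1_6": 6, "left_months_7_12": True},
]

def fit_cutoff(examples):
    """Pick the call-count cutoff that misclassifies the fewest past customers."""
    candidates = sorted({e["calls_months_1_6"] for e in examples})

    def errors(cutoff):
        return sum(
            (e["calls_months_1_6"] >= cutoff) != e["left_months_7_12"]
            for e in examples
        )

    return min(candidates, key=errors)

cutoff = fit_cutoff(past)

# Current customers: the same input measured over the most recent six months.
# Applying the model to them yields a prediction about the next six months.
current = {"X": 8, "Y": 2}
predicted_to_leave = {cid: calls >= cutoff for cid, calls in current.items()}
```

The accuracy of `predicted_to_leave` cannot be checked until the next six months have passed, which is the sense in which prediction differs from ordinary classification.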
Examples of prediction tasks addressed by the data mining techniques
discussed in this book include:
- Predicting the size of the balance that will be transferred if a credit card
  prospect accepts a balance transfer offer
- Predicting which customers will leave within the next 6 months
- Predicting which telephone subscribers will order a value-added service
  such as three-way calling or voice mail
Most of the data mining techniques discussed in this book are suitable for
use in prediction so long as training data is available in the proper form. The
choice of technique depends on the nature of the input data, the type of value
to be predicted, and the importance attached to explicability of the prediction.


Affinity Grouping or Association Rules
The task of affinity grouping is to determine which things go together. The
prototypical example is determining what things go together in a shopping
cart at the supermarket, the task at the heart of market basket analysis. Retail
chains can use affinity grouping to plan the arrangement of items on store
shelves or in a catalog so that items often purchased together will be seen
together.
Affinity grouping can also be used to identify cross-selling opportunities
and to design attractive packages or groupings of products and services.
Affinity grouping is one simple approach to generating rules from data. If
two items, say cat food and kitty litter, occur together frequently enough, we
can generate two association rules:
- People who buy cat food also buy kitty litter with probability P1.
- People who buy kitty litter also buy cat food with probability P2.


Association rules are discussed in detail in Chapter 9.
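
As a small sketch of how the two probabilities in the cat food example might be estimated, the code below counts co-occurrences in a handful of made-up shopping baskets. It assumes that P1 and P2 are read as simple conditional frequencies (the share of baskets containing the first item that also contain the second).

```python
# Hypothetical shopping baskets, each a set of items bought together.
baskets = [
    {"cat food", "kitty litter", "milk"},
    {"cat food", "kitty litter"},
    {"cat food", "bread"},
    {"kitty litter", "cat food", "eggs"},
    {"kitty litter"},
    {"milk", "bread"},
    {"cat food", "milk"},
]

def rule_probability(baskets, if_item, then_item):
    """Fraction of baskets containing if_item that also contain then_item."""
    with_if = [b for b in baskets if if_item in b]
    with_both = [b for b in with_if if then_item in b]
    return len(with_both) / len(with_if)

# P1: people who buy cat food also buy kitty litter.
p1 = rule_probability(baskets, "cat food", "kitty litter")
# P2: people who buy kitty litter also buy cat food.
p2 = rule_probability(baskets, "kitty litter", "cat food")
```

With these baskets, P1 is 3 out of 5 and P2 is 3 out of 4, which shows that the two rules need not have the same probability even though they involve the same pair of items.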


Clustering
