


Profiling and description


One approach to the business goal of improving retention is to identify the
subscribers who are likely to cancel, figure out why, and make them some kind
of offer that addresses their concerns. For the strategy to be successful,
subscribers who are likely to cancel must be identified and assigned to groups
according to their presumed reasons for leaving. An appropriate retention
offer can then be designed for each group.
Using a model set that contains examples of customers who have canceled
along with examples of those who have not, many of the data mining techniques
discussed in this book are capable of labeling each customer as more or
less likely to churn. The additional requirement to identify separate segments
of subscribers at risk and understand what motivates each group to leave
suggests the use of decision trees and clever derived variables.
Each leaf of the decision tree has a label, which in this case would be “not
likely to churn” or “likely to churn.” Each leaf contains a different
proportion of churners, and this proportion can be used as a churn score.
Each leaf also has a set of rules describing who ends up there. With
skill and creativity, an analyst may be able to turn these mechanistic rules into
comprehensible reasons for leaving that, once understood, can be counteracted.
Decision trees often have more leaves than desired for the purpose of
developing special offers and telemarketing scripts. To combine leaves into larger
groups, take whole branches of the tree as the groups, rather than single leaves.
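
The idea of a leaf-level churn score, and of merging leaves into whole branches, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the leaf identifiers, and the function name churn_scores are all hypothetical, and the leaf assignments are taken as given rather than produced by any particular tree-building tool.

```python
from collections import defaultdict

# Hypothetical records: (leaf_id, churned). Each leaf id spells out the path
# from the root, so "L.R" is the right child of the root's left child.
records = [
    ("L.L", True), ("L.L", True), ("L.L", False),
    ("L.R", True), ("L.R", False), ("L.R", False), ("L.R", False),
    ("R", False), ("R", False), ("R", False), ("R", True),
]

def churn_scores(records, depth=None):
    """Return {group: proportion of churners in that group}.

    With depth=None every leaf is its own group. With depth=k, leaves are
    merged into the branch named by the first k steps of their path.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [churners, total]
    for leaf, churned in records:
        group = leaf if depth is None else ".".join(leaf.split(".")[:depth])
        counts[group][0] += int(churned)
        counts[group][1] += 1
    return {group: c / n for group, (c, n) in counts.items()}

leaf_scores = churn_scores(records)             # one score per leaf
branch_scores = churn_scores(records, depth=1)  # leaves merged at depth 1
```

Merging at depth 1 here folds the two leaves under the left branch into a single, larger group whose score is the pooled proportion of churners, which is the kind of coarser segment that is easier to design an offer around.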
Note that our preference for decision-tree methods in this case stems from
the desire to understand the reasons for attrition and our desire to treat
subgroups differentially. If the goal were simply to do the best possible job of
predicting the subscribers at risk, without worrying about the reasons, we might
select a different approach. Different business goals suggest different data
mining techniques. If the goal were to estimate next month's minutes of use for
each subscriber, neural networks or regression would be better choices. If the
goal were to find naturally occurring customer segments, an undirected
clustering technique or profiling and hypothesis testing would be appropriate.

Determine the Relevant Characteristics of the Data
Once the data mining tasks have been identified and used to narrow the range
of data mining methods under consideration, the characteristics of the
available data can help to refine the selection further. In general terms, the goal is to
select the data mining technique that minimizes the number and difficulty of
the data transformations that must be performed in order to coax good results
from the data.
As discussed in the previous chapter, some amount of data transformation
is always part of the data mining process. The raw data may need to be
summarized in various ways, data encodings must be rationalized, and so forth.
These kinds of transformations are necessary regardless of the technique
chosen. However, some kinds of data pose particular problems for some data
mining techniques.

Data Type
Categorical variables are especially problematic for data mining techniques
that use the numeric values of input variables. Numeric variables of the kind
that can be summed and multiplied play to the strengths of data mining
techniques, such as regression, K-means clustering, and neural networks, that are
based on arithmetic operations. When data has many categorical variables,
decision trees are quite useful, although association rules and link analysis
may be appropriate in some cases.
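
A small sketch may help show why categorical values trouble arithmetic-based techniques. Assigning arbitrary numeric codes to categories invents an ordering and a spacing that the data does not contain. One common workaround, not described in the text above and shown here only as an assumption, is to expand each category into its own 0/1 indicator field. The plan names and helper functions are hypothetical.

```python
def one_hot(values):
    """Expand a categorical field into one 0/1 indicator per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

plans = ["basic", "family", "premium"]  # hypothetical rate plans

# Arbitrary codes imply that "basic" is twice as far from "premium"
# as it is from "family", which is an artifact of the coding.
codes = {"basic": 1, "family": 2, "premium": 3}
coded_gap_family = abs(codes["basic"] - codes["family"])
coded_gap_premium = abs(codes["basic"] - codes["premium"])

# With indicator fields, every pair of distinct plans is equally far apart.
encoded, cats = one_hot(plans)
indicator_gap_family = euclidean(encoded[0], encoded[1])
indicator_gap_premium = euclidean(encoded[0], encoded[2])
```

The cost of this encoding is one new input field per category, which matters for the techniques discussed in the next section that struggle with large numbers of inputs.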

Number of Input Fields
In directed data mining applications, there should be a single target field or
dependent variable. The rest of the fields (except for those that are either
clearly irrelevant or clearly dependent on the target variable) are treated as
potential inputs to the model. Data mining methods vary in their ability to
successfully process large numbers of input fields. This can be a factor in deciding
on the right technique for a particular application.
In general, techniques that rely on adjusting a vector of weights that has an
element for each input field run into trouble when the number of fields grows
very large. Neural networks and memory-based reasoning share that trait.
Association rules run into a different problem. The technique looks at all
possible combinations of the inputs; as the number of inputs grows, processing
the combinations becomes impossible to do in a reasonable amount of time.
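
The growth in combinations is easy to quantify. The sketch below counts the distinct non-empty combinations that can be formed from a given number of inputs; the function name itemset_count and its max_size parameter are illustrative assumptions, and practical association-rule tools prune this space rather than enumerating it in full.

```python
from math import comb

def itemset_count(n_items, max_size=None):
    """Count non-empty combinations of n_items inputs.

    If max_size is given, only combinations of up to max_size inputs
    are counted.
    """
    limit = n_items if max_size is None else min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, limit + 1))

all_of_10 = itemset_count(10)               # 2**10 - 1 combinations
all_of_30 = itemset_count(30)               # 2**30 - 1 combinations
up_to_3_of_100 = itemset_count(100, max_size=3)
```

Each additional input doubles the unrestricted count, which is why even restricting attention to combinations of three inputs out of a hundred already yields well over a hundred thousand candidates.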
Decision-tree methods are much less hindered by large numbers of fields.
As the tree is built, the decision-tree algorithm identifies the single field that
contributes the most information at each node and bases the next segment of
the rule on that field alone. Dozens or hundreds of other fields can come along
for the ride, but won't be represented in the final rules unless they contribute
to the solution.
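
To make "the single field that contributes the most information" concrete, here is a minimal sketch that scores each candidate field by information gain, which is one of several splitting criteria used by decision-tree algorithms. The field names, the tiny data set, and the helper functions are hypothetical; a real algorithm would repeat this choice at every node.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, field, target):
    """Reduction in target entropy obtained by splitting rows on field."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def best_field(rows, fields, target):
    """Pick the single field with the highest information gain."""
    return max(fields, key=lambda f: information_gain(rows, f, target))

# Hypothetical subscriber records.
rows = [
    {"contract_expired": "yes", "region": "east", "churn": "yes"},
    {"contract_expired": "yes", "region": "west", "churn": "yes"},
    {"contract_expired": "no", "region": "east", "churn": "no"},
    {"contract_expired": "no", "region": "west", "churn": "no"},
]
chosen = best_field(rows, ["contract_expired", "region"], "churn")
```

In this toy data the region field carries no information about churn, so it simply comes along for the ride and never appears in a rule.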

TIP When faced with a large number of fields for a directed data mining
problem, it is a good idea to start by building a decision tree, even if the final
model is to be built using a different technique. The decision tree will identify a
good subset of the fields to use as input to another technique that might be
swamped by the original set of input variables.

Free-Form Text
Most data mining techniques are incapable of directly handling free-form text.
But clearly, text fields often contain extremely valuable information. When
analyzing warranty claims submitted to an engine manufacturer by independent
dealers, the mechanic's free-form notes explaining what went wrong and
what was done to fix the problem are at least as valuable as the fixed fields that
show the part numbers and hours of labor used.
One data mining technique that can deal with free text is memory-based
reasoning, one of the nearest neighbor methods discussed in Chapter 8. Recall
that memory-based reasoning is based on the ability to measure the distance
from one record to all the other records in a database in order to form a
neighborhood of similar records. Often, finding an appropriate distance metric is a
stumbling block that makes it hard to apply the technique, but researchers in
the field of information retrieval have come up with good measures of the
distance between two blocks of text. These measurements are based on the
overlap in vocabulary between the documents, especially of uncommon words and
proper nouns. The ability of Web search engines to find appropriate articles is
one familiar example of text mining.
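
A vocabulary-overlap measure of this kind can be sketched as follows. This is an illustrative assumption rather than the specific measure used by any system mentioned in the text: shared words are weighted by an inverse-document-frequency term so that uncommon words count for more, and the distance is one minus the weighted overlap. The sample notes and function names are hypothetical.

```python
from math import log

def tokenize(text):
    """Lowercase a block of text and return its set of distinct words."""
    return set(text.lower().split())

def idf_weights(docs):
    """Weight each word more heavily the fewer documents it appears in."""
    n = len(docs)
    vocab = set().union(*docs)
    return {w: log(n / sum(w in d for d in docs)) + 1.0 for w in vocab}

def similarity(a, b, idf):
    """Weighted share of the combined vocabulary that both texts use."""
    shared = sum(idf[w] for w in a & b)
    total = sum(idf[w] for w in a | b)
    return shared / total if total else 0.0

def distance(a, b, idf):
    return 1.0 - similarity(a, b, idf)

# Hypothetical mechanic's notes.
notes = [
    "replaced cracked injector and tested engine",
    "cracked injector found and replaced",
    "tested engine and changed oil",
]
docs = [tokenize(n) for n in notes]
idf = idf_weights(docs)
d01 = distance(docs[0], docs[1], idf)
d12 = distance(docs[1], docs[2], idf)
```

The second and third notes share only the word "and", which appears in every note and therefore carries the lowest weight, so they end up far apart, while the first two notes share several rarer words and end up close together.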
As described in Chapter 8, memory-based reasoning on free-form text has
also been used to classify workers into industries and job categories based on
written job descriptions they supplied on the U.S. census long form and to add
keywords to news stories.

Consider Hybrid Approaches
Sometimes, a combination of techniques works better than any single approach.
This may require breaking down a single data mining task into two or more
subtasks. The automotive marketing case from Chapter 2 is a good example.
Researchers found that the best way of selecting prospects for a particular car
model was to first use a neural network to identify people likely to buy a car,
then use a decision tree to predict the particular model each car buyer would
choose.
Another example is a bank that uses three variables as input to a credit
solicitation decision. The three inputs are estimates for:
The likelihood of a response

The projected first-year revenue from this customer

The risk of the new customer defaulting

These tasks vary considerably in the amount of relevant training data likely
to be available, the input fields likely to be important, and the length of time
required to verify the accuracy of a prediction. Soon after a mailing, the bank
knows exactly who responded because the solicitation contains a deadline
after which responses are considered invalid. A whole year must pass before
the estimated first-year revenue can be checked against the actual amount, and
it may take even longer for a customer to “go bad.” Given all these differences,
it is not surprising that a different data mining technique may turn out to
be best for each task.
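
The text does not say how the bank combined its three estimates. Purely as a hypothetical illustration of how the outputs of three separately built models might feed one decision, the sketch below computes an expected value per solicitation. The formula, the loss_if_default and mailing_cost parameters, and all the numbers are assumptions made for this example.

```python
def solicitation_score(p_response, first_year_revenue, p_default,
                       loss_if_default, mailing_cost):
    """Expected value of mailing one prospect (hypothetical formula).

    Each of the three estimates could come from a different model:
    p_response, first_year_revenue, and p_default.
    """
    value_if_responds = (
        first_year_revenue * (1 - p_default) - loss_if_default * p_default
    )
    return p_response * value_if_responds - mailing_cost

# Two made-up prospects with the same response likelihood.
good = solicitation_score(0.02, 300.0, 0.05, 1000.0, 1.0)
poor = solicitation_score(0.02, 100.0, 0.10, 1000.0, 1.0)
mail_good = good > 0
mail_poor = poor > 0
```

Keeping the three estimates separate until this final step is what allows each one to be built, and later verified, on its own schedule and with its own technique.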

How One Company Began Data Mining
Over the years, the authors have watched many companies make their first
forays into data mining. Although each company's situation is unique, some
common themes emerge. At each company there was someone responsible for
the data mining project who truly believed in the power and potential of
analytic customer relationship management, often because he or she had seen it in
action in other companies. This leader was not usually a technical expert, and
frequently did not do any of the actual technical work. He or she functioned as
an evangelist to build the data mining team and secure sponsorship for a data
mining pilot.
The successful efforts crossed corporate boundaries to involve people from
both marketing and information technology. The teams were usually quite
small, often consisting of only four or five people, yet included people who
understood the data, people who understood the data mining techniques,
people who understood the business problem to be addressed, and at least one
person with experience applying data mining to business problems.
Sometimes several of these roles were combined in one person.
In all cases, the initial data mining pilot project addressed a problem of real
importance to the organization, one where the value of success would be
recognized. Some of the best pilot projects were designed to measure the
usefulness of data mining by looking at the results of the actions suggested by the
data mining effort.
One of the companies, a wireless service provider, agreed to let us describe
its data mining pilot project.

A Controlled Experiment in Retention
In 1996, Comcast Cellular was a wireless phone service provider in a market of
7.5 million people in a three-state area centered around Philadelphia. In 1999,
Comcast Cellular was absorbed by SBC and is now part of Cingular, but at the
time this pilot study took place, it was a regional service provider facing tough
competition from fast-growing national networks. Increasing competition
meant that subscribers were faced with many competing offers, and each
month a significant proportion of the customer base switched to a competing
service. This churn, as it is called in the industry, was very disturbing because
even though new subscribers easily outnumbered the defectors, the
acquisition cost for a new customer was often in the $500 to $600 range. There is a
detailed discussion of churn in Chapter 4.
With even more competitors poised to enter its market, Comcast Cellular
wanted to reach out to existing subscribers with a proactive effort to ensure
their continued happiness. The difficulty was knowing which customers were
at risk and for what reasons. For any retention campaign, it is important to
understand which customers are at risk because a retention offer costs the
company money. It doesn't make sense to offer an inducement to customers
who are likely to remain anyway. It is equally important to understand what
motivates different customer segments to leave, since different retention offers
are appropriate for different segments. An offer of free night and weekend
minutes may be very attractive to customers who use their phones primarily
to keep in touch with friends, but of little interest to business users.
The pilot project was a three-way partnership between Comcast, a group of
data mining consultants (including the authors), and a telemarketing service
bureau:

Comcast supplied data and expertise on its own business practices.

The data mining consultants developed profiles of likely defectors based on usage patterns in call detail data.

The telemarketing service bureau worked with Comcast to use the profiles to develop retention offers for an outbound telemarketing campaign.

This description focuses on the data mining aspect of the combined effort.
The goal of the data mining effort was to identify groups of subscribers with
an unusually high likelihood to cancel their subscriptions in the next 60 days.
The data mining tool employed used a rule induction algorithm similar to
decision trees to create segments of high-risk customers described by simple
rules. The plan was to include these high-risk customers in telemarketing
campaigns aimed at retaining them. The retention offers were to be tailored to
different customer segments discovered through data mining. The experimental
design allowed for the comparison of three groups:
Group A consists of customers judged by the model to be high risk for whom no intervention was performed.

Group B consists of customers judged by the model to be high risk for whom some intervention was performed.

Group C is representative of the general population of customers.

The study design is illustrated in Figure 18.2. Our hope, of course, was that
group A would suffer high attrition compared to groups B and C, proving that
both the model and the intervention were effective.
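
The logic of the three-group comparison can be sketched with a few lines of arithmetic. The counts below are invented solely to show the calculation and are not results from the pilot; the variable names are likewise hypothetical. Comparing group A with group C tests the model, and comparing group A with group B tests the intervention.

```python
def attrition_rate(canceled, total):
    """Fraction of a group that canceled during the test period."""
    return canceled / total

# Made-up counts for illustration only: (canceled, group size).
groups = {
    "A": (120, 1000),  # high risk, no intervention
    "B": (80, 1000),   # high risk, intervention
    "C": (30, 1000),   # general population
}
rates = {name: attrition_rate(*counts) for name, counts in groups.items()}

# Model check: do flagged customers leave more often than customers overall?
model_lift = rates["A"] / rates["C"]

# Intervention check: does contacting high-risk customers reduce attrition?
intervention_effect = rates["A"] - rates["B"]
```

A lift well above 1 would suggest the model is finding genuinely risky customers, and a positive difference between A and B would suggest the calls help. A real analysis would also ask whether differences of this size could arise by chance.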
Here the project ran into a little trouble. The first difficulty was that although
the project included a budget for outbound telemarketing calls to the people
identified as likely to cancel, there was neither budget nor authorization to
actually offer anything to the people being called. Another problem was
technical: in the call center, it was not possible to transfer a dissatisfied
customer directly over to the customer service group at the phone company to
resolve particular problems outside the scope of the retention effort (such as
mistakes on bills). Yet another problem was that although the customer
database included a home phone number for each customer, only about 75 percent
of them turned out to be correct.
[Figure 18.2: study design. Recoverable labels: At Risk, Test Group.]

