

updated. When the visitor eventually becomes a customer or registered user,
the activity that led up to that transition becomes part of the customer record.
Tracking responses and responders is good practice in the offline world as
well. The first critical piece of information to record is the fact that the prospect
responded at all. Data describing who responded and who did not is a necessary
ingredient of future response models. Whenever possible, the response data
should also include the marketing action that stimulated the response, the
channel through which the response was captured, and when the response came in.
Determining which of many marketing messages stimulated the response
can be tricky. In some cases, it may not even be possible. To make the job
easier, response forms and catalogs include identifying codes. Web site visits
capture the referring link. Even advertising campaigns can be distinguished by
using different telephone numbers, post office boxes, or Web addresses.
Depending on the nature of the product or service, responders may be
required to provide additional information on an application or enrollment
form. If the service involves an extension of credit, credit bureau information
may be requested. Information collected at the beginning of the customer
relationship ranges from nothing at all to the complete medical examination
sometimes required for a life insurance policy. Most companies are somewhere in
between.

Gather Information from New Customers
When a prospect first becomes a customer, there is a golden opportunity to
gather more information. Before the transformation from prospect to
customer, any data about prospects tends to be geographic and demographic.
Purchased lists are unlikely to provide anything beyond name, contact
information, and list source. When an address is available, it is possible to infer
other things about prospects based on characteristics of their neighborhoods.
Name and address together can be used to purchase household-level
information about prospects from providers of marketing data. This sort of data is
useful for targeting broad, general segments such as “young mothers” or “urban
teenagers” but is not detailed enough to form the basis of an individualized
customer relationship.
Among the most useful fields that can be collected for future data mining
are the initial purchase date, initial acquisition channel, offer responded to,
initial product, initial credit score, time to respond, and geographic location. We
have found these fields to be predictive of a wide range of outcomes of interest
such as expected duration of the relationship, bad debt, and additional
purchases. These initial values should be maintained as is, rather than being
overwritten with new values as the customer relationship develops.
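One way to keep the initial values from being overwritten is to store them in a record that cannot be modified after creation, alongside the fields that are expected to change. The following is a minimal sketch of that idea, not a prescribed schema; the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)
class AcquisitionSnapshot:
    """Values captured once, when the prospect becomes a customer (hypothetical fields)."""
    initial_purchase_date: date
    acquisition_channel: str
    offer_code: str
    initial_product: str
    initial_credit_score: int
    days_to_respond: int
    postal_code: str


@dataclass
class CustomerRecord:
    customer_id: str
    acquired: AcquisitionSnapshot  # frozen: its fields cannot be reassigned
    current_credit_score: int      # updated as the relationship develops


# Made-up sample customer for illustration only.
snapshot = AcquisitionSnapshot(
    date(2003, 5, 1), "direct_mail", "DM-17", "basic_checking", 640, 12, "02139"
)
record = CustomerRecord("C001", snapshot, current_credit_score=640)

# Later updates touch only the current value; the acquisition-time value survives.
record.current_credit_score = 705
```

Because the snapshot is frozen, an attempt to assign to record.acquired.initial_credit_score raises FrozenInstanceError, so later processing cannot silently replace the acquisition-time value.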

Acquisition-Time Variables Can Predict Future Outcomes
By recording everything that was known about a customer at the time of
acquisition and then tracking customers over time, businesses can use data
mining to relate acquisition-time variables to future outcomes such as
customer longevity, customer value, and default risk. This information can then
be used to guide marketing efforts by focusing on the channels and messages
that produce the best results. For example, the survival analysis techniques
described in Chapter 12 can be used to establish the mean customer lifetime
for each channel. It is not uncommon to discover that some channels yield
customers that last twice as long as the customers from other channels. Assuming
that a customer's value per month can be estimated, this translates into an
actual dollar figure for how much more valuable a typical channel A customer
is than a typical channel B customer, a figure that is as valuable as the
cost-per-response measures often used to rate channels.
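The arithmetic behind this comparison can be sketched in a few lines. The mean lifetimes and the monthly value below are invented for illustration, and the calculation assumes a constant value per month with no discounting, which is a simplification rather than anything stated above.

```python
def lifetime_value(mean_lifetime_months, value_per_month):
    """Expected value of a customer: mean lifetime times monthly value."""
    return mean_lifetime_months * value_per_month


# Hypothetical mean customer lifetimes (in months) per acquisition channel.
mean_lifetime = {"A": 24.0, "B": 12.0}
value_per_month = 30.0  # hypothetical dollars per customer per month

values = {
    channel: lifetime_value(months, value_per_month)
    for channel, months in mean_lifetime.items()
}
difference = values["A"] - values["B"]

print(values)      # {'A': 720.0, 'B': 360.0}
print(difference)  # 360.0
```

With these made-up numbers, a channel A customer is worth 360 dollars more than a channel B customer, a figure that can be set against the difference in acquisition cost between the two channels.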

Data Mining for Customer Relationship Management
Customer relationship management naturally focuses on established
customers. Happily, established customers are the richest source of data for
mining. Best of all, the data generated by established customers reflects their
actual individual behavior. Does the customer pay bills on time? Check or
credit card? When was the last purchase? What product was purchased? How
much did it cost? How many times has the customer called customer service?
How many times have we called the customer? What shipping method does
the customer use most often? How many times has the customer returned a
purchase? This kind of behavioral data can be used to evaluate customers'
potential value, assess the risk that they will end the relationship, assess the
risk that they will stop paying their bills, and anticipate their future needs.

Matching Campaigns to Customers
The same response model scores that are used to optimize the budget for a
mailing to prospects are even more useful with existing customers, where they
can be used to tailor the mix of marketing messages that a company directs to
its existing customers. Marketing does not stop once customers have been
acquired. There are cross-sell campaigns, up-sell campaigns, usage stimulation
campaigns, loyalty programs, and so on. These campaigns can be thought
of as competing for access to customers.
When each campaign is considered in isolation, and all customers are given
response scores for every campaign, what typically happens is that a similar
group of customers gets high scores for many of the campaigns. Some
customers are just more responsive than others, a fact that is reflected in the model
scores. This approach leads to poor customer relationship management. The
high-scoring group is bombarded with messages and becomes irritated and
unresponsive. Meanwhile, other customers never hear from the company and
so are not encouraged to expand their relationships.
An alternative is to send a limited number of messages to each customer,
using the scores to decide which messages are most appropriate for each one.
Even a customer with low scores for every offer has higher scores for some
than others. In Mastering Data Mining (Wiley, 1999), we describe how this
system has been used to personalize a banking Web site by highlighting the
products and services most likely to be of interest to each customer based on
their banking behavior.
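The idea of capping the number of messages and letting the scores pick among campaigns can be sketched as a per-customer top-k selection. The campaign names, scores, and the limit of two messages are hypothetical; a production system would also have to handle campaign capacity and contact rules, which this sketch ignores.

```python
import heapq


def choose_messages(scores, max_messages=2):
    """Return up to max_messages campaign names, highest response score first."""
    return heapq.nlargest(max_messages, scores, key=scores.get)


# Hypothetical response scores per customer and campaign.
customer_scores = {
    "responsive": {"cross_sell": 0.40, "up_sell": 0.35, "usage": 0.30, "loyalty": 0.20},
    "quiet": {"cross_sell": 0.02, "up_sell": 0.05, "usage": 0.01, "loyalty": 0.03},
}

plan = {
    customer: choose_messages(scores)
    for customer, scores in customer_scores.items()
}

print(plan["responsive"])  # ['cross_sell', 'up_sell']
print(plan["quiet"])       # ['up_sell', 'loyalty']
```

The "quiet" customer has low scores everywhere but still receives the two offers that rank highest for that customer, while the "responsive" customer is no longer contacted by all four campaigns.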

Segmenting the Customer Base
Customer segmentation is a popular application of data mining with
established customers. The purpose of segmentation is to tailor products, services,
and marketing messages to each segment. Customer segments have
traditionally been based on market research and demographics. There might be a
“young and single” segment or a “loyal entrenched” segment. The problem
with segments based on market research is that it is hard to know how to
apply them to all the customers who were not part of the survey. The problem
with customer segments based on demographics is that not all “young and
singles” or “empty nesters” actually have the tastes and product affinities
ascribed to their segment. The data mining approach is to identify behavioral
segments.

Finding Behavioral Segments
One way to find behavioral segments is to use the undirected clustering
techniques described in Chapter 11. This method leads to clusters of similar
customers, but it may be hard to understand how these clusters relate to the
business. In Chapter 2, there is an example of a bank successfully using
automatic cluster detection to identify a segment of small business customers that
were good prospects for home equity credit lines. However, that was only one
of 14 clusters found and others did not have obvious marketing uses.
More typically, a business would like to perform a segmentation that places
every customer into some easily described segment. Often, these segments are
built with respect to a marketing goal such as subscription renewal or high
spending levels. Decision tree techniques described in Chapter 6 are ideal for
this sort of segmentation.
Another common case is when there are preexisting segment definitions that
are based on customer behavior and the data mining challenge is to identify
patterns in the data that correspond to the segments. A good example is the
grouping of credit card customers into segments such as “high balance
revolvers” or “high volume transactors.”
One very interesting application of data mining to the task of finding
patterns corresponding to predefined customer segments is the system that AT&T
Long Distance uses to decide whether a phone is likely to be used for business


AT&T views anyone in the United States who has a phone and is not already
a customer as a potential customer. For marketing purposes, they have long
maintained a list of phone numbers called the Universe List. This is as
complete as possible a list of U.S. phone numbers for both AT&T and non-AT&T
customers flagged as either business or residence. The original method of
obtaining non-AT&T customers was to buy directories from local phone
companies and search for numbers that were not on the AT&T customer list. This
was both costly and unreliable and likely to become more so as the companies
supplying the directories competed more and more directly with AT&T. The
original way of determining whether a number was a home or business was to
call and ask.
In 1995, Corinna Cortes and Daryl Pregibon, researchers at Bell Labs (then a
part of AT&T), came up with a better way. AT&T, like other phone companies,
collects call detail data on every call that traverses its network (they are legally
mandated to keep this information for a certain period of time). Many of these
calls are either made or received by noncustomers. The telephone numbers of
noncustomers appear in the call detail data when they dial AT&T 800
numbers and when they receive calls from AT&T customers. These records can be
analyzed and scored for likelihood to be businesses based on a statistical
model of businesslike behavior derived from data generated by known
businesses. This score, which AT&T calls “bizocity,” is used to determine which
services should be marketed to the prospects.
Every telephone number is scored every day. AT&T's switches process
several hundred million calls each day, representing about 65 million distinct
phone numbers. Over the course of a month, they see over 300 million
distinct phone numbers. Each of those numbers is given a small profile that
includes the number of days since the number was last seen, the average daily
minutes of use, the average time between appearances of the number on the
network, and the bizocity score.

The bizocity score is generated by a regression model that takes into account
the length of calls made and received by the number, the time of day that
calling peaks, and the proportion of calls the number makes to known businesses.
Each day's new data adjusts the score. In practice, the score is a weighted
average over time with the most recent data counting the most.
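A weighted average in which recent data counts the most can be maintained without storing the full history. The sketch below uses exponential smoothing for that purpose. The weighting scheme, the weight of 0.2, and the sample scores are assumptions made for illustration; the text does not say which scheme AT&T actually used.

```python
def update_score(previous_score, todays_score, weight=0.2):
    """Blend today's model output into the running score.

    With exponential smoothing, data from k days ago carries weight
    weight * (1 - weight) ** k, so the most recent day counts the most.
    """
    return (1.0 - weight) * previous_score + weight * todays_score


# Hypothetical example: a number starts with a neutral score of 0.5 and
# then behaves like a business (daily model output 0.9) three days running.
score = 0.5
history = []
for daily_output in [0.9, 0.9, 0.9]:
    score = update_score(score, daily_output)
    history.append(score)

print([round(s, 4) for s in history])  # [0.58, 0.644, 0.6952]
```

Each day moves the score part of the way toward the latest evidence, so a single unusual day shifts it only slightly while a sustained change in behavior eventually dominates.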
Bizocity can be combined with other information in order to address
particular business segments. One segment of particular interest in the past is home
businesses. These are often not recognized as businesses even by the local
phone company that issued the number. A phone number with high bizocity
that is at a residential address or one that has been flagged as residential by the
local phone company is a good candidate for services aimed at people who
work at home.
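The home business rule amounts to combining the score with a residential flag. A minimal sketch follows; the profile fields, the 0-to-1 score scale, and the threshold of 0.7 are hypothetical choices, not values taken from the AT&T system.

```python
from dataclasses import dataclass


@dataclass
class PhoneProfile:
    number: str
    bizocity: float            # assumed here to lie between 0 and 1
    flagged_residential: bool  # residential address or residential flag


def is_home_business_candidate(profile, threshold=0.7):
    """High bizocity on a number believed to be residential."""
    return profile.flagged_residential and profile.bizocity >= threshold


# Made-up phone numbers and scores for illustration.
profiles = [
    PhoneProfile("555-0101", 0.85, True),   # businesslike use at a home
    PhoneProfile("555-0102", 0.90, False),  # ordinary business line
    PhoneProfile("555-0103", 0.10, True),   # ordinary residence
]
candidates = [p.number for p in profiles if is_home_business_candidate(p)]

print(candidates)  # ['555-0101']
```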

Tying Market Research Segments to Behavioral Data
One of the big challenges with traditional survey-based market research is that
it provides a lot of information about a few customers. However, to use the
results of market research effectively often requires understanding the
characteristics of all customers. That is, market research may find interesting
segments of customers. These then need to be projected onto the existing customer
base using available data. Behavioral data can be particularly useful for this;
such behavioral data is typically summarized from transaction and billing
histories. One requirement of the market research is that customers need to be
identified so the behavior of the market research participants is known.
Most of the directed data mining techniques discussed in this book can be
used to build a classification model to assign people to segments based on
available data. All that is needed is a training set of customers who have
already been classified. How well this works depends largely on the extent to
which the customer segments are actually supported by customer behavior.
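As an illustration of projecting survey segments onto the whole customer base, the sketch below trains a nearest-centroid classifier on a handful of survey participants and then assigns a segment to customers who were never surveyed. Nearest-centroid is used here only as a simple stand-in for the directed techniques mentioned above; the features (purchases per month, average order value), the segment names, and the data are invented, and in practice the features would need to be put on comparable scales.

```python
from collections import defaultdict
from math import dist


def fit_centroids(training):
    """training: list of (features, segment) pairs from survey participants."""
    grouped = defaultdict(list)
    for features, segment in training:
        grouped[segment].append(features)
    # The centroid of a segment is the column-wise mean of its members.
    return {
        segment: tuple(sum(column) / len(rows) for column in zip(*rows))
        for segment, rows in grouped.items()
    }


def assign_segment(centroids, features):
    """Assign the segment whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda segment: dist(centroids[segment], features))


# Hypothetical training set: (purchases per month, average order value).
training = [
    ((1.0, 20.0), "occasional"),
    ((2.0, 25.0), "occasional"),
    ((8.0, 60.0), "frequent"),
    ((10.0, 70.0), "frequent"),
]
centroids = fit_centroids(training)

print(assign_segment(centroids, (7.0, 55.0)))  # frequent
print(assign_segment(centroids, (1.0, 18.0)))  # occasional
```

If the survey segments are not reflected in behavior, the centroids end up close together and the assignments become unreliable, which is the dependence on behavioral support described above.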

Reducing Exposure to Credit Risk
Learning to avoid bad customers (and noticing when good customers are
about to turn bad) is as important as holding on to good customers. Most
companies whose business exposes them to consumer credit risk do credit
screening of customers as part of the acquisition process, but risk modeling
does not end once the customer has been acquired.

Predicting Who Will Default
Assessing the credit risk on existing customers is a problem for any business
that provides a service that customers pay for in arrears. There is always the
chance that some customers will receive the service and then fail to pay for it.
Nonrepayment of debt is one obvious example; newspaper subscriptions,
telephone service, gas and electricity, and cable service are among the many
services that are usually paid for only after they have been used.
Of course, customers who fail to pay for long enough are eventually cut off.
By that time they may owe large sums of money that must be written off. With
early warning from a predictive model, a company can take steps to protect
itself. These steps might include limiting access to the service or decreasing the
length of time between a payment being late and the service being cut off.
Involuntary churn, as termination of services for nonpayment is sometimes
called, can be modeled in multiple ways. Often, involuntary churn is
considered as a binary outcome in some fixed amount of time, in which case
techniques such as logistic regression and decision trees are appropriate. Chapter
12 shows how this problem can also be viewed as a survival analysis problem,
in effect changing the question from “Will the customer fail to pay next
month?” to “How long will it be until half the customers have been lost to
involuntary churn?”
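The survival framing of the question can be made concrete with a small calculation. Given an estimated monthly hazard (the probability that a customer still active at the start of a month is lost to involuntary churn during it), the surviving fraction is the running product of one minus the hazard, and the answer is the first month in which that fraction falls to one half or below. The constant hazard values below are invented for illustration; they are not estimates from any real data.

```python
def months_until_half_lost(monthly_hazards):
    """Return the first month in which no more than half the customers remain.

    monthly_hazards[i] is the probability that a customer active at the start
    of month i + 1 is lost during that month. Returns None if the surviving
    fraction stays above one half for the whole horizon.
    """
    surviving = 1.0
    for month, hazard in enumerate(monthly_hazards, start=1):
        surviving *= 1.0 - hazard
        if surviving <= 0.5:
            return month
    return None


# Hypothetical hazards: 10 percent per month versus 1 percent per month.
high_risk = months_until_half_lost([0.10] * 12)
low_risk = months_until_half_lost([0.01] * 12)

print(high_risk)  # 7
print(low_risk)   # None
```

At 10 percent per month the surviving fraction is about 0.53 after six months and about 0.48 after seven, so half the group is gone in month 7; at 1 percent per month about 89 percent remain after a year, so the halfway point lies beyond the 12-month horizon.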
One of the big differences between voluntary churn and involuntary churn
is that involuntary churn often involves complicated business processes, as
bills go through different stages of being late. Over time, companies may
tweak the rules that guide the processes to control the amount of money that
they are owed. When looking for accurate numbers in the near term, modeling
each step in the business processes may be the best approach.

Improving Collections
Once customers have stopped paying, data mining can aid in collections.
Models are used to forecast the amount that can be collected and, in some
cases, to help choose the collection strategy. Collections is basically a type of
sales. The company tries to sell its delinquent customers on the idea of paying
its bills instead of some other bill. As with any sales campaign, some
prospective payers will be more receptive to one type of message and some to another.

Determining Customer Value
Customer value calculations are quite complex and although data mining has
a role to play, customer value calculations are largely a matter of getting finan­

