

cial definitions right. A seemingly simple statement of customer value is the
total revenue due to the customer minus the total cost of maintaining the cus­
tomer. But how much revenue should be attributed to a customer? Is it what
he or she has spent in total to date? What he or she spent this month? What we
expect him or her to spend over the next year? How should indirect revenues
such as advertising revenue and list rental be allocated to customers?
Data Mining Applications 115

Costs are even more problematic. Businesses have all sorts of costs that may
be allocated to customers in peculiar ways. Even ignoring allocated costs and
looking only at direct costs, things can still be pretty confusing. Is it fair to
blame customers for costs over which they have no control? Two Web customers
order the exact same merchandise and both are promised free delivery.
The one that lives farther from the warehouse may cost more in shipping, but
is she really a less valuable customer? What if the next order ships from a
different location? Mobile phone service providers are faced with a similar
problem. Most now advertise uniform nationwide rates. The providers' costs are
far from uniform when they do not own the entire network. Some of the calls
travel over the company's own network. Others travel over the networks of
competitors who charge high rates. Can the company increase customer value
by trying to discourage customers from visiting certain geographic areas?
Once all of these problems have been sorted out, and a company has agreed
on a definition of retrospective customer value, data mining comes into play in
order to estimate prospective customer value. This comes down to estimating
the revenue a customer will bring in per unit time and then estimating the
customer's remaining lifetime. The second of these problems is the subject of
Chapter 12.
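
As a rough illustration of that two-part estimate, the sketch below multiplies
an estimated revenue rate by an estimated remaining lifetime. The function name
and the sample figures are hypothetical, and the sketch assumes a constant
monthly revenue with no discounting and no costs.

```python
def prospective_value(revenue_per_month: float, remaining_months: float) -> float:
    """Estimated future revenue: revenue per unit time times remaining lifetime."""
    if remaining_months < 0:
        raise ValueError("remaining lifetime cannot be negative")
    return revenue_per_month * remaining_months


# Hypothetical customer: $50 per month, expected to stay 18 more months.
value = prospective_value(50.0, 18.0)
```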

Cross-selling, Up-selling, and Making Recommendations
With existing customers, a major focus of customer relationship management
is increasing customer profitability through cross-selling and up-selling. Data
mining is used for figuring out what to offer to whom and when to offer it.

Finding the Right Time for an Offer
Charles Schwab, the investment company, discovered that customers gener­
ally open accounts with a few thousand dollars even if they have considerably
more stashed away in savings and investment accounts. Naturally, Schwab
would like to attract some of those other balances. By analyzing historical
data, they discovered that customers who transferred large balances into
investment accounts usually did so during the first few months after they
opened their first account. After a few months, there was little return on trying
to get customers to move in large balances. The window was closed. As a
result of learning this, Schwab shifted its strategy from sending a constant
stream of solicitations throughout the customer life cycle to concentrated
efforts during the first few months.
A major newspaper with both daily and Sunday subscriptions noticed a
similar pattern. If a Sunday subscriber upgrades to daily and Sunday, it usu­
ally happens early in the relationship. A customer who has been happy with
just the Sunday paper for years is much less likely to change his or her habits.
116 Chapter 4

Making Recommendations
One approach to cross-selling makes use of association rules, the subject of
Chapter 9. Association rules are used to find clusters of products that usually
sell together or tend to be purchased by the same person over time. Customers
who have purchased some, but not all of the members of a cluster are good
prospects for the missing elements. This approach works for retail products
where there are many such clusters to be found, but is less effective in areas
such as financial services, where there are fewer products, many customers
have a similar mix, and the mix is often determined by product bundling and
previous marketing efforts.
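
The prospecting step can be sketched as follows, assuming a product cluster
has already been found (by association rules or otherwise). The cluster, the
customer identifiers, and the function name are all hypothetical.

```python
def missing_items(purchased: set[str], cluster: set[str]) -> set[str]:
    """Return the cluster members a customer has not yet bought.

    Customers who own none of the cluster, or all of it, are not prospects,
    so an empty set is returned for them.
    """
    owned = purchased & cluster
    if not owned or owned == cluster:
        return set()
    return cluster - purchased


# Hypothetical cluster and purchase histories.
cluster = {"tent", "sleeping bag", "camp stove"}
customers = {
    "A": {"tent", "sleeping bag"},                 # owns part of the cluster
    "B": {"novel"},                                # owns none of it
    "C": {"tent", "sleeping bag", "camp stove"},   # owns all of it
}

# Keep only customers with something left to offer.
prospects = {}
for customer_id, items in customers.items():
    offer = missing_items(items, cluster)
    if offer:
        prospects[customer_id] = offer
```

Only customer A, who owns some but not all of the cluster, ends up as a
prospect.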

Retention and Churn
Customer attrition is an important issue for any company, and it is especially
important in mature industries where the initial period of exponential growth
has been left behind. Not surprisingly, churn (or, to look on the bright side,
retention) is a major application of data mining. We use the term churn as it is
generally used in the telephone industry to refer to all types of customer attri­
tion whether voluntary or involuntary; churn is a useful word because it is one
syllable and easily used as both a noun and a verb.

Recognizing Churn
One of the first challenges in modeling churn is deciding what it is and recog­
nizing when it has occurred. This is harder in some industries than in others.
At one extreme are businesses that deal in anonymous cash transactions.
When a once-loyal customer deserts his regular coffee bar for another down
the block, the barista who knew the customer's order by heart may notice,
but the fact will not be recorded in any corporate database. Even in cases
where the customer is identified by name, it may be hard to tell the difference
between a customer who has churned and one who just hasn't been around for
a while. If a loyal Ford customer who buys a new F150 pickup every 5 years
hasn't bought one for 6 years, can we conclude that he has defected to another
brand?
Churn is a bit easier to spot when there is a monthly billing relationship, as
with credit cards. Even there, however, attrition might be silent. A customer
stops using the credit card, but doesn't actually cancel it. Churn is easiest to
define in subscription-based businesses, and partly for that reason, churn
modeling is most popular in these businesses. Long-distance companies,
mobile phone service providers, insurance companies, cable companies, finan­
cial services companies, Internet service providers, newspapers, magazines,
and some retailers all share a subscription model where customers have a for­
mal, contractual relationship which must be explicitly ended.

Why Churn Matters
Churn is important because lost customers must be replaced by new cus­
tomers, and new customers are expensive to acquire and generally generate
less revenue in the near term than established customers. This is especially
true in mature industries where the market is fairly saturated: anyone likely
to want the product or service probably already has it from somewhere, so the
main source of new customers is people leaving a competitor.
Figure 4.6 illustrates that as the market becomes saturated and the response
rate to acquisition campaigns goes down, the cost of acquiring new customers
goes up. The chart shows how much each new customer costs for a direct mail
acquisition campaign given that the mailing costs $1 and it includes an offer of
$20 in some form, such as a coupon or a reduced interest rate on a credit card.
When the response rate to the acquisition campaign is high, such as 5 percent,
the cost of a new customer is $40. (It costs $100 to reach 100 people, five
of whom respond at a cost of $20 each. So, five new customers cost $200.)
As the response rate drops, the cost increases rapidly. By the time the
response rate drops to 1 percent, each new customer costs $120 ($100 to reach
100 people plus $20 for the single responder). At some point,
it makes sense to spend that money holding on to existing customers rather
than attracting new ones.
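
The arithmetic behind this example can be written as a one-line formula: the
mailing cost divided by the response rate, plus the cost of the offer. The
sketch below uses the $1 mailing cost and $20 offer from the text as defaults;
the function name is hypothetical.

```python
def cost_per_new_customer(response_rate: float,
                          mailing_cost: float = 1.0,
                          offer_cost: float = 20.0) -> float:
    """Cost of acquiring one customer through a direct mail campaign.

    Reaching one responder requires 1 / response_rate mailings, and each
    responder also redeems the offer.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in the interval (0, 1]")
    return mailing_cost / response_rate + offer_cost


# Cost at each response rate shown on the horizontal axis of Figure 4.6.
for rate in (0.05, 0.04, 0.03, 0.02, 0.01):
    print(f"{rate:.0%} response rate: ${cost_per_new_customer(rate):.2f} per customer")
```

At a 5 percent response rate the formula reproduces the $40 figure worked out
above.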


[Chart: cost per response (vertical axis) plotted against response rate
(horizontal axis, 1.0% to 5.0%).]

Figure 4.6 As the response rate to an acquisition campaign goes down, the cost per
customer acquired goes up.

Retention campaigns can be very effective, but also very expensive. A mobile
phone company might offer an expensive new phone to customers who renew
a contract. A credit card company might lower the interest rate. The problem
with these offers is that any customer who is made the offer will accept it. Who
wouldn't want a free phone or a lower interest rate? That means that many of
the people accepting the offer would have remained customers even without it.
The motivation for building churn models is to figure out who is most at risk
for attrition so as to make the retention offers to high-value customers who
might leave without the extra incentive.

Different Kinds of Churn
Actually, the discussion of why churn matters assumes that churn is voluntary.
Customers, of their own free will, decide to take their business elsewhere. This
type of attrition, known as voluntary churn, is actually only one of three possi­
bilities. The other two are involuntary churn and expected churn.
Involuntary churn, also known as forced attrition, occurs when the company,
rather than the customer, terminates the relationship, most commonly due to
unpaid bills. Expected churn occurs when the customer is no longer in the
target market for a product. Babies get teeth and no longer need baby food.
Workers retire and no longer need retirement savings accounts. Families move away
and no longer need their old local newspaper delivered to their door.
It is important not to confuse the different types of churn, but easy to do so.
Consider two mobile phone customers in identical financial circumstances.
Due to some misfortune, neither can afford the mobile phone service any
more. Both call up to cancel. One reaches a customer service agent and is
recorded as voluntary churn. The other hangs up after ten minutes on hold
and continues to use the phone without paying the bill. The second customer
is recorded as forced churn. The underlying problem (lack of money) is the
same for both customers, so it is likely that they will both get similar scores.
The model cannot predict the difference in hold times experienced by the two
customers.
Companies that mistake forced churn for voluntary churn lose twice: once
when they spend money trying to retain customers who later go bad and again
in increased write-offs.
Predicting forced churn can also be dangerous, because the treatment given
to customers who are not likely to pay their bills tends to be nasty: phone
service is suspended, late fees are increased, and dunning letters are sent more
quickly. These remedies may alienate otherwise good customers and increase
the chance that they will churn voluntarily.
In many companies, voluntary churn and involuntary churn are the respon­
sibilities of different groups. Marketing is concerned with holding on to good
customers and finance is concerned with reducing exposure to bad customers.
From a data mining point of view, it is better to address both voluntary and
involuntary churn together since all customers are at risk for both kinds of
churn to varying degrees.

Different Kinds of Churn Model
There are two basic approaches to modeling churn. The first treats churn as a
binary outcome and predicts which customers will leave and which will stay.
The second tries to estimate the customers' remaining lifetime.

Predicting Who Will Leave
To model churn as a binary outcome, it is necessary to pick some time horizon.
If the question is “Who will leave tomorrow?” the answer is hardly anyone. If
the question is “Who will have left in 100 years?” the answer, in most busi­
nesses, is nearly everyone. Binary outcome churn models usually have a fairly
short time horizon such as 60 or 90 days. Of course, the horizon cannot be too
short or there will be no time to act on the model's predictions.
Binary outcome churn models can be built with any of the usual tools for
classification including logistic regression, decision trees, and neural networks.
Historical data describing a customer population at one time is combined with
a flag showing whether the customers were still active at some later time. The
modeling task is to discriminate between those who left and those who stayed.
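
One way to build the flag described above is sketched below, assuming each
customer record carries a cancellation date (or None if still active). The
function name, the 90-day default, and the sample dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def label_churn(snapshot_date: date,
                end_dates: dict[str, Optional[date]],
                horizon_days: int = 90) -> dict[str, int]:
    """Flag customers active at the snapshot as churned (1) or retained (0).

    A customer is flagged 1 if the relationship ended within horizon_days
    after the snapshot. Customers already gone at the snapshot are skipped
    because they are not part of the population being modeled.
    """
    cutoff = snapshot_date + timedelta(days=horizon_days)
    labels = {}
    for customer_id, end in end_dates.items():
        if end is not None and end <= snapshot_date:
            continue  # left before the snapshot was taken
        labels[customer_id] = int(end is not None and end <= cutoff)
    return labels


# Hypothetical cancellation dates; None means the customer is still active.
end_dates = {
    "a": None,
    "b": date(2024, 2, 15),
    "c": date(2024, 6, 1),
    "d": date(2023, 12, 1),
}
labels = label_churn(date(2024, 1, 1), end_dates)
```

The resulting flags would then be joined to the snapshot attributes and handed
to whichever classification tool is being used.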
The outcome of a binary churn model is typically a score that can be used to
rank customers in order of their likelihood of churning. The most natural score
is simply the probability that the customer will leave within the time horizon
used for the model. Those with voluntary churn scores above a certain thresh­
old can be included in a retention program. Those with involuntary churn
scores above a certain threshold can be placed on a watch list.
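
The thresholding step might look like the following sketch, which assumes
that each customer already has a voluntary and an involuntary churn score
expressed as a probability. The threshold values, the score field names, and
the function name are hypothetical.

```python
def assign_actions(scores: dict[str, dict[str, float]],
                   voluntary_threshold: float = 0.30,
                   involuntary_threshold: float = 0.20) -> tuple[list[str], list[str]]:
    """Split customers into a retention program and a watch list.

    Each list is ranked from the highest score to the lowest. A customer can
    appear in both lists, since every customer is at some risk of both kinds
    of churn.
    """
    retention = [c for c, s in scores.items() if s["voluntary"] > voluntary_threshold]
    watch = [c for c, s in scores.items() if s["involuntary"] > involuntary_threshold]
    retention.sort(key=lambda c: scores[c]["voluntary"], reverse=True)
    watch.sort(key=lambda c: scores[c]["involuntary"], reverse=True)
    return retention, watch


# Hypothetical scores for four customers.
scores = {
    "a": {"voluntary": 0.45, "involuntary": 0.05},
    "b": {"voluntary": 0.10, "involuntary": 0.35},
    "c": {"voluntary": 0.60, "involuntary": 0.25},
    "d": {"voluntary": 0.05, "involuntary": 0.02},
}
retention_program, watch_list = assign_actions(scores)
```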
Typically, the predictors of churn turn out to be a mixture of things that were
known about the customer at acquisition time, such as the acquisition channel
and initial credit class, and things that occurred during the customer relation­
ship such as problems with service, late payments, and unexpectedly high or
low bills. The first class of churn drivers provides information on how to lower
future churn by acquiring fewer churn-prone customers. The second class of
churn drivers provides insight into how to reduce the churn risk for customers
who are already present.

Predicting How Long Customers Will Stay
The second approach to churn modeling is the less common method, although
it has some attractive features. In this approach, the goal is to figure out
how much longer a customer is likely to stay. This approach provides more
information than simply whether the customer is expected to leave within 90
days. Having an estimate of remaining customer tenure is a necessary ingredi­
ent for a customer lifetime value model. It can also be the basis for a customer
loyalty score that defines a loyal customer as one who will remain for a long
time in the future rather than one who has remained a long time up until now.
One approach to modeling customer longevity would be to take a snapshot
of the current customer population, along with data on what these customers
looked like when they were first acquired, and try to estimate customer tenure
directly by trying to determine what long-lived customers have in common
besides an early acquisition date. The problem with this approach is that the
longer customers have been around, the more different market conditions were
back when they were acquired. Certainly it is not safe to assume that the
characteristics of someone who got a cellular subscription in 1990 are good
predictors of which of today's new customers will keep their service for many years.
A better approach is to use survival analysis techniques that have been bor­
rowed and adapted from statistics. These techniques are associated with the
medical world where they are used to study patient survival rates after med­
ical interventions and the manufacturing world where they are used to study
the expected time to failure of manufactured components.
Survival analysis is explained in Chapter 12. The basic idea is to calculate for

