

[Plot omitted. Horizontal axis: Days after Deactivation. Vertical axes: survival (Remain Deactivated) and Hazard Probability ("Risk" of Reactivating).]
Figure 12.12 Survival curve (upper curve) and hazards (lower curve) for reactivation of
mobile telephone customers.

After 90 days, the hazards are practically zero: customers do not reactivate.
Once again, the business processes provide guidance. Telephone numbers are
reserved for 90 days after customers leave. Normally, when customers
reactivate, they want to keep the same telephone number. After 90 days, the
number may have been reassigned, and the customer would have to get a new
telephone number.
This discussion has glossed over the question of how new (reactivated)
customers were associated with the expired accounts. In this case, the analysis
used the telephone numbers in conjunction with an account ID. This pretty
much guaranteed that the match was accurate, since reactivated customers
retained their telephone numbers and billing information. This is very
conservative but works for finding reactivations. It does not work for finding
other types of winback, such as customers who are willing to cycle through
telephone numbers in order to get introductory discounts.
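The matching rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (phone, account_id, deactivated_day, activated_day) and the use of integer day numbers are hypothetical and do not come from the analysis itself.

```python
def find_reactivations(deactivated, new_activations):
    """Match new activations to expired accounts on (phone, account_id).

    Both inputs are lists of dicts; the field names are hypothetical.
    Returns (key, days_until_reactivation) pairs for exact matches only.
    """
    # Index expired accounts by the conservative composite key.
    expired = {
        (d["phone"], d["account_id"]): d["deactivated_day"]
        for d in deactivated
    }
    matches = []
    for a in new_activations:
        key = (a["phone"], a["account_id"])
        stop_day = expired.get(key)
        # A reactivation must come after the deactivation it is matched to.
        if stop_day is not None and a["activated_day"] > stop_day:
            matches.append((key, a["activated_day"] - stop_day))
    return matches
```

A customer who returns with a new telephone number has a different key and is not matched, which is the limitation noted above.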
Another approach is to try to identify individuals over time, even when they
are on different accounts. For businesses that collect Social Security numbers or
driver's license numbers as a regular part of their business, such identifying
numbers can connect accounts together over time. (Be aware that not everyone
who is asked to supply this kind of identifying information does so accurately.)
Sometimes matching names, addresses, telephone numbers, and/or credit
cards is sufficient for matching purposes. More often, this task is outsourced to
a company that assigns individual and household IDs, which then provide
the information needed to identify which new customers are really former
customers who have been won back.
Studying initial covariates adds even more information. In this case,
"initial" means whatever is known about the customer at the point of
deactivation. This includes not only information such as initial product and
promotion, but also customer behavior before deactivating. Are customers who
complain a lot more or less likely to reactivate? Customers who roam?
Customers who pay their bills late?
This example shows the use of hazards to understand a classic time-to-event
question. There are other questions of this genre amenable to survival analysis:
  When customers start on a minimum pricing plan, how long will it be
  before they upgrade to a premium plan?

  When customers upgrade to a premium plan, how long will it be before
  they downgrade?

  What is the expected length of time between purchases for customers,
  given past customer behavior and the fact that different customers have
  different purchase periods?
One nice aspect of using survival analysis is that it is easy to ask about the
effects of different initial conditions, such as the number of times that a
customer has visited in the past. Using proportional hazards, it is possible to
determine which covariates have the most effect on the desired outcome,
including which interventions are most and least likely to work.

Another interesting application of survival analysis is forecasting the number
of customers into the future, or equivalently, the number of stops on a given
day in the future. In the aggregate, survival does a good job of estimating how
many customers will stick around for a given length of time.
There are two components to any such forecast. The first is a model of
existing customers, which can take into account various covariates during the
customer's life cycle. Such a model works by applying one or more survival
models to all customers. If a customer has survived for 100 days, then the
probability of stopping tomorrow is the hazard at day 100. To calculate the
chance of stopping the day after tomorrow, first assume that the customer
does not stop tomorrow and then does stop on day 101. This is the conditional
survival (one minus the hazard, that is, the probability of not stopping) at
day 100 times the hazard for day 101. Applying this to all customer tenures,
it is possible to forecast stops of existing customers in the future.
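The calculation just described can be sketched as follows. This is a minimal illustration, assuming that hazards is a list indexed by tenure in days and that it is long enough to cover every tenure plus the forecast horizon; the function names are made up for the example.

```python
def stop_probabilities(hazards, tenure, horizon):
    """Chance that a customer active at `tenure` stops on each of the
    next `horizon` days (index 0 is tomorrow)."""
    probs = []
    still_active = 1.0  # probability of not having stopped so far
    for k in range(horizon):
        h = hazards[tenure + k]
        probs.append(still_active * h)  # survive until now, then stop
        still_active *= 1.0 - h         # conditional survival for this day
    return probs


def forecast_stops(hazards, tenures, horizon):
    """Expected number of stops per future day, summed over customers."""
    totals = [0.0] * horizon
    for t in tenures:
        for day, p in enumerate(stop_probabilities(hazards, t, horizon)):
            totals[day] += p
    return totals
```

For a customer with 100 days of tenure, the first value is the hazard at day 100 and the second is one minus that hazard times the hazard at day 101, as in the text.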
Figure 12.13 shows such a forecast for stops for 1 month, developed by
survival expert Will Potts. Also shown are the actual values observed during
this period. The survival-based forecast proves to be quite close to what
actually happened. This particular survival estimate used a parametric
model on the hazards rather than empirical hazard rates; the model was able
to take into account the day of the week. This results in the weekly cycle of
stops evident in the graph.
[Plot omitted. Series: Actual and Predicted. Vertical axis: Number; horizontal axis: Day of Month.]
Figure 12.13 Survival analysis can also be used for forecasting customer stops.

The second component of a customer-level forecast is a bit more difficult to
calculate. This component is the effect of new customers on the forecast, and
the difficulty is not technical. The challenge is getting estimates for new starts.
Fortunately, there are often budget forecasts that contain new starts,
sometimes broken down by product, channel, or geography. It is possible to refine
the survival models to take into account these effects. Of course, the forecast is
only as accurate as the budget. The upside, though, is that the forecast, based
on survival techniques, can be incorporated into the process of managing
actual levels against budgeted levels.
The combination of these components (stop forecasts for existing customers
and stop forecasts for new customers) makes it possible to develop
estimates of customer levels into the future. The authors have worked with
clients who have carried these forecasts years into the future. Because the
models for new customers included the acquisition channel, the forecasting
model made it possible to optimize the future acquisition channel mix.
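One way to combine the two components is sketched below. It assumes a single hazard curve for everyone, hazards strictly below 1, and a list of budgeted new starts per forecast day; a real forecast would use separate curves by product or channel, as described above. All names here are illustrative.

```python
def survival_curve(hazards):
    """S[t] = probability that a new customer reaches tenure t (S[0] = 1)."""
    curve = [1.0]
    for h in hazards:
        curve.append(curve[-1] * (1.0 - h))
    return curve


def forecast_levels(hazards, existing_tenures, new_starts_per_day):
    """Expected number of active customers at the end of each forecast day."""
    S = survival_curve(hazards)
    levels = []
    for day in range(1, len(new_starts_per_day) + 1):
        # Existing customers: conditional survival from current tenure t.
        existing = sum(S[t + day] / S[t] for t in existing_tenures)
        # Budgeted starts on forecast day d have tenure (day - d) by `day`.
        new = sum(
            n * S[day - d]
            for d, n in enumerate(new_starts_per_day[:day], start=1)
        )
        levels.append(existing + new)
    return levels
```

The hazards list must cover the largest existing tenure plus the forecast horizon. The forecast can be no better than the budgeted starts passed in.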

Hazards Changing over Time
One of the more difficult issues in survival analysis is whether the hazards
themselves are constant or whether they change over time. The assumption in
scientific studies is that hazards do not change. The goal of scientific survival
analysis is to obtain estimates of the “real” hazard in various situations.
This assumption may or may not be true in marketing. Certainly, working
with this assumption, survival analysis has proven its worth with customer
data. However, it is interesting to consider the possibility that hazards may
be changing over time. In particular, if hazards do change, that gives some
insight into whether the marketplace and customers are getting better or
worse over time.
One approach to answering this question is to base hazards on customers
who are stopping rather than customers who are starting, especially, say,
customers who have stopped in each of the past few years. In other words, were
the hazards associated with customers who stopped last year significantly
different from the hazards associated with customers who stopped the
previous year? Earlier, this chapter warned that calculating hazards for a set
of customers chosen by their stop date does not produce accurate hazards. How
can we overcome this problem?
There is a way to calculate these hazards, although this has not yet appeared
in standard statistical tools. This method uses time windows on the customers
to estimate the hazard probability. Remember the definition of the empirical
hazard probability: the number of customers who stopped at a particular time
divided by the number of customers who could have stopped at that time. Up
to now, all customers have been included in the calculation. The idea is to
restrict the customers only to those who could have stopped during the period
in question.
As an example, consider estimating the hazards based on customers who
stopped in 2003. Customers who stopped in 2003 were either active on the first
day of 2003 or were new customers during the year. In either case, customers
only contribute to the population count starting at whatever their tenure was
on the first day of 2003 (or 0 for new starts).
Let's consider the calculation of the 1-day hazard probability. What is the
population of customers who could have stopped with 1 day of tenure and
also have the stop in 2003? Only customers who started between December 31,
2002 and December 30, 2003 could have a 1-day stop in 2003. So, the
calculation of the 1-day hazard uses all stops in 2003 where the tenure was
1 day as the total for stops. The population at risk consists of customers
who started between December 31, 2002 and December 30, 2003 and were still
active at 1 day of tenure. As another example, the 365-day hazard would be
based on a population count of customers who started in 2002 and reached
365 days of tenure.
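The time-window calculation can be sketched as follows. This is an illustration of the idea rather than the authors' implementation: it assumes each customer is a (start, stop) pair of dates, with stop set to None for customers who are still active, and it measures tenure in whole days.

```python
from datetime import date


def window_hazards(customers, window_start, window_end, max_tenure):
    """Empirical hazards using only stops and exposure inside the window.

    Returns a list indexed by tenure; None where nobody was at risk.
    """
    stops = [0] * (max_tenure + 1)
    at_risk = [0] * (max_tenure + 1)
    for start, stop in customers:
        # Tenure on the first day of the window (0 for new starts).
        first_t = max(0, (window_start - start).days)
        # Last tenure reached inside the window, or at the stop if earlier.
        last_t = min(max_tenure, (window_end - start).days)
        if stop is not None:
            last_t = min(last_t, (stop - start).days)
        for t in range(first_t, last_t + 1):
            at_risk[t] += 1
        if stop is not None and window_start <= stop <= window_end:
            t = (stop - start).days
            if t <= max_tenure:
                stops[t] += 1
    return [s / n if n else None for s, n in zip(stops, at_risk)]
```

With a window of January 1 to December 31, 2003, a customer who started on December 31, 2002 first contributes at 1 day of tenure, and a customer who started on December 31, 2003 contributes only at tenure 0.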
The result is an estimate of the hazards based on stops during a particular
period of time. For comparison purposes, survival proves to be more useful
than the hazards themselves. Figure 12.14 provides an example, showing that
survival is indeed decreasing over the course of several years. The changes in
survival are small. However, the calculations are based on hundreds of
thousands of customers and do represent a decline in customer quality.
[Plot omitted. Horizontal axis: Days after Start.]

Figure 12.14 A time-window technique makes it possible to see changes in survival over

Lessons Learned
Hazards and survival analysis are designed for understanding customers.
This chapter introduced hazards as the conditional probability of a customer
leaving at a given point in time. This treatment of survival analysis is
unorthodox in terms of statistics, which prefers an approach based on
continuous rates rather than discrete-time probabilities. However, this
treatment is more intuitive for analyzing customers.
Hazards are like an x-ray of the customer life cycle. The related idea of
survival, which is the proportion of customers who survive up to a particular
point in time, makes it possible to compare different groups of customers and
to translate results into dollars and cents. When there are enough customers
(and usually there are), stratifying the customers by building a separate curve
for each group provides a good comparison. It is possible to use other
measures, such as the survival at a particular point in time, the customer
half-life, and the average tenure, to better understand customers.
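These measures all follow from the hazards. The sketch below assumes that hazards[t] is the probability of stopping at tenure t given that the customer is still active then; the function names are illustrative. The average is truncated at the last observed tenure, because the hazards say nothing beyond that point.

```python
def survival_curve(hazards):
    """S[t] = proportion of customers who reach tenure t (S[0] = 1)."""
    curve = [1.0]
    for h in hazards:
        curve.append(curve[-1] * (1.0 - h))
    return curve


def half_life(curve):
    """First tenure at which survival is 50% or less, else None."""
    for t, s in enumerate(curve):
        if s <= 0.5:
            return t
    return None


def truncated_average_tenure(curve):
    """Expected tenure capped at the horizon: the sum of S[1..N],
    which approximates the area under the survival curve."""
    return sum(curve[1:])
```

Stratifying amounts to calling these functions once per group of customers and comparing the results.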
One of the key concepts in survival analysis is censoring. This means that
some customers are dropped from the analysis. The idea of censoring can be
extended to understand competing risks, such as voluntary versus forced
attrition. Censoring also makes it possible to discard certain outcomes, such as
a one-time boycott, without adversely biasing overall results.
One of the most powerful aspects of hazards is the ability to determine
which factors, at the onset, are responsible for increasing or decreasing the
hazards. In addition to stratifying customers, there is another technique, Cox
proportional hazards regression, which has proven its worth since the 1970s
and continues to be extended and improved upon.
Survival analysis has many applications beyond measuring the probability
of customers leaving. It has been used for forecasting customer levels, as well
as for predicting other types of events during the customer life cycle. It is a
very powerful tool, seemingly designed specifically for understanding
customers and their life cycles.

