

[Figure: horizontal axis is Tenure (Years), 1 to 13]

The parametric curves that fit a retention curve do not fit well beyond the range where
they are defined.

Of course, this illustration does not prove that a parametric approach will
not work. Perhaps there is some function out there that, with the right
parameters, would fit the observed retention curve very well and continue
working beyond the range used to define the parameters. However, this
example does illustrate the challenges of using a parametric approach for
approximating survival curves directly, and it is consistent with our experience
even when using more data points. Functions that provide a good fit to the
retention curve turn out to diverge pretty quickly.

Another way of describing exponential decay is that customers who have been
around for 1 year behave just like new customers. Consider a group of 100
customers of various tenures. If 50 leave in the following year, regardless of
their tenure at the beginning of the year, that is exponential decay: half are
going to leave regardless of their initial tenure. That means that
customers who have been around for a while are no more loyal than newer
customers. However, it is often the case that customers who have been around for
a while are actually better customers than new customers. For whatever reason,
longer-tenured customers have stuck around in the past and are probably a bit
less likely than new customers to leave in the future. Exponential decay is a bad
situation, because it assumes the opposite: that the tenure of the customer
relationship has no effect on the rate at which customers leave. (The worst-case
scenario would have longer-term customers leaving at consistently higher rates
than newer customers, the "familiarity breeds contempt" scenario.)
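
The "half leave regardless of tenure" property can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a retention curve that halves every year; the names annual_retention, survival, and prob_leave_next_year are illustrative and do not come from the text.

```python
# Assumption: retention follows exponential decay, halving every year.
annual_retention = 0.5

def survival(t):
    """Fraction of an original group still active after t years."""
    return annual_retention ** t

def prob_leave_next_year(t):
    """Probability that a customer who has survived t years leaves before t+1."""
    return 1 - survival(t + 1) / survival(t)

# The chance of leaving in the coming year is the same at every tenure.
leave_probs = [prob_leave_next_year(t) for t in range(6)]
for t, p in enumerate(leave_probs):
    print(f"tenure {t} years: {p:.0%} leave in the following year")
```

Under this assumption a 5-year customer is exactly as likely to leave as a brand-new one, which is the property the paragraph above calls a bad situation.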
394 Chapter 12


The preceding discussion on retention curves serves to show how useful retention
curves are. These curves are quite simple to understand, but only in terms
of their data. There is no general shape, no parametric form, no grand theory
of customer decay. The data is the message.
Hazard probabilities extend this idea. As discussed here, they are an example
of a nonparametric statistical approach: letting the data speak instead of
finding a special function to speak for it. Empirical hazard probabilities simply
let the historical data determine what is likely to happen, without trying to fit
data to some preconceived form. They also provide insight into customer
retention and make it possible to produce a refinement of retention curves
called survival curves.

The Basic Idea
A hazard probability answers the following question:
Assume that a customer has survived for a certain length of time, so the
customer's tenure is t. What is the probability that the customer leaves before t+1?
Another way to phrase this is: the hazard at time t is the risk of losing
customers between time t and time t+1. As we discuss hazards in more detail,
it may sometimes be useful to refer to this definition. As with many seemingly
simple ideas, hazards have significant consequences.
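
For readers who prefer symbols, the question above can be written as a conditional probability. This is a sketch in notation introduced here, not the chapter's own: T stands for the tenure at which a customer eventually stops, and h(t) for the hazard at tenure t.

```latex
h(t) = P\left(t \le T < t+1 \mid T \ge t\right)
```

The condition T >= t expresses that the customer has survived to tenure t; the event t <= T < t+1 is the customer leaving before t+1.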
To provide an example of hazards, let's step outside the world of business
for a moment and consider life tables, which describe the probability of
someone dying at a particular age. Table 12.1 shows this data for the U.S.
population in 2000:

Table 12.1 Hazards for Mortality in the United States in 2000, Shown as a Life Table

Age band      Hazard
0-1 yrs       0.73%
1-4 yrs       0.03%
5-9 yrs       0.02%
10-14 yrs     0.02%
15-19 yrs     0.07%
20-24 yrs     0.10%
25-29 yrs     0.10%
30-34 yrs     0.12%
35-39 yrs     0.16%
40-44 yrs     0.24%
45-49 yrs     0.36%
50-54 yrs     0.52%
55-59 yrs     0.80%
60-64 yrs     1.26%
65-69 yrs     1.93%
70-74 yrs     2.97%
75-79 yrs     4.56%
80-84 yrs     7.40%
85+ yrs       15.32%

A life table is a good example of hazards. Infants have about a 1 in 137
chance of dying before their first birthday. (This is actually a very good rate; in
less-developed countries the rate can be many times higher.) The mortality
rate then plummets, but eventually it climbs steadily higher. Not until someone
is about 55 years old does the risk rise as high as it is during the first year.
This is a characteristic shape of some hazard functions and is called the bathtub
shape. The hazards start high, remain low for a long time, and then gradually
increase again. Figure 12.5 illustrates the bathtub shape using this data.







[Figure: hazard plotted by age band, from 0-1 yrs through 70-74 yrs; horizontal axis is Age (Years)]
Figure 12.5 The shape of a bathtub-shaped hazard function starts high, plummets, and then
gradually increases again.
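
The two numerical claims about the life table can be checked directly against Table 12.1. This is a small sketch that only re-reads the table values; the variable names (life_table, one_in, first_as_risky) are illustrative.

```python
# Table 12.1 as (age band, hazard in percent).
life_table = [
    ("0-1", 0.73), ("1-4", 0.03), ("5-9", 0.02), ("10-14", 0.02),
    ("15-19", 0.07), ("20-24", 0.10), ("25-29", 0.10), ("30-34", 0.12),
    ("35-39", 0.16), ("40-44", 0.24), ("45-49", 0.36), ("50-54", 0.52),
    ("55-59", 0.80), ("60-64", 1.26), ("65-69", 1.93), ("70-74", 2.97),
    ("75-79", 4.56), ("80-84", 7.40), ("85+", 15.32),
]

infant_pct = life_table[0][1]

# A hazard of 0.73% is the same as "about 1 in N".
one_in = round(100 / infant_pct)

# First later age band whose hazard is at least as high as the infant hazard.
first_as_risky = next(band for band, pct in life_table[1:] if pct >= infant_pct)

print(f"Infant hazard is about 1 in {one_in}")
print(f"Hazard next reaches the infant level in the {first_as_risky} band")
```

The low stretch between the first band and the 55-59 band is the flat bottom of the bathtub.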

The same idea can be applied to customer tenure, although customer hazards
are more typically calculated by day, week, or month instead of by year.
Calculating a hazard for a given tenure t requires only two pieces of data. The
first is the number of customers who stopped at time t (or between t and t+1).
The second is the total number of customers who could have stopped during
this period, also called the population at risk. This consists of all customers
whose tenure is greater than or equal to t, including those who stopped at time
t. The hazard probability is the ratio of these two numbers, and being a
probability, the hazard is always between 0 and 1. These hazard calculations are
provided by life table functions in statistical software such as SAS and SPSS. It is
also possible to do the calculations in a spreadsheet using data directly from a
customer database.
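
The calculation just described can be sketched in a few lines. This is a minimal illustration rather than the method used by SAS or SPSS; the function name empirical_hazards and the ten-customer data set are made up for the example. Each customer contributes a tenure and a flag saying whether the customer stopped at that tenure or is still active.

```python
from collections import Counter

def empirical_hazards(tenures, stopped):
    """Return {t: hazard at tenure t}.

    hazard(t) = (customers who stopped at tenure t)
                / (customers whose tenure is >= t, the population at risk)
    """
    stops_at = Counter(t for t, s in zip(tenures, stopped) if s)
    hazards = {}
    for t in range(max(tenures) + 1):
        # Population at risk includes those who stopped at exactly t.
        at_risk = sum(1 for x in tenures if x >= t)
        hazards[t] = stops_at[t] / at_risk
    return hazards

# Hypothetical data: tenure in months and whether the customer has stopped.
tenures = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
stopped = [True, True, False, True, False, False, True, False, False, False]

hazards = empirical_hazards(tenures, stopped)
for t, h in hazards.items():
    print(f"tenure {t}: hazard {h:.3f}")
```

In this toy data, 10 customers are at risk at tenure 0 and one stops, so the hazard is 0.1; at tenure 3 only 4 customers remain at risk and one stops, so the hazard is 0.25.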
One caveat: In order for the calculation to be accurate, every customer
included in the population count must have the opportunity to stop at that
particular time. This is a property of the data used to calculate the hazards, rather
than the method of calculation. In most cases, this is not a problem, because
hazards are calculated from all customers or from some subset based on initial
conditions (such as initial product or campaign). There is no problem when a
customer is included in the population count up to that customer's tenure, and
the customer could have stopped on any day before then and still be in the data set.
An example of what not to do is to take a subset of customers who have
stopped during some period of time, say in the past year. What is the problem?
Consider a customer who stopped yesterday with 2 years of tenure. This
customer is included in all the population counts for the first year of hazards.
However, the customer could not have stopped during the first year of tenure.
The stop would have been more than a year in the past and would have precluded
the customer from being in the data set. Because customers who could not have
stopped are included in the population counts, the population counts are too
big, making the initial hazards too low. Later in the chapter, an alternative
method is explained to address this issue.

WARNING To get accurate hazards and survival curves, use groups of
customers who are defined only based on initial conditions. In particular, do
not define the group based on how or when the members left.

When populations are large, there is no need to worry about statistical
ideas such as confidence and standard error. However, when the populations
are small, as they are in medical research studies or in some business
applications, then the confidence interval may become an issue. What this
means is that a hazard of, say, 5 percent might really be somewhere between 4
percent and 6 percent. When working with smallish populations (say, fewer than
a few thousand), it might be a good idea to use statistical methods that provide
information about standard errors. For most applications, though, this is not
an important concern.
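
To get a feel for when the population is "smallish," one rough approach is to treat the hazard as a binomial proportion and use the normal approximation for its standard error. That choice is an assumption made for this sketch; the text does not name a specific method, and the function hazard_interval is hypothetical.

```python
import math

def hazard_interval(stops, at_risk, z=1.96):
    """Approximate 95% interval for a hazard, treating it as a binomial
    proportion (normal approximation; an assumption of this sketch)."""
    h = stops / at_risk
    se = math.sqrt(h * (1 - h) / at_risk)
    return h, max(0.0, h - z * se), min(1.0, h + z * se)

# A 5 percent hazard estimated from 2,000 customers at risk...
h_small, lo_small, hi_small = hazard_interval(stops=100, at_risk=2_000)
# ...and the same hazard estimated from 200,000 customers at risk.
h_large, lo_large, hi_large = hazard_interval(stops=10_000, at_risk=200_000)

print(f"n=2,000:   {h_small:.1%} (roughly {lo_small:.1%} to {hi_small:.1%})")
print(f"n=200,000: {h_large:.1%} (roughly {lo_large:.1%} to {hi_large:.1%})")
```

With about two thousand customers at risk, the interval is roughly 4 to 6 percent, in line with the example above; with a couple hundred thousand, it shrinks to about a tenth of a percentage point on either side.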

Examples of Hazard Functions
At this point, it is worth stopping and looking at some examples of hazards.
These examples are intended to help in understanding what is happening, by
looking at the hazard probabilities. The first two examples are basic, and, in
fact, we have already seen examples of them in this chapter. The third is from
real-world data, and it gives a good flavor of how hazards can be used to
provide an x-ray of customers' lifetimes.

Constant Hazard
The constant hazard hardly needs a picture to explain it. What it says is that
the hazard of customers leaving is exactly the same, no matter how long the
customers have been around. This looks like a horizontal line on a graph.
Say the hazard is being measured by days, and it is a constant 0.1 percent.
That is, one customer out of every thousand leaves every day. After a year (365
days), this means that about 30.6 percent of the customers have left. It takes
about 692 days for half the customers to leave. It will take another 692 days for
half of the remaining customers to leave. And so on, and so on.
The constant hazard means the chance of a customer leaving does not vary
with the length of time the customer has been around. This sounds a lot like
the exponential retention curve, the one that looks like the decay of radioactive
elements. In fact, a constant retention hazard would conform to an exponential
form for the retention curve. We say “would” simply because, although this
does happen in physics, it does not happen much in marketing.
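
The figures in the 0.1 percent example follow from compounding the daily survival probability, which is also why a constant hazard produces an exponential retention curve. A minimal sketch, with illustrative names (daily_hazard, survival, half_life_days):

```python
import math

daily_hazard = 0.001  # 0.1 percent of remaining customers leave each day

def survival(days):
    """Fraction of customers remaining after the given number of days."""
    return (1 - daily_hazard) ** days

# Fraction of customers who have left after one year.
left_after_year = 1 - survival(365)

# Number of days until half the customers have left.
half_life_days = math.log(0.5) / math.log(1 - daily_hazard)

print(f"Left after 365 days: {left_after_year:.1%}")
print(f"Days for half to leave: {half_life_days:.1f}")
print(f"Remaining after two half-lives: {survival(2 * half_life_days):.1%}")
```

Because survival(days) is a fixed base raised to the power of tenure, each additional half-life removes half of whoever is left, regardless of how long they have already been customers.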

Bathtub Hazard
The life table for the U.S. population provided an example of the bathtub-shaped
hazard function. This is common in the life sciences, although bathtub-shaped
curves turn up in other domains. As mentioned earlier, the bathtub hazard
initially starts out quite high, then it goes down and flattens out for a long
time, and finally, the hazards increase again.
One phenomenon that causes this is when customers are on contracts (for
instance, for cell phones or ISP services), typically for 1 year or longer. Early in
the contract, customers stop because the service is not appropriate or because
they do not pay. During the period of the contract, customers are dissuaded
from canceling, either because of the threat of financial penalties or perhaps
only because of a feeling of obligation to honor the terms of the initial contract.

