form proportional effect on hazards, he was able to figure out how to measure
this effect for different factors. The purpose of this section is to introduce
proportional hazards and to suggest how they are useful for understanding
customers. This section starts with some examples of why proportional
Hazard Functions and Survival Analysis in Marketing 409

hazards are useful. It then describes an alternative approach before returning
to the Cox model itself.

Examples of Proportional Hazards
Consider the following statement about one risk from smoking: The risk of
leukemia for smokers is 1.53 times greater than for nonsmokers. This result is a
classic example of proportional hazards. At the time of the study, the researchers
knew whether someone was or was not a smoker (actually, there was a third
group of former smokers, but our purpose here is to illustrate an example).
Whether or not someone is a smoker is an example of an initial condition.
Since this factor takes only two values, it is possible to just look at the two
hazard curves and to derive some sort of average for the overall risk.
Figure 12.11 provides an illustration from the world of marketing. It shows
two sets of hazard probabilities, one for customers who joined from a
telephone solicitation and the other from direct mail. Once again, how someone
became a customer is an example of an initial condition. The hazards for the
telemarketing customers are higher; looking at the chart, we might say
telemarketing customers are a bit less than twice as risky as direct mail customers.
Cox proportional hazard regression provides a way to quantify this.
The two just-mentioned examples use categorical variables as the risk factor.
Consider another statement about the risk of tobacco: The risk of colorectal
cancer increases 6.7 percent per pack-year smoked. This statement differs from the
previous one, because it now depends on a continuous variable. Using
proportional hazards, it is possible to determine the contribution of both
categorical and continuous covariates.
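To make the two kinds of statement concrete, here is a minimal sketch of how such figures turn into hazard multipliers. It assumes the usual proportional hazards form, in which each factor multiplies the hazard by a constant, so that a per-pack-year increase of 6.7 percent compounds rather than adds. The function names are illustrative only and do not come from the studies quoted.

```python
import math

# Assumption: effects are multiplicative, hazard = baseline * exp(beta * x).
BETA_SMOKER = math.log(1.53)      # categorical: x is 1 for smokers, 0 otherwise
BETA_PACK_YEAR = math.log(1.067)  # continuous: x is the number of pack-years


def smoker_hazard_ratio(is_smoker: bool) -> float:
    """Hazard relative to a nonsmoker for the categorical factor."""
    return math.exp(BETA_SMOKER * (1 if is_smoker else 0))


def pack_year_hazard_ratio(pack_years: float) -> float:
    """Hazard relative to zero pack-years for the continuous factor."""
    return math.exp(BETA_PACK_YEAR * pack_years)


if __name__ == "__main__":
    print("smoker vs. nonsmoker:", round(smoker_hazard_ratio(True), 3))
    for pack_years in (1, 10, 20):
        print(pack_years, "pack-years:", round(pack_year_hazard_ratio(pack_years), 3))
```

Under this assumption, ten pack-years multiplies the hazard by 1.067 ten times over, which is noticeably more than a simple 67 percent increase.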








[Figure 12.11: hazard probability by tenure for two acquisition channels, including Direct Mail; x-axis: Tenure (Weeks), 0 to 70.]
Figure 12.11 These two hazard functions suggest that the risk of attrition is about one and
a half times as great for customers acquired through telemarketing versus direct mail.
410 Chapter 12

Stratification: Measuring Initial Effects on Survival
Figure 12.11 showed hazard probabilities for two different groups of customers,
one that started via outbound telemarketing campaigns and the other
via direct mail campaigns. These two curves clearly show differences between
these channels. It is possible to generate a survival curve for these hazards and
quantify the difference, using 1-year survival, median survival, or average
truncated tenure. This approach to measuring differences among different
groups defined by initial conditions is called stratification because each group
is analyzed independently from other groups. This produces good visualizations
and accurate survival values. It is also quite easy, since statistical packages
such as SAS and SPSS have options for stratifying data for
this purpose.
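Stratification as just described can be sketched in a few lines. The example below is only an illustration: the customer rows are invented, and the hazard at tenure t is taken to be the number of customers who stopped at t divided by the number still at risk at t, with survival as the running product of one minus the hazards.

```python
from collections import defaultdict


def hazards(records, max_tenure):
    """Empirical hazard per tenure: stops at t / customers at risk at t."""
    result = []
    for t in range(max_tenure + 1):
        at_risk = sum(1 for tenure, _ in records if tenure >= t)
        stops = sum(1 for tenure, stopped in records if tenure == t and stopped)
        result.append(stops / at_risk if at_risk else 0.0)
    return result


def survival(hazard_list):
    """S(0) = 1 and S(t) = product of (1 - h(u)) over all u < t."""
    curve, s = [], 1.0
    for h in hazard_list:
        curve.append(s)
        s *= 1.0 - h
    return curve


def stratified_survival(rows, max_tenure):
    """Split rows by channel and analyze each group independently."""
    groups = defaultdict(list)
    for channel, tenure, stopped in rows:
        groups[channel].append((tenure, stopped))
    return {ch: survival(hazards(recs, max_tenure)) for ch, recs in groups.items()}


# Invented toy data: (channel, tenure, stopped); stopped=False means censored.
rows = [
    ("telemarketing", 1, True), ("telemarketing", 1, True),
    ("telemarketing", 2, True), ("telemarketing", 3, False),
    ("direct mail", 1, True), ("direct mail", 2, True),
    ("direct mail", 3, False), ("direct mail", 3, False),
]
curves = stratified_survival(rows, max_tenure=3)

if __name__ == "__main__":
    for channel, curve in curves.items():
        print(channel, [round(s, 3) for s in curve])
```

Each stratum gets its own curve, so comparing the groups means comparing survival values at a chosen tenure.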
Stratification solves the problem of understanding initial effects assuming
that two conditions are true. First, the initial effect needs to be a categorical
variable. Since the data is being broken into separate groups, some variable,
such as channel or product or region, needs to be chosen for this purpose. Of
course, it is always possible to use binning to break a continuous variable into
discrete chunks.
The second is that each group needs to be fairly big. When starting with lots
and lots of customers and only using one variable that takes on a handful of
values, such as channel, this is not a problem. However, there may be multiple
variables of interest, such as:
Acquisition channel

Original promotion


Once more than one dimension is included, the number of categories grows
very quickly. This means that the data gets spread thinly, making the hazards
less and less reliable.

Cox Proportional Hazards
In 1972, Sir David Cox recognized this problem and proposed a method of
analysis, now known as Cox proportional hazards regression, which overcomes
these limitations. His brilliant insight was to find a way to focus on the
original conditions and not on the hazards themselves. The question is: What
effect do the initial conditions have on hazards? His approach to answering
this question is quite interesting.
Fortunately, the ideas are simpler than the mathematics behind his approach.
Instead of focusing on hazards, he introduced the idea of partial likelihood.
Assuming that only one customer stops at a given time t, the partial likelihood
at t is the likelihood that exactly that particular customer stopped.
The calculation for the partial likelihood divides whatever function or value
represents the hazard for the specific customer that stopped by the sum of all
the hazards for all the customers who might have stopped at that time. If all
customers had the same hazard rates, then this ratio would simply be one
divided by the population at that point in time. However, the hazards are not
the same for every customer and, one hopes, are some function of the initial
conditions.
Cox made an assumption that the initial conditions have a constant effect on
all hazards, regardless of the time of the hazard. The partial likelihood is a
ratio, and the proportionality assumption means that the hazards, whatever
they are, appear in both the numerator and denominator multiplied by a
complicated expression based on the initial conditions. What is left is a
complicated mathematical formula containing the initial conditions. The hazards
themselves have disappeared from the partial likelihood; they simply cancel
each other out.
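The cancellation can be checked numerically. The sketch below assumes the standard form of the model, in which each customer's hazard is an unknown baseline hazard multiplied by exp(beta * x), where x encodes the initial condition. The risk set and the value of beta are invented for illustration.

```python
import math


def partial_likelihood(stopper_x, risk_set_x, beta, baseline_hazard):
    """Hazard of the customer who stopped, divided by the summed hazards of
    every customer who might have stopped (the stopper included)."""
    numerator = baseline_hazard * math.exp(beta * stopper_x)
    denominator = sum(baseline_hazard * math.exp(beta * x) for x in risk_set_x)
    return numerator / denominator


# Invented risk set: 1 = telemarketing, 0 = direct mail.
risk_set = [1, 0, 0, 1, 0]
beta = math.log(1.5)  # assumed effect: telemarketing hazard is 1.5 times as high

# Two very different baseline hazards give the same partial likelihood.
low_baseline = partial_likelihood(1, risk_set, beta, baseline_hazard=0.02)
high_baseline = partial_likelihood(1, risk_set, beta, baseline_hazard=0.30)

# With beta = 0 every customer has the same hazard, so the ratio is 1 / 5.
no_effect = partial_likelihood(1, risk_set, 0.0, baseline_hazard=0.02)

if __name__ == "__main__":
    print(low_baseline, high_baseline, no_effect)
```

Because the baseline appears as a common factor above and below the division, only the expression in the initial conditions survives.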
The next step is to apply the partial likelihoods of all customers who stop to
get the overall likelihood of those particular customers stopping. The product
of all these partial likelihoods is an expression that gives the likelihood of
seeing exactly the particular set of stopped customers stopping when they did.
Conveniently, this likelihood is also expressed only in terms of the initial
conditions and not in terms of the hazards, which may not be known.
Fortunately, there is an area of statistics called maximum likelihood
estimation, which when given a complicated expression for something like this finds
the parameter values that make the result most likely. These parameter values
conveniently represent the effect of the initial values on the hazards. As an
added bonus, the technique works both with continuous and categorical
values, whereas the stratification approach only works with categorical values.
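As a rough illustration of the estimation step, the sketch below fits a single coefficient by maximizing the log of the product of partial likelihoods. It is not how statistical packages do it: it assumes one covariate, no two customers stopping at the same time, and a simple ternary search in place of the usual numerical optimizers. The customer data is invented.

```python
import math


def log_partial_likelihood(beta, customers):
    """Sum of log partial likelihoods over the customers who stopped.
    Each customer is (tenure, stopped, x); hazard is proportional to exp(beta * x)."""
    total = 0.0
    for tenure_i, stopped_i, x_i in customers:
        if not stopped_i:
            continue  # censored customers contribute only through risk sets
        risk = sum(math.exp(beta * x_j)
                   for tenure_j, _, x_j in customers if tenure_j >= tenure_i)
        total += beta * x_i - math.log(risk)
    return total


def fit_beta(customers, lo=-5.0, hi=5.0, iterations=200):
    """Ternary search for the maximum; the function is concave in beta."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_partial_likelihood(m1, customers) < log_partial_likelihood(m2, customers):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Invented data: x = 1 for telemarketing, 0 for direct mail.
customers = [
    (1, True, 1), (2, True, 0), (3, True, 1), (4, False, 0),
    (5, True, 1), (6, True, 0), (7, False, 1), (8, True, 0),
]
beta_hat = fit_beta(customers)

if __name__ == "__main__":
    print("beta:", round(beta_hat, 3), "hazard ratio:", round(math.exp(beta_hat), 3))
```

The fitted value exp(beta_hat) is the estimated multiplier that the initial condition applies to the hazard; the baseline hazard itself is never estimated.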

Limitations of Proportional Hazards
Cox proportional hazards regression is very powerful and very clever.
However, it has its limitations. In order for all this to work, Cox had to make
many assumptions. He designed his approach around continuous time
hazards and also made the assumption that only one customer stops at any given
time. With some tweaking, implementations of proportional hazards
regression usually work for discrete time hazards and handle multiple stops at the
same time.

WARNING Cox proportional hazards regression ranks and quantifies the
effects of initial conditions on the overall hazard function. However, the results
are highly dependent on the often dubious assumption that the initial
conditions have a constant effect on the hazards over time. Use it carefully.

The biggest assumption in the proportional hazards model is the assumption
of proportionality itself, that is, that the effect of the initial conditions on
hazards does not have a time component. In practice, this is simply not true. It
is rarely, if ever, true that initial conditions have such perfect proportionality,
even in the scientific world. In the world of marketing, this is even less likely.
Marketing is not a controlled experiment. Things are constantly changing; new
programs, pricing, and competition are always arising.
The bad news is that there is no simple algorithm that explains initial
conditions, taking into account different effects over time. The good news is that it
often does not make a difference. Even with the assumption of proportionality,
Cox regression does a good job of determining which covariates have a big
impact on the hazards. In other words, it does a good job of explaining which
initial conditions are correlated with customers leaving.
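One informal way to see how far a pair of groups departs from proportionality is to look at the ratio of their hazards tenure by tenure. The sketch below is only a rough diagnostic under that idea, not a formal statistical test, and all the hazard values are invented.

```python
def hazard_ratios(hazards_a, hazards_b):
    """Ratio h_a(t) / h_b(t) per tenure; None where h_b(t) is zero."""
    return [a / b if b > 0 else None for a, b in zip(hazards_a, hazards_b)]


def spread(ratios):
    """Largest ratio divided by smallest; 1.0 means perfectly proportional."""
    defined = [r for r in ratios if r is not None]
    return max(defined) / min(defined)


# Invented hazards by tenure for illustration.
direct_mail = [0.020, 0.016, 0.012, 0.010]
telemarketing = [0.030, 0.024, 0.018, 0.015]  # always 1.5 times direct mail
drifting = [0.040, 0.024, 0.012, 0.010]       # effect fades with tenure

proportional_spread = spread(hazard_ratios(telemarketing, direct_mail))
drifting_spread = spread(hazard_ratios(drifting, direct_mail))

if __name__ == "__main__":
    print("proportional:", round(proportional_spread, 3))
    print("drifting:", round(drifting_spread, 3))
```

A spread well above 1.0 suggests that the effect of the initial condition changes over time, so a single multiplier from the regression should be read as a summary rather than as an exact description.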

Cox's approach was designed only for time-zero covariates, as statisticians
call initial values. The approach has been extended to handle events that occur
during a customer's lifetime, such as whether they upgrade their product or
make a complaint. In the language of statistics, these are time-dependent
covariates, meaning that the additional factors can occur at any point during
the customer's tenure, not only at the beginning of the relationship. Such
factors might be a customer's response to a retention campaign or making
complaints. Since Cox's original work, he and other statisticians have
extended this technique to include these types of factors.

Survival Analysis in Practice
Survival analysis has proven to be very valuable for understanding customers
and quantifying marketing efforts in terms of customer retention. It provides a
way of estimating how long it will be until something occurs. This section
gives some particular examples of survival analysis.

Handling Different Types of Attrition
Businesses that deal with customers have to deal with customers leaving for a
variety of reasons. Earlier, this chapter described hazard probabilities and
explained how hazards illustrate aspects of the business that affect the customer
life cycle. In particular, peaks in hazards coincided with business processes that
forced out customers who were not paying their bills.
Since these customers are treated differently, it is tempting to remove them
entirely from the hazard calculation. This is the wrong approach. The problem
is that which customers to remove is only known after the customers have been
forced to stop. As mentioned earlier, it is not a good idea to use such knowledge,
gained at the end of the customer relationship, to filter customers for
analysis.
The right approach is to break this into two problems. What are the hazards
for voluntary attrition? What are the hazards for forced attrition? Each of these
uses all the customers, censoring the customers who leave due to other factors.
When calculating the hazards for voluntary attrition, whenever a customer is
forced to leave, the customer is included in the analysis until he or she leaves;
at that point, the customer is censored. This makes sense. Up to the point when
the customer was forced to leave, the customer did not leave voluntarily.
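The two-problem approach can be sketched as one hazard calculation run twice, once per stop reason, with every other kind of stop treated as censoring. The records and reason labels below are invented, and the hazard is again taken to be stops at tenure t divided by customers at risk at t.

```python
def hazards_for_reason(records, reason, max_tenure):
    """Hazard of stopping for `reason` at each tenure.

    Each record is (tenure, stop_reason); stop_reason is None for customers
    who are still active. Customers who stop for any other reason stay in the
    population at risk up to their tenure and are then censored.
    """
    result = []
    for t in range(max_tenure + 1):
        at_risk = sum(1 for tenure, _ in records if tenure >= t)
        stops = sum(1 for tenure, why in records if tenure == t and why == reason)
        result.append(stops / at_risk if at_risk else 0.0)
    return result


# Invented toy data.
records = [
    (1, "voluntary"), (2, "forced"), (2, "voluntary"),
    (3, None), (3, "forced"), (4, None),
]

voluntary = hazards_for_reason(records, "voluntary", max_tenure=4)
forced = hazards_for_reason(records, "forced", max_tenure=4)

if __name__ == "__main__":
    print("voluntary:", [round(h, 3) for h in voluntary])
    print("forced:   ", [round(h, 3) for h in forced])
```

Note that no customer is dropped from either calculation; the forced stops still count in the population at risk for voluntary attrition until the tenure at which they leave.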
This approach can be extended for other purposes. Once upon a time, the
authors were trying to understand different groups of customers at a
newspaper, in particular, how survival by acquisition channel was or was not
changing over time. Unfortunately, during one of the time periods, there was
a boycott of the newspaper, raising the overall stop levels during that period.
Not surprisingly, the hazards went up and survival decreased during this time.
Is there a way to take into account these particular stops? The answer is
"yes," because the company did a pretty good job of recording the reasons
why customers stopped. The customers who boycotted the paper were simply
censored on the day they stopped; as they say in the medical world, these
customers were lost to follow-up. By censoring, it was possible to get an
accurate estimate of the overall hazards without the boycott.

When Will a Customer Come Back?
So far, the discussion of survival analysis has focused on the end of the customer
relationship. Survival analysis can be used for many things besides predicting
the probability of bad things happening. For instance, survival analysis can be
used to estimate when customers will return after having stopped.
Figure 12.12 shows a survival curve and hazards for reactivation of
customers after they deactivate their mobile telephone service. In this case, the
hazard is the probability that a customer returns a given number of days after
the deactivation.
There are several interesting features in these curves. First, the initial
reactivation rate is very high. In the first week, more than a third of customers
reactivate. Business rules explain this phenomenon. Many deactivations are due to
customers not paying their bills. Many of these customers are just holding out
until the last minute: they actually intend to keep their phones; they just
don't like paying the bill. However, once the phone stops working, they
quickly pay up.