this effect for different factors. The purpose of this section is to introduce

proportional hazards and to suggest how they are useful for understanding

customers. This section starts with some examples of why proportional

Hazard Functions and Survival Analysis in Marketing 409

hazards are useful. It then describes an alternative approach before returning

to the Cox model itself.

Examples of Proportional Hazards

Consider the following statement about one risk from smoking: The risk of
leukemia for smokers is 1.53 times greater than for nonsmokers. This result is a
classic example of proportional hazards. At the time of the study, the researchers
knew whether someone was or was not a smoker (actually, there was a third
group of former smokers, but our purpose here is to illustrate an example).
Whether or not someone is a smoker is an example of an initial condition.
Since there are only two factors to consider, it is possible to just look at the
hazard curves and to derive some sort of average for the overall risk.

Figure 12.11 provides an illustration from the world of marketing. It shows
two sets of hazard probabilities, one for customers who joined from a
telephone solicitation and the other from direct mail. Once again, how someone
became a customer is an example of an initial condition. The hazards for the
telemarketing customers are higher; looking at the chart, we might say
telemarketing customers are a bit less than twice as risky as direct mail customers.
Cox proportional hazard regression provides a way to quantify this.

The two just-mentioned examples use categorical variables as the risk factor.
Consider another statement about the risk of tobacco: The risk of colorectal
cancer increases 6.7 percent per pack-year smoked. This statement differs from the
previous one, because it now depends on a continuous variable. Using
proportional hazards, it is possible to determine the contribution of both
categorical and continuous covariates.
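To make the continuous case concrete, here is a minimal sketch of how a "percent per unit" statement translates into a hazard ratio. It assumes the figure comes from a proportional hazards model in which each additional unit multiplies the hazard by the same factor; the function name `hazard_ratio_for` is hypothetical and is not taken from the study or from any library.

```python
import math


def hazard_ratio_for(percent_per_unit, units):
    """Hazard ratio after `units` of exposure, assuming each unit
    multiplies the hazard by (1 + percent_per_unit / 100)."""
    # The per-unit coefficient on the log scale (an assumption of the sketch).
    beta = math.log(1.0 + percent_per_unit / 100.0)
    # Effects add on the log scale, so they compound on the hazard scale.
    return math.exp(beta * units)


# Illustration only: 10 pack-years at 6.7 percent per pack-year.
ratio_ten_pack_years = hazard_ratio_for(6.7, 10)
```

Under this assumption the effect compounds rather than adds, so ten pack-years corresponds to somewhat more than a 67 percent increase.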

[Figure 12.11: hazard (0% to 20%) plotted against tenure in weeks (0 to 70) for two groups of customers, Telemarketing and Direct Mail.]

Figure 12.11 These two hazard functions suggest that the risk of attrition is about one and

a half times as great for customers acquired through telemarketing versus direct mail.

410 Chapter 12

Stratification: Measuring Initial Effects on Survival

Figure 12.11 showed hazard probabilities for two different groups of
customers, one that started via outbound telemarketing campaigns and the other
via direct mail campaigns. These two curves clearly show differences between
these channels. It is possible to generate a survival curve for these hazards and
quantify the difference, using 1-year survival, median survival, or average
truncated tenure. This approach to measuring differences among different
groups defined by initial conditions is called stratification because each group
is analyzed independently from other groups. This produces good
visualizations and accurate survival values. It is also straightforward, since
statistical packages such as SAS and SPSS have options for stratifying data for
this purpose.
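As a rough illustration of stratification, the sketch below computes empirical discrete-time hazards separately for each group and turns them into survival curves. It assumes that a customer counts as at risk at every tenure up to and including the last tenure observed, and that customers who are still active are censored at that tenure. The function names and the toy data are hypothetical and are not the book's data.

```python
from collections import defaultdict


def hazards_by_stratum(customers, max_tenure):
    """customers: iterable of (stratum, tenure, stopped) tuples, where
    stopped is False for customers still active (censored) at `tenure`.
    Returns {stratum: [hazard at tenure 0, ..., hazard at max_tenure - 1]}."""
    at_risk = defaultdict(lambda: [0] * max_tenure)
    stops = defaultdict(lambda: [0] * max_tenure)
    for stratum, tenure, stopped in customers:
        # At risk at every tenure from 0 through the observed tenure.
        for t in range(min(tenure + 1, max_tenure)):
            at_risk[stratum][t] += 1
        if stopped and tenure < max_tenure:
            stops[stratum][tenure] += 1
    return {
        s: [stops[s][t] / at_risk[s][t] if at_risk[s][t] else 0.0
            for t in range(max_tenure)]
        for s in at_risk
    }


def survival_curve(hazards):
    """Survival at tenure t is the product of (1 - hazard) for earlier tenures."""
    survival = [1.0]
    for h in hazards:
        survival.append(survival[-1] * (1.0 - h))
    return survival


# Toy data for illustration only: (channel, tenure, stopped).
toy_customers = [
    ("telemarketing", 0, True), ("telemarketing", 1, True),
    ("telemarketing", 2, False), ("telemarketing", 2, False),
    ("direct mail", 1, True), ("direct mail", 2, False),
    ("direct mail", 2, False), ("direct mail", 2, False),
]
toy_hazards = hazards_by_stratum(toy_customers, max_tenure=2)
toy_survival = {s: survival_curve(h) for s, h in toy_hazards.items()}
```

Each stratum is handled with its own counts, which is why every group needs enough customers for its hazards to be reliable.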

Stratification solves the problem of understanding initial effects assuming

that two conditions are true. First, the initial effect needs to be a categorical

variable. Since the data is being broken into separate groups, some variable,

such as channel or product or region, needs to be chosen for this purpose. Of

course, it is always possible to use binning to break a continuous variable into

discrete chunks.

The second is that each group needs to be fairly big. When starting with lots

and lots of customers and only using one variable that takes on a handful of

values, such as channel, this is not a problem. However, there may be multiple

variables of interest, such as:

–– Acquisition channel
–– Original promotion
–– Geography

Once more than one dimension is included, the number of categories grows

very quickly. This means that the data gets spread thinly, making the hazards

less and less reliable.

Cox Proportional Hazards

In 1972, Sir David Cox recognized this problem and proposed a method of
analysis, now known as Cox proportional hazards regression, which
overcomes these limitations. His brilliant insight was to find a way to focus on the
initial conditions and not on the hazards themselves. The question is: What
effect do the initial conditions have on hazards? His approach to answering
this question is quite interesting.

Fortunately, the ideas are simpler than the mathematics behind his approach.

Instead of focusing on hazards, he introduced the idea of partial likelihood.

Assuming that only one customer stops at a given time t, the partial likelihood

at t is the likelihood that exactly that particular customer stopped.

The calculation for the partial likelihood divides whatever function or value

represents the hazard for the specific customer that stopped by the sum of all

the hazards for all the customers who might have stopped at that time. If all

customers had the same hazard rates, then this ratio would be constant (one

divided by the population at that point in time). However, the hazards are not
the same for every customer and, one hopes, are some function of the initial
conditions.

Cox made an assumption that the initial conditions have a constant effect on
all hazards, regardless of the time of the hazard. The partial likelihood is a
ratio, and the proportionality assumption means that the hazards, whatever
they are, appear in both the numerator and denominator multiplied by a
complicated expression based on the initial conditions. What is left is a
complicated mathematical formula containing the initial conditions. The hazards
themselves have disappeared from the partial likelihood; they simply cancel
each other out.

The next step is to apply the partial likelihoods of all customers who stop to
get the overall likelihood of those particular customers stopping. The product
of all these partial likelihoods is an expression that gives the likelihood of
seeing exactly the particular set of stopped customers stopping when they did.
Conveniently, this likelihood is also expressed only in terms of the initial
conditions and not in terms of the hazards, which may not be known.
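The cancellation described above can be written out explicitly. The notation here is not from the text: it assumes the standard form of the Cox model, in which customer i with initial conditions x_i has hazard h_0(t) exp(β·x_i), where h_0 is an unknown baseline hazard shared by all customers, and R(t_j) denotes the set of customers still at risk when customer j stops at time t_j.

```latex
% Proportional hazards assumption: the initial conditions scale a shared baseline.
h_i(t) = h_0(t)\,\exp(\beta^{\top} x_i)

% Partial likelihood that customer j is the one who stops at t_j.
% The baseline hazard h_0(t_j) appears in every term and cancels.
L_j(\beta)
  = \frac{h_0(t_j)\,\exp(\beta^{\top} x_j)}
         {\sum_{k \in R(t_j)} h_0(t_j)\,\exp(\beta^{\top} x_k)}
  = \frac{\exp(\beta^{\top} x_j)}
         {\sum_{k \in R(t_j)} \exp(\beta^{\top} x_k)}

% Overall likelihood: the product over all customers who stopped.
L(\beta) = \prod_{j \,:\, \text{customer } j \text{ stopped}} L_j(\beta)
```

Setting β to zero gives every customer the same hazard, and each factor reduces to one divided by the number of customers in R(t_j), which matches the constant ratio mentioned earlier.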

Fortunately, there is an area of statistics called maximum likelihood
estimation, which, given a complicated expression such as this one, finds
the parameter values that make the observed result most likely. These parameter
values conveniently represent the effect of the initial values on the hazards. As an
added bonus, the technique works with both continuous and categorical
values, whereas the stratification approach works only with categorical values.
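As a minimal sketch of maximum likelihood estimation in this setting, the code below fits a single coefficient by maximizing the log of the partial likelihood. It makes several assumptions that are not from the text: the standard exponential form exp(beta * x) for the effect of the initial condition, a single numeric covariate, no two customers stopping at the same tenure, and a simple ternary search in place of the optimizer a statistical package would use. The names `log_partial_likelihood` and `fit_beta` and the toy records are hypothetical.

```python
import math


def log_partial_likelihood(beta, records):
    """records: list of (tenure, stopped, x); stop tenures must be distinct.
    Censored customers (stopped is False) appear only in the risk sets."""
    total = 0.0
    for tenure_j, stopped_j, x_j in records:
        if not stopped_j:
            continue
        # Risk set: everyone whose tenure is at least as long as the stopper's.
        denominator = sum(
            math.exp(beta * x_k)
            for tenure_k, _, x_k in records
            if tenure_k >= tenure_j
        )
        total += beta * x_j - math.log(denominator)
    return total


def fit_beta(records, lo=-5.0, hi=5.0, tol=1e-6):
    """Ternary search for the beta that maximizes the log partial likelihood.
    This relies on the function being concave in beta and on the maximum
    lying inside [lo, hi]; otherwise the result sits near a boundary."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_partial_likelihood(m1, records) < log_partial_likelihood(m2, records):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Toy data for illustration only: x = 1 for telemarketing, 0 for direct mail.
toy_records = [
    (1, True, 1), (2, True, 0), (3, True, 1), (4, True, 1),
    (5, False, 0), (6, True, 0), (7, False, 0),
]
beta_hat = fit_beta(toy_records)
hazard_ratio = math.exp(beta_hat)  # estimated ratio of hazards, x = 1 versus x = 0
```

The exponential of the fitted coefficient is the estimated hazard ratio between the two groups, which is the kind of single number that the stratified curves could only suggest by eye.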

Limitations of Proportional Hazards

Cox proportional hazards regression is very powerful and very clever.
However, it has its limitations. In order for all this to work, Cox had to make
many assumptions. He designed his approach around continuous time
hazards and also made the assumption that only one customer stops at any given
time. With some tweaking, implementations of proportional hazards
regression usually work for discrete time hazards and handle multiple stops at the
same time.

WARNING Cox proportional hazards regression ranks and quantifies the
effects of initial conditions on the overall hazard function. However, the results
are highly dependent on the often dubious assumption that the initial
conditions have a constant effect on the hazards over time. Use it carefully.

The biggest assumption in the proportional hazards model is the
assumption of proportionality itself, that is, that the effect of the initial conditions
on hazards does not have a time component. In practice, this is simply not true. It
is rarely, if ever, true that initial conditions have such perfect proportionality,
even in the scientific world. In the world of marketing, this is even less likely.
Marketing is not a controlled experiment. Things are constantly changing; new
programs, pricing, and competition are always arising.

The bad news is that there is no simple algorithm that explains initial
conditions, taking into account different effects over time. The good news is that it
often does not make a difference. Even with the assumption of proportionality,
Cox regression does a good job of determining which covariates have a big
impact on the hazards. In other words, it does a good job of explaining what
initial conditions are correlated with customers leaving.

Cox's approach was designed only for time-zero covariates, as statisticians
call initial values. The approach has been extended to handle events that occur
during a customer's lifetime, such as whether they upgrade their product or
make a complaint. In the language of statistics, these are time-dependent
covariates, meaning that the additional factors can occur at any point during
the customer's tenure, not only at the beginning of the relationship. Such
factors might be a customer's response to a retention campaign or making
complaints. Since Cox's original work, he and other statisticians have
extended this technique to include these types of factors.

Survival Analysis in Practice

Survival analysis has proven to be very valuable for understanding customers

and quantifying marketing efforts in terms of customer retention. It provides a

way of estimating how long it will be until something occurs. This section

gives some particular examples of survival analysis.

Handling Different Types of Attrition

Businesses that deal with customers have to deal with customers leaving for a

variety of reasons. Earlier, this chapter described hazard probabilities and

explained how hazards illustrate aspects of the business that affect the customer

life cycle. In particular, peaks in hazards coincided with business processes that

forced out customers who were not paying their bills.

Since these customers are treated differently, it is tempting to remove them
entirely from the hazard calculation. This is the wrong approach. The problem
is that which customers to remove is known only after the customers have been
forced to stop. As mentioned earlier, it is not a good idea to use such
knowledge, gained at the end of the customer relationship, to filter customers for
analysis.

The right approach is to break this into two problems. What are the hazards
for voluntary attrition? What are the hazards for forced attrition? Each of these
uses all the customers, censoring the customers who leave due to other factors.
When calculating the hazards for voluntary attrition, whenever a customer is
forced to leave, the customer is included in the analysis until he or she leaves;
at that point, the customer is censored. This makes sense. Up to the point when
the customer was forced to leave, the customer did not leave voluntarily.
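The censoring rule just described can be sketched as follows. The code computes the hazards for one type of attrition using all customers and treats stops for any other reason as censored at the tenure when they occurred. The function name, the reason labels, and the toy data are hypothetical, and the at-risk convention (a customer is at risk through the last tenure observed) is an assumption of the sketch.

```python
def hazards_for_reason(customers, reason, max_tenure):
    """customers: list of (tenure, stop_reason); stop_reason is None for
    customers who are still active. Only stops matching `reason` count as
    events; every other customer is censored at his or her tenure."""
    at_risk = [0] * max_tenure
    stops = [0] * max_tenure
    for tenure, stop_reason in customers:
        # Every customer stays in the population at risk until leaving,
        # whatever the reason for leaving turns out to be.
        for t in range(min(tenure + 1, max_tenure)):
            at_risk[t] += 1
        if stop_reason == reason and tenure < max_tenure:
            stops[tenure] += 1
    return [stops[t] / at_risk[t] if at_risk[t] else 0.0
            for t in range(max_tenure)]


# Toy data for illustration only: (tenure, stop_reason).
toy_stops = [
    (0, "voluntary"), (1, "forced"), (1, "voluntary"), (2, None), (2, None),
]
voluntary_hazards = hazards_for_reason(toy_stops, "voluntary", max_tenure=2)
forced_hazards = hazards_for_reason(toy_stops, "forced", max_tenure=2)
```

Because both calculations share the same population at risk, the voluntary and forced hazards at each tenure add up to the overall hazard at that tenure.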

This approach can be extended for other purposes. Once upon a time, the
authors were trying to understand different groups of customers at a
newspaper, in particular, how survival by acquisition channel was or was not
changing over time. Unfortunately, during one of the time periods, there was
a boycott of the newspaper, raising the overall stop levels during that period.
Not surprisingly, the hazards went up and survival decreased during this time
period.

Is there a way to take into account these particular stops? The answer is
“yes,” because the company did a pretty good job of recording the reasons
why customers stopped. The customers who boycotted the paper were simply
censored on the day they stopped; as they say in the medical world, these
customers were lost to follow-up. By censoring, it was possible to get an
accurate estimate of the overall hazards without the boycott.

When Will a Customer Come Back?

So far, the discussion of survival analysis has focused on the end of the customer

relationship. Survival analysis can be used for many things besides predicting

the probability of bad things happening. For instance, survival analysis can be

used to estimate when customers will return after having stopped.

Figure 12.12 shows a survival curve and hazards for reactivation of
customers after they deactivate their mobile telephone service. In this case, the
hazard is the probability that a customer returns a given number of days after
the deactivation.

There are several interesting features in these curves. First, the initial
reactivation rate is very high. In the first week, more than a third of customers
reactivate. Business rules explain this phenomenon. Many deactivations are due to
customers not paying their bills. Many of these customers are just holding out
until the last minute; they actually intend to keep their phones, they just
don't like paying the bill. However, once the phone stops working, they
quickly pay up.
