[Chart: survival (remain deactivated) on a 0% to 80% scale and hazard probability ("risk" of reactivating) on a 0% to 8% scale, plotted against days after deactivation, 0 to 360.]

Figure 12.12 Survival curve (upper curve) and hazards (lower curve) for reactivation of mobile telephone customers.

After 90 days, the hazards are practically zero: customers do not reactivate.

Once again, the business processes provide guidance. Telephone numbers are reserved for 90 days after customers leave. Normally, when customers reactivate, they want to keep the same telephone number. After 90 days, the number may have been reassigned, and the customer would have to get a new telephone number.

This discussion has glossed over the question of how new (reactivated)

customers were associated with the expired accounts. In this case, the analysis

used the telephone numbers in conjunction with an account ID. This pretty

much guaranteed that the match was accurate, since reactivated customers

retained their telephone numbers and billing information. This is very

conservative but works for finding reactivations. It does not work for finding

other types of winback, such as customers who are willing to cycle through

telephone numbers in order to get introductory discounts.
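The conservative match described above can be sketched as follows. This is a minimal illustration that assumes hypothetical record layouts: each expired account carries a telephone number, an account ID, and a deactivation date, and each new start carries the same keys plus a start date. The names `Deactivation`, `Start`, and `match_reactivations` are illustrative, not taken from any particular system. A start counts as a reactivation only when both keys agree, and the result is the number of days from deactivation to reactivation, which is the tenure used for the reactivation hazards.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deactivation:
    # Hypothetical record for an expired account.
    phone: str
    account_id: str
    stop_date: date


@dataclass(frozen=True)
class Start:
    # Hypothetical record for a new (possibly reactivated) start.
    phone: str
    account_id: str
    start_date: date


def match_reactivations(deactivations, starts):
    """Map each matched Deactivation to its days until reactivation.

    A start counts only if BOTH the telephone number and the account ID
    match and the start comes after the stop. Deactivations that do not
    appear in the result have no observed reactivation (censored).
    """
    # Index start dates by the (phone, account_id) key used for matching.
    starts_by_key = {}
    for s in starts:
        starts_by_key.setdefault((s.phone, s.account_id), []).append(s.start_date)

    matches = {}
    for d in deactivations:
        later = [sd for sd in starts_by_key.get((d.phone, d.account_id), [])
                 if sd > d.stop_date]
        if later:
            # Use the first start after the deactivation.
            matches[d] = (min(later) - d.stop_date).days
    return matches
```

Because both keys must agree, a customer who returns under a new telephone number or a new account is not matched, which is the limitation noted above.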

Another approach is to try to identify individuals over time, even when they are on different accounts. For businesses that collect Social Security numbers or driver's license numbers as a regular part of their business, such identifying numbers can connect accounts together over time. (Be aware that not everyone who is asked to supply this kind of identifying information does so accurately.) Sometimes matching names, addresses, telephone numbers, and/or credit cards is sufficient for matching purposes. More often, this task is outsourced to a company that assigns individual and household IDs, which then provide the information needed to identify which new customers are really former customers who have been won back.

Studying initial covariates adds even more information. In this case, “initial” means whatever is known about the customer at the point of deactivation. This includes not only information such as initial product and promotion,

Hazard Functions and Survival Analysis in Marketing 415

but also customer behavior before deactivating. Are customers who complain a

lot more or less likely to reactivate? Customers who roam? Customers who pay

their bills late?

This example shows the use of hazards to understand a classic time-to-event

question. There are other questions of this genre amenable to survival analysis:

- When customers start on a minimum pricing plan, how long will it be before they upgrade to a premium plan?
- When customers upgrade to a premium plan, how long will it be before they downgrade?
- What is the expected length of time between purchases for customers, given past customer behavior and the fact that different customers have different purchase periods?

One nice aspect of using survival analysis is that it is easy to ask about the effects of different initial conditions, such as the number of times that a customer has visited in the past. Using proportional hazards, it is possible to determine which covariates have the most effect on the desired outcome, including which interventions are most and least likely to work.

Forecasting

Another interesting application of survival analysis is forecasting the number

of customers into the future, or equivalently, the number of stops on a given

day in the future. In the aggregate, survival does a good job of estimating how

many customers will stick around for a given length of time.

There are two components to any such forecast. The first is a model of existing customers, which can take into account various covariates during the customer's life cycle. Such a model works by applying one or more survival models to all customers. If a customer has survived for 100 days, then the probability of stopping tomorrow is the hazard at day 100. To calculate the chance of stopping the day after tomorrow, first assume that the customer does not stop tomorrow and then does stop on day 101. This is the conditional survival at day 100 (one minus the hazard, that is, the probability of not stopping) times the hazard for day 101. Applying this to all customer tenures, it is possible to forecast stops of existing customers in the future.
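The calculation above can be sketched in a few lines, assuming a discrete hazard table indexed by tenure in days, where `hazards[t]` is the probability that a customer with tenure t stops during the next day. The function name and the rule of reusing the last hazard for tenures beyond the end of the table are assumptions of this sketch. For each customer, the expected stop on future day k is the probability of surviving the first k days times the hazard on day k; summing over customers gives the forecast.

```python
def forecast_existing_stops(hazards, tenures, horizon):
    """Expected number of stops on each of the next `horizon` days.

    hazards[t]: probability that a customer with tenure t stops during the
    next day. tenures: current tenure of each active customer.
    Assumption: tenures past the end of the table reuse the last hazard.
    """
    def h(t):
        return hazards[min(t, len(hazards) - 1)]

    expected = [0.0] * horizon
    for tenure in tenures:
        still_active = 1.0  # probability of not having stopped before day k
        for k in range(horizon):
            # Survive the first k days, then stop on day k.
            expected[k] += still_active * h(tenure + k)
            still_active *= 1.0 - h(tenure + k)
    return expected
```

For a customer at tenure 100, the first entry is the hazard at day 100 and the second is (1 minus that hazard) times the hazard at day 101, exactly as described above. A day-of-week effect like the one in Figure 12.13 would require hazards that depend on the calendar date as well as on tenure, which this sketch leaves out.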

Figure 12.13 shows such a forecast for stops for 1 month, developed by survival expert Will Potts. Also shown are the actual values observed during this period. The survival-based forecast proves to be quite close to what is actually happening. By the way, this particular survival estimate used a parametric model on the hazards rather than empirical hazard rates; the model was able to take into account the day of the week. This results in the weekly cycle of stops evident in the graph.


[Chart: actual and predicted number of stops, 0 to 180, by day of month, 1 to 31.]

Figure 12.13 Survival analysis can also be used for forecasting customer stops.

The second component of a customer-level forecast is a bit more difficult to calculate. This component is the effect of new customers on the forecast, and the difficulty is not technical. The challenge is getting estimates for new starts. Fortunately, there are often budget forecasts that contain new starts, sometimes broken down by product, channel, or geography. It is possible to refine the survival models to take into account these effects. Of course, the forecast is only as accurate as the budget. The upside, though, is that the forecast, based on survival techniques, can be incorporated into the process of managing actual levels against budgeted levels.

The combination of these components, stop forecasts for existing customers and stop forecasts for new customers, makes it possible to develop estimates of customer levels into the future. The authors have worked with clients who have extended these forecasts years into the future. Because the models for new customers included the acquisition channel, the forecasting model made it possible to optimize the future acquisition channel mix.
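Combining the two components can be sketched as a simple cohort projection. This assumes the same kind of discrete daily hazard table as before and a hypothetical list of budgeted new starts per day; the function name and inputs are illustrative. Each day, every tenure group loses its hazard's share of customers and ages by one day, and that day's new starts enter at tenure 0.

```python
def project_customer_levels(hazards, existing, new_starts):
    """Expected active customers at the end of each future day.

    hazards[t]: probability that a customer with tenure t stops during the day.
    existing:   {tenure: count} for customers active today.
    new_starts: budgeted new starts for each future day (hypothetical input).
    Assumption: tenures past the end of the table reuse the last hazard.
    """
    def h(t):
        return hazards[min(t, len(hazards) - 1)]

    cohort = {t: float(n) for t, n in existing.items()}  # expected count by tenure
    levels = []
    for starts_today in new_starts:
        # Each tenure group loses h(t) of its members and ages by one day.
        cohort = {t + 1: n * (1.0 - h(t)) for t, n in cohort.items()}
        # Today's new starts join at tenure 0 and are at risk from tomorrow.
        cohort[0] = float(starts_today)
        levels.append(sum(cohort.values()))
    return levels
```

Using a single hazard table for everyone is a simplification; in the spirit of the models described above, one would run a projection per acquisition channel (or other covariate) with its own hazards and new-start budget, then add the results.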

Hazards Changing over Time

One of the more difficult issues in survival analysis is whether the hazards

themselves are constant or whether they change over time. The assumption in

scientific studies is that hazards do not change. The goal of scientific survival

analysis is to obtain estimates of the “real” hazard in various situations.


This assumption may or may not be true in marketing. Certainly, working

with this assumption, survival analysis has proven its worth with customer

data. However, it is interesting to consider the possibility that hazards may

be changing over time. In particular, if hazards do change, that gives some insight into whether the marketplace and customers are getting better or worse over time.

One approach to answering this question is to base hazards on customers who are stopping rather than customers who are starting, for instance, on the customers who stopped in each of the past few years. In other words, were the hazards associated with customers who stopped last year significantly different from the hazards associated with customers who stopped the previous year? Earlier, this chapter warned that calculating hazards for a set of customers chosen by their stop date does not produce accurate hazards. How can we overcome this problem?

There is a way to calculate these hazards, although this has not yet appeared in standard statistical tools. This method uses time windows on the customers to estimate the hazard probability. Remember the definition of the empirical hazard probability: the number of customers who stopped at a particular time divided by the number of customers who could have stopped at that time. Up to now, all customers have been included in the calculation. The idea is to restrict the population to only those customers who could have stopped during the period in question.
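Stated as a formula (the notation is introduced here for illustration and does not come from the text): let a_i be customer i's start date, T_i the tenure in days at which the customer stopped (delta_i = 1) or was censored (delta_i = 0), and W the calendar window. Customer i belongs to the population at risk for tenure t only if it was still active at tenure t and the date on which it reached that tenure, a_i + t, falls inside W. Survival for the window then follows from the windowed hazards in the usual way.

```latex
\hat{h}_W(t) \;=\;
\frac{\bigl|\{\, i : T_i = t,\ \delta_i = 1,\ a_i + t \in W \,\}\bigr|}
     {\bigl|\{\, i : T_i \ge t,\ a_i + t \in W \,\}\bigr|},
\qquad
\hat{S}_W(t) \;=\; \prod_{j < t} \bigl(1 - \hat{h}_W(j)\bigr)
```

When W covers all of time, the condition on a_i + t always holds and the first expression reduces to the ordinary empirical hazard.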

As an example, consider estimating the hazards based on customers who

stopped in 2003. Customers who stopped in 2003 were either active on the first

day of 2003 or were new customers during the year. In either case, customers

only contribute to the population count starting at whatever their tenure was

on the first day of 2003 (or 0 for new starts).

Let's consider the calculation of the 1-day hazard probability. What is the population of customers who could have stopped with 1 day of tenure and also have their stop fall in 2003? Only customers who started between December 31, 2002 and December 30, 2003 could have a 1-day stop in 2003. So, the calculation of the 1-day hazard uses all stops in 2003 where the tenure was 1 day as the total for stops. The population at risk consists of the customers who started between December 31, 2002 and December 30, 2003 and were still active at 1 day of tenure. As another example, the 365-day hazard would be based on a population count of customers who started in 2002 and were still active at 365 days of tenure.
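The time-window calculation can be sketched in Python, assuming each customer is summarized as a hypothetical `(start_date, tenure, stopped)` tuple with tenure in days. The function names are illustrative. For each customer, only the tenures reached on a date inside the window add to the population at risk, and a stop is counted only if its date falls inside the window. Survival is then the running product of one minus the hazard.

```python
from datetime import date


def windowed_hazards(customers, window_start, window_end, max_tenure):
    """Empirical hazards using only stops and exposure inside a calendar window.

    customers: iterable of (start_date, tenure, stopped) tuples, where tenure is
    the tenure in days at the stop (stopped=True) or at censoring (stopped=False).
    Returns hazards for tenures 0..max_tenure; None where no one was at risk.
    """
    at_risk = [0] * (max_tenure + 1)
    stops = [0] * (max_tenure + 1)
    for start, tenure, stopped in customers:
        # Tenures t whose date (start + t days) lies in [window_start, window_end],
        # capped at the customer's own tenure (not at risk after stopping).
        lo = max(0, (window_start - start).days)
        hi = min(tenure, max_tenure, (window_end - start).days)
        for t in range(lo, hi + 1):
            at_risk[t] += 1
        # Count the stop only if it happened inside the window.
        if stopped and lo <= tenure <= hi:
            stops[tenure] += 1
    return [s / n if n else None for s, n in zip(stops, at_risk)]


def survival_from_hazards(hazards):
    """Survival S(t) as the running product of (1 - hazard), with S(0) = 1."""
    survival = [1.0]
    for h in hazards:
        if h is None:  # no customers at risk: the curve cannot be extended
            break
        survival.append(survival[-1] * (1.0 - h))
    return survival


# Example: hazards based on stops in 2003 (hypothetical customers).
customers = [
    (date(2002, 12, 31), 1, True),   # 1-day stop on January 1, 2003
    (date(2003, 12, 30), 5, False),  # still active at the end of 2003
    (date(2002, 1, 1), 400, False),  # in 2003, at risk only from tenure 365 on
]
hazards_2003 = windowed_hazards(customers, date(2003, 1, 1), date(2003, 12, 31), 2)
print(hazards_2003)  # [0.0, 0.5, None]
```

Calling `windowed_hazards` once per year and converting each result with `survival_from_hazards` yields one survival curve per window, which is the kind of comparison shown in Figure 12.14.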

The result is an estimate of the hazards based on stops during a particular period of time. For comparison purposes, survival proves to be more useful than the hazards themselves. Figure 12.14 provides an example, showing that survival is indeed decreasing over the course of several years. The changes in survival are small. However, the calculations are based on hundreds of thousands of customers and do represent a decline in customer quality.


[Chart: survival, 0% to 100%, by days after start, 0 to 360, with one curve each for 2000, 2001, and 2002.]

Figure 12.14 A time-window technique makes it possible to see changes in survival over time.

Lessons Learned

Hazards and survival analysis are designed for understanding customers.

This chapter introduced hazards as the conditional probability of a customer leaving at a given point in time. This treatment of survival analysis is unorthodox in terms of statistics, which prefers an approach based on continuous rates rather than discrete time probabilities. However, this treatment is more intuitive for analyzing customers.

Hazards are like an x-ray of the customer life cycle. The related idea of survival, which is the proportion of customers who survive up to a particular point in time, makes it possible to compare different groups of customers and to translate results into dollars and cents. When there are enough customers (and usually there are), stratifying the customers by building a separate curve for each group provides a good comparison. It is possible to use other measures, such as the survival at a particular point in time, the customer half-life, and the average tenure, to better understand customers.
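These measures can be read directly off a survival curve, as in the following sketch. It assumes a daily survival list where `survival[t]` is the proportion of customers who reach tenure t (so `survival[0]` is 1.0); the function name and dictionary keys are illustrative. The half-life is taken as the first tenure at which survival is at or below 50%, and the average tenure is truncated at a chosen horizon, since survival curves estimated from data rarely reach zero.

```python
def survival_summary(survival, horizon):
    """Point survival, half-life, and truncated average tenure from a curve.

    survival[t]: proportion of customers who reach tenure t (survival[0] == 1.0).
    horizon: tenure in days at which to read survival and truncate the average;
    assumes len(survival) > horizon.
    """
    # Customer half-life: first tenure at which survival drops to 50% or below.
    half_life = next((t for t, s in enumerate(survival) if s <= 0.5), None)
    # Truncated average tenure: S(1) + ... + S(horizon), which equals the
    # expected value of min(tenure, horizon) when S(t) is P(tenure >= t).
    average_tenure = sum(survival[1:horizon + 1])
    return {
        "survival_at_horizon": survival[horizon],
        "half_life": half_life,
        "average_tenure": average_tenure,
    }
```

Computing the same summary for each stratum gives a compact way to compare groups alongside their full curves.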

One of the key concepts in survival analysis is censoring. This means that, for some customers, the outcome of interest has not been observed: they are counted in the population at risk up to the point of censoring and are then dropped from the analysis. The idea of censoring can be extended to understand competing risks, such as voluntary versus forced attrition. Censoring also makes it possible to discard certain outcomes, such as a one-time boycott, without adversely biasing overall results.


One of the most powerful aspects of hazards is the ability to determine which factors, at the outset, are responsible for increasing or decreasing the hazards. In addition to stratifying customers, there is another technique, Cox proportional hazards regression, which has proven its worth since the 1970s and continues to be extended and improved upon.

Survival analysis has many applications beyond measuring the probability of customers leaving. It has been used for forecasting customer levels, as well as for predicting other types of events during the customer life cycle. It is a very powerful tool, seemingly designed specifically for understanding customers and their life cycles.