
patients celebrate their newfound health by visiting Iceland. In a horrible
tragedy, all seven happen to die in an avalanche caused by a submerged
volcano. What is the effectiveness of your treatment on cancer mortality? Just
looking at the data, it is tempting to say there is a 7 percent mortality rate.
However, this mortality is clearly not related to the treatment, so the answer
does not feel right.
And, in fact, the answer is not right. This is an example of competing risks. A
study participant might live, might die of cancer, or might die of a mountain
climbing accident on a distant island. Or the patient might move to Tahiti and
drop out of the study. As medical researchers say, such a patient has been “lost
to follow-up.”
The solution is to censor the patients who exit the study before the event
being studied occurs. If patients drop out of the study, then they were healthy
to the point in time when they dropped out, and the information acquired during
this period can be used to calculate hazards. Afterward there is no way of
knowing what happened. They are censored at the point when they exit. If a
patient dies of something else, then he or she is censored at the point when
death occurs, and the death is not included in the hazard calculation.

TIP The right way to deal with competing risks is to develop different sets of
hazards for each risk, where the other risks are censored.

Competing risks are familiar in the business environment as well. For
instance, there are often two types of stops: voluntary stops, when a customer
decides to leave, and involuntary stops, when the company decides a customer
should leave, often due to unpaid bills.
In doing an analysis on voluntary churn, what happens to customers who
are forced to discontinue their relationships due to unpaid bills? If such a
customer were forced to stop on day 100, then that customer did not stop voluntarily
on days 1 through 99. This information can be used to generate hazards for
voluntary stops. However, starting on day 100, the customer is censored, as
shown in Figure 12.8. Censoring customers, even when they have stopped for
other reasons, makes it possible to understand different types of stops.
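
The censoring rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cause_specific_hazards`, the `(tenure, stop_type)` record layout, and the sample data are hypothetical, and the hazard at tenure t is taken to be the number of stops of the chosen type at t divided by the number of customers whose tenure is at least t.

```python
from collections import Counter


def cause_specific_hazards(customers, cause):
    """Discrete-time hazards for one stop type, censoring all other exits.

    customers: list of (tenure_days, stop_type) pairs; stop_type is a label
    such as "voluntary" or "involuntary", or None for an active customer.
    """
    # Every customer leaves the population at risk at their own tenure,
    # whether they stopped for this cause, another cause, or are still active.
    exits_at = Counter(tenure for tenure, _ in customers)
    # Only stops of the requested type count as events.
    stops_at = Counter(tenure for tenure, stop in customers if stop == cause)

    hazards = {}
    at_risk = len(customers)  # everyone is at risk at tenure 0
    for t in range(max(exits_at) + 1):
        hazards[t] = stops_at[t] / at_risk
        at_risk -= exits_at[t]  # stopped or censored: gone from later tenures
    return hazards


# Hypothetical data: one customer forced out on day 100 (censored for the
# voluntary analysis), two voluntary stops, and one customer still active.
customers = [(50, "voluntary"), (100, "involuntary"),
             (100, "voluntary"), (150, None)]
voluntary = cause_specific_hazards(customers, "voluntary")
involuntary = cause_specific_hazards(customers, "involuntary")
print(voluntary[50], voluntary[100], involuntary[100])
```

Calling the same function once per stop type yields one set of hazards for each risk, with the other risk treated as censored in each run.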
404 Chapter 12




[Figure: customer timelines drawn along a time axis, with two annotations.
"These two customers were forced to leave, so they are censored at the point of
attrition instead of being considered stopped." "All the data from before they
left is included in the calculation of the hazard functions for voluntary
attrition, since they remained as customers before then."]
Figure 12.8 Using censoring makes it possible to develop hazard models for voluntary
attrition that include customers who were forced to leave.




From Hazards to Survival
This chapter started with a discussion of retention curves. From the hazard
functions, it is possible to create a very similar curve, called the survival curve.
The survival curve is more useful and in many senses more accurate.


Retention
A retention curve provides information about how many customers have been
retained for a certain amount of time. One common way of creating a retention
curve is to do the following:
–– For customers who started 1 week ago, measure the 1-week retention.
–– For customers who started 2 weeks ago, measure the 2-week retention.
–– And so on.
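
The steps above can be sketched as follows. The function name `retention_curve`, the `(weeks_since_start, still_active)` record layout, and the two sample cohorts are assumptions made for illustration only; they are not data from this chapter.

```python
from collections import defaultdict


def retention_curve(customers):
    """Fraction of each start cohort that is still active today.

    customers: iterable of (weeks_since_start, still_active) pairs.
    Returns {weeks_since_start: retention}, one point per cohort.
    """
    started = defaultdict(int)
    active = defaultdict(int)
    for weeks_ago, still_active in customers:
        started[weeks_ago] += 1
        if still_active:
            active[weeks_ago] += 1
    return {w: active[w] / started[w] for w in sorted(started)}


# Hypothetical cohorts: 10 customers started 9 weeks ago (6 still active)
# and 10 customers started 10 weeks ago (7 still active).
customers = ([(9, True)] * 6 + [(9, False)] * 4 +
             [(10, True)] * 7 + [(10, False)] * 3)
curve = retention_curve(customers)
print(curve)
```

Each point on the resulting curve comes from a different group of customers, so nothing forces one point to be lower than the point before it.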


Figure 12.9 shows an example of a retention curve based on this approach.
The overall shape of this curve looks appropriate. However, the curve itself is
quite jagged. It seems odd, for instance, that 10-week retention would be better
than 9-week retention, as suggested by this data.
Hazard Functions and Survival Analysis in Marketing 405

[Figure: retention (0% to 100%) plotted against tenure in weeks (0 to 110).]
Figure 12.9 A retention curve might be quite jagged.


Actually, it is more than odd; it violates the very notion of retention. For
instance, it opens the possibility that the curve will cross the 50 percent threshold
more than once, leading to the odd, and inaccurate, conclusion that there
is more than one median lifetime, or that the average retention for customers
during the first 10 weeks after they start might be more than the average for
the first 9 weeks. What is happening? Are customers being reincarnated?
These problems are an artifact of the way the curve was created. Customers
acquired in any given time period may be better or worse than the
customers acquired in other time periods. For instance, perhaps 9 weeks ago
there was a special pricing offer that brought in bad customers. Customers
who started 10 weeks ago were the usual mix of good and bad, but those who
started 9 weeks ago were particularly bad. So, there are fewer of the bad customers
after 9 weeks than of the better customers after 10 weeks.
The quality of customers might also vary due merely to random variation.
After all, in the previous figure, there are over 100 time periods being
considered, so, all things being equal, some time periods would be expected
to exhibit differences.
A compounding reason is that marketing efforts change over time, attracting
different qualities of customers. For instance, customers arriving by different
channels often have different retention characteristics, and the mix of
customers from different channels is likely to change over time.


Survival
Hazards give the probability that a customer might stop at a particular point in
time. Survival, on the other hand, gives the probability of a customer surviving
up to that time. Survival values are calculated directly from the hazards.


At any point in time, the chance that a customer survives to the next unit of
time is simply 1 - hazard, which is called conditional survival at time t (it is
conditional because it assumes that the customers survived up to time t).
Calculating the full survival at a given time requires accumulating all the conditional
survivals up to that point in time by multiplying them together. The
survival value starts at 1 (or 100 percent) at time 0, since all customers included
in the analysis survive to the beginning of the analysis.
Since the hazard is always between 0 and 1, the conditional survival is also
between 0 and 1. Hence, survival never increases, because each successive
value is multiplied by a number that is at most 1. The survival curve
starts at 1 and gently goes down, perhaps sometimes flattening out, but never
rising.
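
As a minimal sketch of this calculation, the following Python multiplies the conditional survivals together. The function name `survival_curve`, the indexing convention (survival at time t combines the hazards for all earlier times), and the sample hazards are assumptions chosen for illustration.

```python
def survival_curve(hazards):
    """Turn discrete hazard probabilities into survival values.

    hazards[t] is the hazard at time t. The result has one more entry than
    the input: survival[0] is 1.0, and survival[t + 1] equals
    survival[t] * (1 - hazards[t]).
    """
    survival = [1.0]  # every customer survives to time 0
    for h in hazards:
        conditional_survival = 1.0 - h
        survival.append(survival[-1] * conditional_survival)
    return survival


# Hypothetical hazards; the zero hazard produces a flat stretch in the curve.
hazards = [0.1, 0.0, 0.5]
survival = survival_curve(hazards)
print(survival)
```

Because every factor lies between 0 and 1, each value in the result is no larger than the one before it.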
Survival curves make more sense for customer retention purposes than the
retention curves described earlier. Figure 12.10 shows a survival curve and its
corresponding retention curve. It is clear that the survival curve is smoother,
and that it slopes downward at all times. The retention curve bounces all over
the place.
The differences between the retention curve and the survival curve may, at
first, seem nonintuitive. The retention curve is actually pasting together a
whole bunch of different pictures of customers from the past, like a photo collage
pieced together from a bunch of different photographs to get a panoramic
image. In the collage, the picture in each photo is quite clear. However, the
boundaries do not necessarily fit together smoothly. Different pictures in the
collage look different because of differences in lighting or perspective,
differences that contribute to the aesthetic of the collage.


[Figure: retention and survival (0% to 100%) plotted against tenure in weeks
(0 to 110).]
Figure 12.10 A survival curve is smoother than a retention curve.


The same thing is happening with retention curves, where customers who
start at different points in time have different perspectives. Any given point on
the retention curve is close to the actual retention value; however, taken as a
whole, it looks jagged. One way to remove the jaggedness is to focus on customers
who start at about the same time, as suggested earlier in this chapter.
However, this greatly reduces the amount of data contributing to the curve.

TIP Instead of using retention curves, use survival curves. That is, first
calculate the hazards and then work back to calculate the survival curve.

The survival curve, on the other hand, looks at as many customers as possible,
not just the ones who started exactly n time periods ago. The survival at
any given point in time t uses information from all customers. The hazard at
time t uses information from all customers whose tenure is greater than or
equal to that value (assuming all are in the population at risk). Survival,
though, is calculated by combining all the information for hazards from
smaller values of t.
Because survival calculations use all the data, the values are more stable
than retention calculations. Each point on a retention curve limits customers to
having started at a particular point in time. Also, because a survival curve
always slopes downward, calculations of customer half-life and average customer
tenure are more accurate. By incorporating more information, survival
provides a more accurate, smoother picture of customer retention.
When analyzing customers, both hazards and survival provide valuable
information. Because survival is cumulative, it gives a good
summary value for comparing different groups of customers: How does the
1-year survival compare among different groups? Survival is also used for
calculating customer half-life and mean customer tenure, which in turn feed
into other calculations, such as customer value.
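
The two summary measures can be read off a survival curve roughly as follows. This is a sketch under assumptions: the function names are hypothetical, half-life is taken to be the first time at which survival is 50 percent or less, and mean tenure is truncated at the end of the observed curve (the area under the curve), so it understates the true mean whenever some customers are still surviving at the last point.

```python
def customer_half_life(survival):
    """First time t at which survival[t] is 0.5 or less, else None."""
    for t, s in enumerate(survival):
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50 percent in the observed range


def truncated_mean_tenure(survival):
    """Area under the survival curve, truncated at the last observed time.

    With survival[t] read as the probability that tenure is at least t,
    summing survival[1:] gives the expected tenure capped at the horizon.
    """
    return sum(survival[1:])


# Hypothetical survival values for times 0 through 3.
survival = [1.0, 0.9, 0.9, 0.45]
half_life = customer_half_life(survival)
mean_tenure = truncated_mean_tenure(survival)
print(half_life, mean_tenure)
```

Because a survival curve never rises, it crosses the 50 percent line at most once, so the half-life is well defined whenever the curve reaches that level.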
Because survival is cumulative, it is difficult to see patterns at a particular
point in time. Hazards make the specific causes much more apparent. When
discussing some real-world hazards, it was possible to identify events during
the customer life cycle that were drivers of hazards. Survival curves do not
highlight such events as clearly as hazards do.
The question may also arise about comparing hazards for different groups
of customers. It does not make sense to compare average hazards over a
period of time. Mathematically, “average hazard” does not make sense. The
right approach is to turn the hazards into survival and compare the values on
the survival curves.
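
A small numeric illustration, using made-up hazards for two hypothetical groups, shows why the comparison belongs on the survival curve: both groups below have the same arithmetic average hazard, yet they end up with different survival.

```python
import math


def survival_after(hazards):
    """Survival after all the given periods: the product of (1 - hazard)."""
    return math.prod(1.0 - h for h in hazards)


# Hypothetical hazards for two customer groups over two periods.
group_a = [0.5, 0.0]    # heavy early attrition, then none
group_b = [0.25, 0.25]  # steady attrition

avg_a = sum(group_a) / len(group_a)
avg_b = sum(group_b) / len(group_b)
surv_a = survival_after(group_a)
surv_b = survival_after(group_b)
print(avg_a, avg_b, surv_a, surv_b)
```

The averages agree, but group B retains more customers after two periods than group A, which only the survival values reveal.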
The description of hazards and survival presented so far differs a bit from
how the subject is treated in statistics. The sidebar “A Note about Survival
Analysis and Statistics” explains the differences further.


A NOTE ABOUT SURVIVAL ANALYSIS AND STATISTICS
The discussion of survival analysis in this chapter assumes that time is discrete.
In particular, things happen on particular days, and the particular time of day is
not important. This is not only reasonable for the problems addressed by data
mining, but it is also more intuitive and simplifies the mathematics.
In statistics, though, survival analysis makes the opposite assumption, that
time is continuous. Instead of hazard probabilities, statisticians work with
hazard rates, which are turned into survival curves by using exponentiation and
integration. One difference between a rate and a probability is that the rate can
exceed 1, whereas a probability never does. Also, a rate seems less intuitive for
many survival problems encountered with customers.
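For readers who want to see the two views side by side, the standard relationships can be written as below. The symbols are assumptions introduced here for illustration: h_i is the discrete hazard probability at time i, lambda(u) is the continuous hazard rate at time u, and S(t) is survival at time t.

```latex
% Discrete time (this chapter): multiply the conditional survivals.
S(t) = \prod_{i=0}^{t-1} \bigl(1 - h_i\bigr)

% Continuous time (statistics): integrate the hazard rate, then exponentiate.
S(t) = \exp\!\left(-\int_{0}^{t} \lambda(u)\,du\right)
```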
The method for calculating hazards in this chapter is called the life table
method, and it works well with discrete time data. A very similar method, called
Kaplan-Meier, is used for continuous time data. The two techniques produce
almost exactly the same results when events occur at discrete times.
An important part of statistical survival analysis is the estimation of hazards
using parameterized regression, that is, trying to find the best functional form
for the hazards. This is an alternative to calculating the hazards directly from
the data.
The parameterized approach has the important advantage that it can more
easily include covariates in the process. Later in this chapter, there is an
example based on such a parameterized model. Unfortunately, the hazard
function rarely follows a form that would be familiar to nonstatisticians. The
hazards do such a good job of describing the customer life cycle that it would
be shocking if a simple function captured that rich complexity.
We strongly encourage interested readers who have a mathematical or
statistical background to investigate the area further.



Proportional Hazards

Sir David Cox is one of the most cited statisticians of the past century; his work
comprises numerous books and over 250 articles. He has received many
awards including a knighthood bestowed on him by Queen Elizabeth in 1985.
Much of his research centered on understanding hazard functions, and his
work has been particularly important in the world of medical research.
His seminal paper was about determining the effect of initial factors (time-zero
covariates) on hazards. By assuming that these initial factors have a uni­
