[Figure 4.2 chart. Vertical axis: Concentration (% of Responders). Horizontal axis: List Penetration (% of Prospects). Lines: Response Model (curved) and No Model (diagonal); the gap between them is labeled Benefit.]

Figure 4.2 A cumulative gains or concentration chart shows the benefit of using a model.

100 Chapter 4

The upper, curved line plots the concentration, the percentage of all responders

captured as more and more of the prospects are included in the campaign.

The straight diagonal line is there for comparison. It represents what happens

with no model so the concentration does not vary as a function of penetration.

Mailing to 30 percent of the prospects chosen at random would find 30 percent

of the responders. With the model, mailing to the top 30 percent of prospects

finds 65 percent of the responders. The ratio of concentration to penetration is

the lift. The difference between these two lines is the benefit. Lift was discussed

in the previous chapter. Benefit is discussed in a sidebar.

The model pictured here has a lift of 2.17 at the third decile, meaning that,
using the model, SAC will get a little more than twice as many responders for
its expenditure of $300,000 as it would have received by mailing to 30 percent
of its one million prospects at random.
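As a quick check of the arithmetic, the following Python sketch computes lift as concentration divided by penetration. The function name `lift` is illustrative and does not come from the text; the inputs are the 30 percent penetration and 65 percent concentration quoted above.

```python
def lift(concentration: float, penetration: float) -> float:
    """Lift = share of responders captured / share of prospects contacted."""
    if penetration <= 0:
        raise ValueError("penetration must be positive")
    return concentration / penetration


# Values quoted in the text: the top 30% of prospects holds 65% of responders.
model_lift = lift(concentration=0.65, penetration=0.30)

# With no model, 30% of the prospects holds 30% of the responders.
random_lift = lift(concentration=0.30, penetration=0.30)

print(round(model_lift, 2))   # 2.17
print(round(random_lift, 2))  # 1.0
```

A lift of 1.0 is the no-model baseline, so any value above 1.0 means the model is concentrating responders at that depth.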

Optimizing Campaign Profitability

There is no doubt that doubling the response rate to a campaign is a desirable

outcome, but how much is it actually worth? Is the campaign even profitable?

Although lift is a useful way of comparing models, it does not answer these

important questions. To address profitability, more information is needed. In

particular, calculating profitability requires information on revenues as well as

costs. Let's add a few more details to the SAC example.

The Simplifying Assumptions Corporation sells a single product for a

single price. The price of the product is $100. The total cost to SAC to manufacture,

warehouse, and distribute the product is $55. As already

mentioned, it costs one dollar to reach a prospect. There is now enough

information to calculate the value of a response. The gross value of each

response is $100. The net value of each response takes into account the costs

associated with the response ($55 for the cost of goods and $1 for the contact)

to achieve net revenue of $44 per response. This information is summarized in

Table 4.3.

Table 4.3 Profit/Loss Matrix for the Simplifying Assumptions Corporation

              RESPONDED
MAILED        Yes       No
Yes           $44       -$1
No            $0        $0

Data Mining Applications 101

BENEFIT

Concentration charts, such as the one pictured in Figure 4.2, are usually

discussed in terms of lift. Lift measures the relationship of concentration to

penetration and is certainly a useful way of comparing the performance of two

models at a given depth in the prospect list. However, it fails to capture another

concept that seems intuitively important when looking at the chart, namely,

how far apart are the lines, and at what penetration are they farthest apart?

Our colleague, the statistician Will Potts, gives the name benefit to the

difference between concentration and penetration. Using his nomenclature, the

point where this difference is maximized is the point of maximum benefit. Note

that the point of maximum benefit does not correspond to the point of highest

lift. Lift is always maximized at the left edge of the concentration chart where

the concentration is highest and the slope of the curve is steepest.
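To make the distinction concrete, here is a minimal Python sketch that computes benefit (concentration minus penetration) and lift at each decile and reports where each peaks. The figures are the cumulative gains that the chapter tabulates in Table 4.4; the 80 percent entry is taken as 98, the value implied by that table's gains and lift columns, which is an assumption of this sketch. The variable names are illustrative.

```python
# Cumulative gains by decile, in percent, as tabulated in Table 4.4.
# Assumption: the 80% entry is 98 (94 + 4, and 1.225 * 80), not 96.
penetration = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
concentration = [30, 50, 65, 78, 85, 90, 94, 98, 100, 100]

# Benefit is the vertical gap between the model curve and the diagonal.
benefit = [c - p for c, p in zip(concentration, penetration)]

# Lift is the ratio of concentration to penetration.
lift = [c / p for c, p in zip(concentration, penetration)]

best_benefit = max(range(len(penetration)), key=lambda i: benefit[i])
best_lift = max(range(len(penetration)), key=lambda i: lift[i])

print(f"maximum benefit: {benefit[best_benefit]} points "
      f"at {penetration[best_benefit]}% penetration")
print(f"maximum lift: {lift[best_lift]:.1f} "
      f"at {penetration[best_lift]}% penetration")
```

With these figures the benefit peaks at 40 percent penetration (38 points), while lift peaks at the first decile (3.0), which illustrates that the two points need not coincide.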

The point of maximum benefit is a bit more interesting. To explain some of

its useful properties, this sidebar makes reference to some things (such as ROC

curves and KS tests) that are not explained in the main body of the book. Each

bulleted point is a formal statement about the maximum benefit point on the

concentration curve. The formal statements are followed by informal

explanations.

• The maximum benefit is proportional to the maximum distance between

the cumulative distribution functions of the probabilities in each class.

What this means is that the model score that cuts the prospect list at the

penetration where the benefit is greatest is also the score that maximizes

the Kolmogorov-Smirnov (KS) statistic. The KS test is popular among some

statisticians, especially in the financial services industry. It was developed

as a test of whether two distributions are different. Splitting the list at the

point of maximum benefit results in a “good list” and a “bad list” whose

distributions of responders are maximally separate from each other and

from the population. In this case, the “good list” has a maximum proportion

of responders and the “bad list” has a minimum proportion.

• The maximum benefit point on the concentration curve corresponds to

the maximum perpendicular distance between the corresponding ROC

curve and the no-model line.

The ROC curve resembles the more familiar concentration or cumulative

gains chart, so it is not surprising that there is a relationship between them. As

explained in another sidebar, the ROC curve shows the trade-off between two

types of misclassification error. The maximum benefit point on the cumulative

gains chart corresponds to a point on the ROC curve where the separation

between the classes is maximized.

• The maximum benefit point corresponds to the decision rule that maximizes

the unweighted average of sensitivity and specificity.

As used in the medical world, sensitivity is the proportion of people who
actually have a condition who get a positive result on a test. In other words,
it is the true positives divided by the sum of the true positives and false
negatives. Sensitivity measures the likelihood that the test detects the
condition when it is present. Specificity is the proportion of people who do
not have the condition who get a negative result on the test. A good test
should be both sensitive and specific. The maximum benefit point is the cutoff
that maximizes the average of these two measures. In Chapter 8, related
concepts go by the names recall and precision, the terminology used in
information retrieval. Recall, which corresponds to sensitivity, measures the
proportion of articles on the correct topic that are returned by a Web search
or other text query. Precision measures the percentage of the returned
articles that are on the correct topic.

• The maximum benefit point corresponds to a decision rule that minimizes

the expected loss assuming the misclassification costs are inversely

proportional to the prevalence of the target classes.

One way of evaluating classification rules is to assign a cost to each type

of misclassification and compare rules based on that cost. Whether they

represent responders, defaulters, fraudsters, or people with a particular

disease, the rare cases are generally the most interesting so missing one of

them is more costly than misclassifying one of the common cases. Under

that assumption, the maximum benefit picks a good classification rule.
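The first and third statements above can be checked numerically. The sketch below uses the standard definitions (sensitivity = true positives / all actual positives; specificity = true negatives / all actual negatives) and a small, invented list of scored prospects. Because penetration equals prevalence × sensitivity + (1 − prevalence) × false positive rate, benefit works out to (1 − prevalence) × (sensitivity − false positive rate), so the cutoff that maximizes benefit also maximizes the KS gap and the average of sensitivity and specificity. The data and all names are hypothetical.

```python
# Toy scored list: (model score, 1 if responder else 0). Invented data.
scored = [(0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.50, 0),
          (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0), (0.05, 0)]
scored.sort(key=lambda pair: pair[0], reverse=True)

n = len(scored)
n_pos = sum(label for _, label in scored)
n_neg = n - n_pos
prevalence = n_pos / n

rows = []
tp = fp = 0
for k, (_, label) in enumerate(scored, start=1):
    # Contact the top k prospects; everyone contacted is a predicted positive.
    tp += label
    fp += 1 - label
    sensitivity = tp / n_pos          # same quantity as concentration
    false_pos_rate = fp / n_neg
    specificity = 1 - false_pos_rate
    penetration = k / n
    rows.append({
        "penetration": penetration,
        "benefit": sensitivity - penetration,
        "ks_gap": sensitivity - false_pos_rate,
        "avg_sens_spec": (sensitivity + specificity) / 2,
    })

# Penetration at which each criterion is maximized.
best = {key: max(rows, key=lambda r: r[key])["penetration"]
        for key in ("benefit", "ks_gap", "avg_sens_spec")}
print(best)
```

On this toy list all three criteria pick the same cutoff, the top 40 percent of the list.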

Table 4.3 says that if a prospect is contacted and responds, the company

makes $44. If a prospect is contacted but fails to respond, the

company loses $1. In this simplified example, there is neither cost nor benefit

in choosing not to contact a prospect. A more sophisticated analysis might take

into account the fact that there is an opportunity cost to not contacting a

prospect who would have responded, that even a nonresponder may become

a better prospect as a result of the contact through increased brand awareness,

and that responders may have a higher lifetime value than indicated by the

single purchase. Apart from those complications, this simple profit and loss

matrix can be used to translate the response to a campaign into a profit figure.

Ignoring fixed campaign overhead, if one prospect responds for every 44

who fail to respond (a response rate of 1 in 45, or about 2.2 percent), the campaign breaks even. If the response rate is better

than that, the campaign is profitable.
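The break-even rate follows directly from the profit and loss matrix. The sketch below encodes Table 4.3 as a dictionary and solves 44r − (1 − r) = 0 for the response rate r. The names `PROFIT` and `profit_per_contact` are illustrative, and fixed overhead is ignored, as in the text.

```python
from fractions import Fraction

# Profit/loss matrix from Table 4.3: (mailed, responded) -> dollars.
PROFIT = {
    (True, True): 44,    # $100 price - $55 cost of goods - $1 contact
    (True, False): -1,   # contact cost with no sale
    (False, True): 0,
    (False, False): 0,
}


def profit_per_contact(response_rate):
    """Expected profit per mailed prospect at a given response rate."""
    return (response_rate * PROFIT[(True, True)]
            + (1 - response_rate) * PROFIT[(True, False)])


# Break-even: 44*r - (1 - r) = 0, so r = 1 / (44 + 1).
break_even = Fraction(-PROFIT[(True, False)],
                      PROFIT[(True, True)] - PROFIT[(True, False)])

print(break_even, float(break_even))
print(profit_per_contact(break_even))
```

Using `Fraction` keeps the break-even rate exact (1/45), so the expected profit at that rate comes out as exactly zero rather than a small floating-point residue.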

WARNING If the cost of a failed contact is set too low, the profit and loss

matrix suggests contacting everyone. This may not be a good idea for other

reasons. It could lead to prospects being bombarded with inappropriate offers.

How the Model Affects Profitability

How does the model whose lift and benefit are characterized by Figure 4.2
affect the profitability of a campaign? The answer depends on the start-up cost
for the campaign, the underlying prevalence of responders in the population,
and the cutoff penetration of people contacted. Recall that SAC had a budget
of $300,000. Assume that the underlying prevalence of responders in the
population is 1 percent. The budget is enough to contact 300,000 prospects, or
30 percent of the prospect pool. At a depth of 30 percent, the model provides lift
of about 2, so SAC can expect twice as many responders as it would have
without the model. In this case, twice as many means 2 percent instead of 1
percent, yielding 6,000 (2% * 300,000) responders, each of whom is worth $44 in net
revenue. Under these assumptions, SAC grosses $600,000 and nets $264,000
from responders. Meanwhile, 98 percent of the prospects contacted, or 294,000, do not
respond. Each of these costs a dollar, so SAC loses $30,000 on the campaign.
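The calculation in this paragraph can be written as a small function. This is a sketch under the chapter's simplifying assumptions ($44 net per responder, $1 lost per nonresponder, no fixed cost unless one is passed in); the function and parameter names are hypothetical.

```python
def campaign_profit(contacted, response_rate,
                    net_per_response=44, cost_per_miss=1, fixed_cost=0):
    """Profit in dollars for a campaign under the Table 4.3 assumptions."""
    responders = round(contacted * response_rate)
    non_responders = contacted - responders
    return (responders * net_per_response
            - non_responders * cost_per_miss
            - fixed_cost)


# 300,000 contacts at a 2% response rate (lift of about 2 on a 1% base rate).
print(campaign_profit(300_000, 0.02))  # -30000
```

The same function can be called with a different list size and response rate to compare alternative campaign depths.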

Table 4.4 shows the data used to generate the concentration chart in Figure

4.2. It suggests that the campaign could be made profitable by spending less

money to contact fewer prospects while getting a better response rate. Mailing

to only 100,000 prospects, or the top 10 percent of the prospect list, achieves a

lift of 3. This turns the underlying response rate of 1 percent into a response

rate of 3 percent. In this scenario, 3,000 people respond yielding revenue of

$132,000. There are now 97,000 people who fail to respond and each of them

costs one dollar. The resulting profit is $35,000. Better still, SAC has $200,000

left in the marketing budget to use on another campaign or to improve the

offer made in this one, perhaps increasing response still more.

Table 4.4 Lift and Cumulative Gains by Decile

PENETRATION   GAINS   CUMULATIVE GAINS   LIFT
0%            0%      0%                 0
10%           30%     30%                3.000
20%           20%     50%                2.500
30%           15%     65%                2.167
40%           13%     78%                1.950
50%           7%      85%                1.700
60%           5%      90%                1.500
70%           4%      94%                1.343
80%           4%      98%                1.225
90%           2%      100%               1.111
100%          0%      100%               1.000

A smaller, better-targeted campaign can be more profitable than a larger and
more expensive one. Lift increases as the list gets smaller, so is smaller always
better? The answer is no because the absolute revenue decreases as the number
of responders decreases. As an extreme example, assume the model can
generate lift of 100 by finding a group with 100 percent response rate when the
underlying response rate is 1 percent. That sounds fantastic, but if there are
only 10 people in the group, they are still only worth $440. Also, a more realistic
example would include some up-front fixed costs. Figure 4.3 shows what
happens with the assumption that there is a $20,000 fixed cost for the campaign
in addition to the cost of $1 per contact, revenue of $44 per response, and
an underlying response rate of 1 percent. The campaign is only profitable for a
small range of file penetrations around 10 percent.
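A profit-by-decile calculation of this kind can be sketched in Python as follows. It combines the cumulative gains from Table 4.4 with the stated assumptions: one million prospects, a 1 percent base response rate, $44 net per response, $1 lost per nonresponder, and a $20,000 fixed cost. The 80 percent entry is taken as 98, the value implied by the table's gains and lift columns; that reading and all names are assumptions of this sketch, which may not match the published figure exactly.

```python
PROSPECTS = 1_000_000
BASE_RATE = 0.01
NET_PER_RESPONSE = 44
COST_PER_MISS = 1
FIXED_COST = 20_000

# Penetration % -> cumulative % of all responders captured (Table 4.4).
# Assumption: the 80% entry is 98, as implied by the lift of 1.225.
CUMULATIVE_GAINS = {10: 30, 20: 50, 30: 65, 40: 78, 50: 85,
                    60: 90, 70: 94, 80: 98, 90: 100, 100: 100}


def profit_at(penetration_pct):
    """Campaign profit in dollars when mailing to the top penetration_pct."""
    contacted = PROSPECTS * penetration_pct // 100
    total_responders = round(PROSPECTS * BASE_RATE)
    responders = total_responders * CUMULATIVE_GAINS[penetration_pct] // 100
    non_responders = contacted - responders
    return (responders * NET_PER_RESPONSE
            - non_responders * COST_PER_MISS
            - FIXED_COST)


for pct in CUMULATIVE_GAINS:
    print(f"{pct:>3}%  {profit_at(pct):>10,}")
```

Under these figures only the 10 and 20 percent penetrations come out positive ($15,000 and $5,000), and profit falls steadily at greater depths.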

Using the model to optimize the profitability of a campaign seems more

attractive than simply using it to pick whom to include on a mailing or call list

of predetermined size, but the approach is not without pitfalls. For one thing,

the results are dependent on the campaign cost, the response rate, and the revenue

per responder, none of which are known prior to running the campaign.

In the example, these were known, but in real life, they can only be estimated.

It would only take a small variation in any one of these to turn the campaign

in the example above completely unprofitable or to make it profitable over a

much larger range of deciles.
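The sensitivity described here is easy to demonstrate. The sketch below recomputes which deciles are profitable when the base response rate is nudged from the assumed 1 percent down to 0.8 percent or up to 1.2 percent. Those two alternative rates are invented for illustration; the gains figures come from Table 4.4 (with the 80 percent entry read as 98 percent), and the function name is hypothetical.

```python
# Penetration % -> cumulative share of responders captured (Table 4.4).
# Assumption: the 80% entry is 0.98, as implied by the lift of 1.225.
GAINS = {10: 0.30, 20: 0.50, 30: 0.65, 40: 0.78, 50: 0.85,
         60: 0.90, 70: 0.94, 80: 0.98, 90: 1.00, 100: 1.00}


def profitable_deciles(base_rate, prospects=1_000_000,
                       net=44, miss_cost=1, fixed=20_000):
    """Return the penetrations (in percent) at which profit is positive."""
    result = []
    for pct, gain in GAINS.items():
        contacted = prospects * pct / 100
        responders = prospects * base_rate * gain
        profit = (responders * net
                  - (contacted - responders) * miss_cost
                  - fixed)
        if profit > 0:
            result.append(pct)
    return result


# Hypothetical base rates bracketing the 1% used in the example.
for rate in (0.008, 0.010, 0.012):
    print(f"{rate:.1%}: {profitable_deciles(rate)}")
```

In this sketch, a base rate of 0.8 percent leaves no profitable decile, while 1.2 percent makes every depth from 10 through 40 percent profitable.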

[Chart: Profit by Decile. Vertical axis in dollars.]