
customers. Based on the response in this subset, what is the expected response

for this offer for the entire population?

For instance, let's assume that 50,000 people in the original population would

have responded to the challenger offer if they had received it. Then about 5,000

would be expected to respond in the 10 percent of the population that received

140 Chapter 5

the challenger offer. If exactly this number did respond, then the sample
response rate and the population response rate would both be 5.0 percent.
However, it is possible (though highly, highly unlikely) that all 50,000
responders are in the sample that receives the challenger offer; this would
yield a response rate of 50 percent. On the other hand, it is also possible
(and also highly, highly unlikely) that none of the 50,000 are in the sample
chosen, for a response rate of 0 percent. In any sample of one-tenth the
population, the observed response rate might be as low as 0 percent or as high
as 50 percent. These are the extreme values, of course; the actual value is
much more likely to be close to 5 percent.
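The claim that a random sample usually lands close to 5 percent can be checked with a small simulation. The sketch below is only an illustration under the example's figures (one million customers, 50,000 would-be responders, a 10 percent sample); the labeling of customers by number and the fixed seed are assumptions made for the sketch, not part of the original example.

```python
import random

POPULATION = 1_000_000   # size of the whole customer base (from the example)
RESPONDERS = 50_000      # people who would respond to the challenger offer
SAMPLE_SIZE = 100_000    # the 10 percent who actually receive the offer

def simulated_response_rate(rng):
    """Draw one random sample and return its observed response rate."""
    # Customers are labeled 0..POPULATION-1. As an arbitrary convention for
    # this sketch, the first RESPONDERS labels are the would-be responders.
    sample = rng.sample(range(POPULATION), SAMPLE_SIZE)
    hits = sum(1 for customer in sample if customer < RESPONDERS)
    return hits / SAMPLE_SIZE

rng = random.Random(2024)  # fixed seed only so that reruns are repeatable
rates = [simulated_response_rate(rng) for _ in range(5)]
for rate in rates:
    print(f"{rate:.2%}")
```

Each printed rate should sit within a small fraction of a percentage point of 5 percent; the extremes of 0 percent and 50 percent are possible in principle but practically never drawn.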

So far, the example has shown that there are many different samples that can
be pulled from the population. Now, let's flip the situation and say that we
have observed 5,000 responders in the sample. What does this tell us about the
entire population? Once again, it is possible that these are all the
responders in the population, so the low-end estimate is 0.5 percent (5,000
out of one million). On the other hand, it is possible that everyone else was
a responder and we were very, very unlucky in choosing the sample. The high
end would then be 90.5 percent (the 5,000 observed responders plus the 900,000
people outside the sample).
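The two extremes follow from simple counting. As a minimal sketch, assuming the example's population of one million and sample of 100,000 (the function name is made up for illustration):

```python
POPULATION = 1_000_000        # everyone who could have received an offer
SAMPLE_SIZE = 100_000         # the 10 percent who received the challenger offer
OBSERVED_RESPONDERS = 5_000   # responders actually seen in the sample

def population_rate_bounds(population, sample_size, observed):
    """Return the lowest and highest population response rates that are
    logically possible given what the sample showed."""
    unsampled = population - sample_size
    # Low end: nobody outside the sample would have responded.
    low = observed / population
    # High end: everybody outside the sample would have responded.
    high = (observed + unsampled) / population
    return low, high

low, high = population_rate_bounds(POPULATION, SAMPLE_SIZE, OBSERVED_RESPONDERS)
print(f"{low:.1%} to {high:.1%}")  # 0.5% to 90.5%
```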

That is, we can be 100 percent confident that the actual response rate on the
population is between 0.5 percent and 90.5 percent. Having a high confidence
is good; however, the range is too broad to be useful. We are willing to settle
for a lower confidence level in exchange for a narrower range. Often, 95 or 99
percent confidence is quite sufficient for marketing purposes.

The distribution for the response values follows something called the binomial

distribution. Happily, the binomial distribution is very similar to the normal
distribution whenever we are working with a population larger than a few hundred

people. In Figure 5.8, the jagged line is the binomial distribution and the smooth

line is the corresponding normal distribution; they are practically identical.
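To get a feel for how close the two curves are, one can compute both and look at the largest gap between them. The sketch below is an illustration only: the group size of 1,000 and the 5 percent rate are assumed values chosen for the comparison, and the helper names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k responders among n people (binomial)."""
    # Work with logarithms so that large n does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

n, p = 1_000, 0.05                 # assumed group size and response rate
mean = n * p                       # expected number of responders
sd = math.sqrt(n * p * (1 - p))    # standard deviation of that count

largest_gap = max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, sd))
                  for k in range(n + 1))
peak = normal_pdf(mean, mean, sd)
print(f"peak density {peak:.4f}, largest gap {largest_gap:.4f}")
```

The largest gap is a small fraction of the peak height, and it shrinks further as the group grows.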

The challenge is to determine the corresponding normal distribution given

that a sample of size 100,000 had a response rate of 5 percent. As mentioned

earlier, the normal distribution has two parameters, the mean and standard

deviation. The mean is the observed average (5 percent) in the sample. To

calculate the standard deviation, we need a formula, and statisticians have

figured out the relationship between the standard deviation (strictly speaking,

this is the standard error but the two are equivalent for our purposes) and the

mean value and the sample size for a proportion. This is called the standard

error of a proportion (SEP) and has the formula:

$$\mathrm{SEP} = \sqrt{\frac{p \, (1 - p)}{N}}$$

In this formula, p is the average value (the observed response rate) and N is
the size of the sample. So, the corresponding normal distribution has a
standard deviation equal to the square root of the product of the observed
response times one minus the observed response, divided by the number of
people in the sample.
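The formula translates directly into a few lines of code. This is a minimal sketch; the function name is an assumption made for illustration.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a proportion p observed in a sample of size n."""
    return math.sqrt(p * (1 - p) / n)

# The challenger group: 5 percent response in a sample of 100,000.
sep = standard_error_of_proportion(0.05, 100_000)
print(f"{sep:.4%}")  # 0.0689%
```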

We have already observed that about 68 percent of data following a normal
distribution lies within one standard deviation of the mean. For the sample size of 100,000, the

The Lure of Statistics: Data Mining Using Familiar Tools 141

formula SQRT(5% * 95% / 100,000) gives about 0.07 percent. So, we are 68 percent
confident that the actual response is between 4.93 percent and 5.07 percent. We
have also observed that a bit over 95 percent is within two standard
deviations, so we are just over 95 percent confident that the actual response
is between 4.86 percent and 5.14 percent. In other words, if we observe a 5
percent response rate for the challenger offer, then we are over 95 percent
confident that the response rate on the whole population would have been
between 4.86 percent and 5.14 percent. Note that this conclusion depends very
much on the fact that people who got the challenger offer were selected
randomly from the entire population.
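The two ranges quoted above can be reproduced by stepping one and two standard errors away from the observed rate. The following is a sketch under the example's figures; the function name is hypothetical.

```python
import math

def confidence_interval(p, n, z):
    """Interval reaching z standard errors on either side of the rate p."""
    sep = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return p - z * sep, p + z * sep

for z in (1, 2):
    low, high = confidence_interval(0.05, 100_000, z)
    print(f"within {z} standard deviation(s): {low:.2%} to {high:.2%}")
# within 1 standard deviation(s): 4.93% to 5.07%
# within 2 standard deviation(s): 4.86% to 5.14%
```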

Comparing Results Using Confidence Bounds

The previous section discussed confidence intervals as applied to the response
rate of one group who received the challenger offer. In this case, there are
actually two response rates, one for the champion and one for the challenger.
Are these response rates different? Notice that the observed rates could be
different (say 5.0 percent and 5.001 percent) and yet be statistically
indistinguishable from each other. One way to answer the question is to look
at the confidence interval for each response rate and see whether they
overlap. If the intervals do not overlap, then the response rates are
considered different at that confidence level.

This example investigates a range of response rates from 4.5 percent to 5.5
percent for the champion model. In practice, a single response rate would be
known. However, investigating a range makes it possible to understand what
happens as the rate varies from much lower (4.5 percent) to the same (5.0
percent) to much higher (5.5 percent) than the challenger's rate.

The 95 percent confidence bounds lie 1.96 standard deviations from the mean,
so the lower bound is the mean minus 1.96 standard deviations and the upper
bound is the mean plus 1.96 standard deviations. Table 5.2 shows the lower and
upper bounds for a range of response rates for the champion model going from
4.5 percent to 5.5 percent.

[Figure 5.8: plot of Probability Density (vertical axis, 0% to 6%) against
Observed Response Rate (horizontal axis, 0% to 10%).]

Figure 5.8 Statistics has proven that actual response rate on a population is very close to
a normal distribution whose mean is the measured response on a sample and whose
standard deviation is the standard error of proportion (SEP).

Table 5.2 The 95 Percent Confidence Interval Bounds for the Champion Group

RESPONSE  SIZE     SEP      95% CONF  95% CONF * SEP            LOWER  UPPER
4.5%      900,000  0.0219%  1.96      0.0219% * 1.96 = 0.0429%  4.46%  4.54%
4.6%      900,000  0.0221%  1.96      0.0221% * 1.96 = 0.0433%  4.56%  4.64%
4.7%      900,000  0.0223%  1.96      0.0223% * 1.96 = 0.0437%  4.66%  4.74%
4.8%      900,000  0.0225%  1.96      0.0225% * 1.96 = 0.0441%  4.76%  4.84%
4.9%      900,000  0.0228%  1.96      0.0228% * 1.96 = 0.0447%  4.86%  4.94%
5.0%      900,000  0.0230%  1.96      0.0230% * 1.96 = 0.0451%  4.95%  5.05%
5.1%      900,000  0.0232%  1.96      0.0232% * 1.96 = 0.0455%  5.05%  5.15%
5.2%      900,000  0.0234%  1.96      0.0234% * 1.96 = 0.0459%  5.15%  5.25%
5.3%      900,000  0.0236%  1.96      0.0236% * 1.96 = 0.0463%  5.25%  5.35%
5.4%      900,000  0.0238%  1.96      0.0238% * 1.96 = 0.0466%  5.35%  5.45%
5.5%      900,000  0.0240%  1.96      0.0240% * 1.96 = 0.0470%  5.45%  5.55%

Response rates vary from 4.5% to 5.5%. The bounds for the 95% confidence level are calculated using 1.96 standard deviations from the mean.


Based on these possible response rates, it is possible to tell if the
confidence bounds overlap. The 95 percent confidence bounds for the challenger
model were from about 4.86 percent to 5.14 percent. These bounds overlap the
confidence bounds for the champion model when its response rates are 4.9
percent, 5.0 percent, or 5.1 percent. For instance, the confidence interval
for a response rate of 4.9 percent goes from 4.86 percent to 4.94 percent;
this does overlap the range from 4.86 percent to 5.14 percent. Using the
overlapping bounds method, we would consider these response rates
statistically the same as the challenger's.
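The overlapping bounds check can be sketched in code as follows. The group sizes (100,000 for the challenger, 900,000 for the champion) and the 1.96 multiplier come from the example; the function names are assumptions made for illustration.

```python
import math

Z_95 = 1.96  # standard deviations for a 95 percent confidence level

def bounds(p, n, z=Z_95):
    """Lower and upper confidence bounds for response rate p in a group of n."""
    sep = math.sqrt(p * (1 - p) / n)
    return p - z * sep, p + z * sep

def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

challenger = bounds(0.05, 100_000)
champion_rates = [r / 1000 for r in range(45, 56)]  # 4.5% through 5.5%
overlapping = [rate for rate in champion_rates
               if intervals_overlap(bounds(rate, 900_000), challenger)]
print([f"{rate:.1%}" for rate in overlapping])  # ['4.9%', '5.0%', '5.1%']
```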

Comparing Results Using Difference of Proportions

The overlapping bounds method is easy, but its results are a bit pessimistic.
That is, even though the confidence intervals overlap, we might still be quite
confident, at some given level of confidence, that the difference is not due
to chance. Another approach is to look at the difference between response
rates, rather than the rates themselves. Just as there is a formula for the
standard error of a proportion, there is a formula for the standard error of a
difference of proportions (SEDP):

$$\mathrm{SEDP} = \sqrt{\frac{p_1 (1 - p_1)}{N_1} + \frac{p_2 (1 - p_2)}{N_2}}$$

This formula is a lot like the formula for the standard error of a proportion,

except the part in the square root is repeated for each group. Table 5.3 shows

this applied to the champion challenger problem with response rates varying

between 4.5 percent and 5.5 percent for the champion group.

By the difference of proportions, three response rates on the champion (4.9,
5.0, and 5.1 percent) have a confidence under 95 percent (that is, the p-value
exceeds 5 percent). If the challenger response rate is 5.0 percent and the
champion is 5.1 percent, then the difference in response rates might be due to
chance. However, if the champion has a response rate of 5.2 percent, then the
likelihood of the difference being due to chance falls to under 1 percent.
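The difference of proportions calculation can be sketched as below. The z-value is the difference divided by the SEDP; the p-value is computed here as a two-sided tail probability of the standard normal distribution, which is an assumption on our part, although it appears consistent with the figures in Table 5.3. The function names are hypothetical.

```python
import math

def sedp(p1, n1, p2, n2):
    """Standard error of the difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def two_sided_p_value(z):
    """Chance of a standard normal value at least as far from 0 as z."""
    return math.erfc(abs(z) / math.sqrt(2))

def compare(p1, n1, p2, n2):
    """Return the difference, its z-value, and the two-sided p-value."""
    diff = p1 - p2
    z = diff / sedp(p1, n1, p2, n2)
    return diff, z, two_sided_p_value(z)

# Challenger at 5.0% (100,000 people) against champion at 4.8% (900,000).
diff, z, p = compare(0.05, 100_000, 0.048, 900_000)
print(f"z = {z:.1f}, p-value = {p:.1%}")  # z = 2.8, p-value = 0.6%
```

A p-value under 5 percent corresponds to more than 95 percent confidence that the difference is not due to sampling alone.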

WARNING Confidence intervals only measure the likelihood that sampling

affected the result. There may be many other factors that we need to take into

consideration to determine if two offers are significantly different. Each group

must be selected entirely randomly from the whole population for the

difference of proportions method to work.

Table 5.3 The 95 Percent Confidence Interval Bounds for the Difference between the Champion and Challenger Groups

CHALLENGER         CHAMPION           DIFFERENCE
RESPONSE  SIZE     RESPONSE  SIZE     VALUE  SEDP   Z-VALUE  P-VALUE
5.0%      100,000  4.5%      900,000   0.5%  0.07%      6.9     0.0%
5.0%      100,000  4.6%      900,000   0.4%  0.07%      5.5     0.0%
5.0%      100,000  4.7%      900,000   0.3%  0.07%      4.1     0.0%
5.0%      100,000  4.8%      900,000   0.2%  0.07%      2.8     0.6%
5.0%      100,000  4.9%      900,000   0.1%  0.07%      1.4    16.8%
5.0%      100,000  5.0%      900,000   0.0%  0.07%      0.0   100.0%
5.0%      100,000  5.1%      900,000  -0.1%  0.07%     -1.4    16.9%
5.0%      100,000  5.2%      900,000  -0.2%  0.07%     -2.7     0.6%
5.0%      100,000  5.3%      900,000  -0.3%  0.07%     -4.1     0.0%
5.0%      100,000  5.4%      900,000  -0.4%  0.07%     -5.5     0.0%
5.0%      100,000  5.5%      900,000  -0.5%  0.07%     -6.9     0.0%


Size of Sample

The formulas for the standard error of a proportion and for the standard error

of a difference of proportions both include the sample size. There is an inverse

relationship between the sample size and the size of the confidence interval:

the larger the size of the sample, the narrower the confidence interval. So, if

you want to have more confidence in results, it pays to use larger samples.

Table 5.4 shows the confidence interval for different sizes of the challenger

group, assuming the challenger response rate is observed to be 5 percent. For

very small sizes, the confidence interval is very wide, often too wide to be
useful. Earlier, we had said that the normal distribution is an approximation for

the estimate of the actual response rate; with small sample sizes, the estimation

is not a very good one. Statistics has several methods for handling such small

sample sizes. However, these are generally not of much interest to data miners

because our samples are much larger.

Table 5.4 The 95 Percent Confidence Interval for Different Sizes of the Challenger Group

RESPONSE       SIZE  SEP      95% CONF  LOWER  UPPER  WIDTH
5.0%          1,000  0.6892%  1.96      3.65%  6.35%  2.70%
5.0%          5,000  0.3082%  1.96      4.40%  5.60%  1.21%
5.0%         10,000  0.2179%  1.96      4.57%  5.43%  0.85%
5.0%         20,000  0.1541%  1.96      4.70%  5.30%  0.60%
5.0%         40,000  0.1090%  1.96      4.79%  5.21%  0.43%
5.0%         60,000  0.0890%  1.96      4.83%  5.17%  0.35%
5.0%         80,000  0.0771%  1.96      4.85%  5.15%  0.30%
5.0%        100,000  0.0689%  1.96      4.86%  5.14%  0.27%
5.0%        120,000  0.0629%  1.96      4.88%  5.12%  0.25%
5.0%        140,000  0.0582%  1.96      4.89%  5.11%  0.23%
5.0%        160,000  0.0545%  1.96      4.89%  5.11%  0.21%
5.0%        180,000  0.0514%  1.96      4.90%  5.10%  0.20%
5.0%        200,000  0.0487%  1.96      4.90%  5.10%  0.19%
5.0%        500,000  0.0308%  1.96      4.94%  5.06%  0.12%
5.0%      1,000,000  0.0218%  1.96      4.96%  5.04%  0.09%
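The WIDTH column is simply twice 1.96 times the SEP, so the effect of sample size can be sketched in a few lines. The function name is an assumption made for illustration; the 5 percent rate and the sizes come from the table.

```python
import math

Z_95 = 1.96  # standard deviations for a 95 percent confidence level

def interval_width(p, n, z=Z_95):
    """Full width of the confidence interval for rate p and sample size n."""
    return 2 * z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,}  {interval_width(0.05, n):.2%}")
#     1,000  2.70%
#    10,000  0.85%
#   100,000  0.27%
# 1,000,000  0.09%
```

Because the sample size sits under a square root, the width shrinks slowly: quadrupling the sample only halves the interval.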


What the Confidence Interval Really Means

The confidence interval is a measure of only one thing, the statistical dispersion

of the result. Assuming that everything else remains the same, it measures the

amount of inaccuracy introduced by the process of sampling. It also assumes
that the sampling process itself is random; that is, that any of the one
million customers could have been offered the challenger offer with an equal
likelihood. Random means random. The following are examples of what not to do:

Use customers in California for the challenger and everyone else for the
