

The challenger offer, in the above scenario, is being sent to a random subset of
customers. Based on the response in this subset, what is the expected response
for this offer for the entire population?
For instance, let's assume that 50,000 people in the original population would
have responded to the challenger offer if they had received it. Then about 5,000
would be expected to respond in the 10 percent of the population that received
140 Chapter 5

the challenger offer. If exactly this number did respond, then the sample
response rate and the population response rate would both be 5.0 percent.
However, it is possible (though highly, highly unlikely) that all 50,000
responders are in the sample that receives the challenger offer; this would
yield a response rate of 50 percent. On the other hand, it is also possible (and
also highly, highly unlikely) that none of the 50,000 are in the sample chosen,
for a response rate of 0 percent. In any sample of one-tenth the population, the
observed response rate might be as low as 0 percent or as high as 50 percent.
These are the extreme values, of course; the actual value is much more likely to
be close to 5 percent.
So far, the example has shown that there are many different samples that can
be pulled from the population. Now, let's flip the situation and say that we
have observed 5,000 responders in the sample. What does this tell us about the
entire population? Once again, it is possible that these are all the responders in
the population, so the low-end estimate is 0.5 percent. On the other hand, it is
possible that everyone else was a responder and we were very, very unlucky
in choosing the sample. The high end would then be 90.5 percent.
That is, there is a 100 percent confidence that the actual response rate on the
population is between 0.5 percent and 90.5 percent. Having a high confidence
is good; however, the range is too broad to be useful. We are willing to settle
for a lower confidence level. Often, 95 or 99 percent confidence is quite
sufficient for marketing purposes.
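
The arithmetic behind these certain bounds can be sketched in a few lines of
Python. This is only an illustration, assuming the population of one million
customers and the 10 percent sample of 100,000 used in this example; the
function name certain_bounds is hypothetical.

```python
def certain_bounds(sample_responders, sample_size, population_size):
    """Return the lowest and highest response rates that are still possible."""
    unseen = population_size - sample_size
    # Lowest case: nobody outside the sample would have responded.
    low = sample_responders / population_size
    # Highest case: everybody outside the sample would have responded.
    high = (sample_responders + unseen) / population_size
    return low, high


low, high = certain_bounds(5_000, 100_000, 1_000_000)
print(f"{low:.1%} to {high:.1%}")
```

The two extremes correspond to the 0.5 percent and 90.5 percent figures in the
text.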
The distribution for the response values follows something called the binomial
distribution. Happily, the binomial distribution is very similar to the normal
distribution whenever we are working with a population larger than a few hundred
people. In Figure 5.8, the jagged line is the binomial distribution and the smooth
line is the corresponding normal distribution; they are practically identical.
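
The closeness of the two distributions can be checked numerically. The sketch
below is an illustration rather than the method used to draw the figure: it
assumes a sample of 100,000 with a 5 percent response rate (the values from
this example), and the helper names are hypothetical. The binomial probability
is computed through logarithms (math.lgamma) to avoid enormous factorials.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k responders out of n, each responding with probability p."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


n, p = 100_000, 0.05
mean = n * p                      # expected number of responders
sd = math.sqrt(n * p * (1 - p))   # standard deviation of the responder count

for k in (4_800, 4_900, 5_000, 5_100, 5_200):
    print(k, f"{binomial_pmf(k, n, p):.6f}", f"{normal_pdf(k, mean, sd):.6f}")
```

For counts near the mean, the two columns agree closely; the agreement weakens
somewhat further out in the tails.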
The challenge is to determine the corresponding normal distribution given
that a sample of size 100,000 had a response rate of 5 percent. As mentioned
earlier, the normal distribution has two parameters, the mean and standard
deviation. The mean is the observed average (5 percent) in the sample. To
calculate the standard deviation, we need a formula, and statisticians have
figured out the relationship between the standard deviation (strictly speaking,
this is the standard error but the two are equivalent for our purposes) and the
mean value and the sample size for a proportion. This is called the standard
error of a proportion (SEP) and has the formula:
$$\mathrm{SEP} = \sqrt{\frac{p\,(1 - p)}{N}}$$

In this formula, p is the average value and N is the size of the sample. So,
the corresponding normal distribution has a standard deviation equal to the
square root of the product of the observed response times one minus the
observed response, divided by the sample size.
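
As a small sketch of the formula in Python (the function name is illustrative,
and the inputs are the 5 percent response rate and sample size of 100,000 from
this example):

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a proportion p observed on a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


sep = standard_error_of_proportion(0.05, 100_000)
print(f"{sep:.4%}")  # about 0.07 percent
```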
We have already observed that about 68 percent of data following a normal
distribution lies within one standard deviation. For the sample size of 100,000, the
The Lure of Statistics: Data Mining Using Familiar Tools 141

formula SQRT(5% * 95% / 100,000) gives about 0.07 percent. So, we are 68 percent
confident that the actual response is between 4.93 percent and 5.07 percent. We
have also observed that a bit over 95 percent is within two standard deviations,
so we are just over 95 percent confident that the response rate lies between
4.86 percent and 5.14 percent. That is, if we observe a 5 percent response rate
for the challenger offer, then we are over 95 percent confident that the
response rate on the whole population would have been between 4.86 percent and
5.14 percent. Note that this conclusion depends very much on the fact that
people who got the challenger offer were selected randomly from the entire
population.
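
The one- and two-standard-deviation ranges can be reproduced with a short
sketch. It assumes the normal approximation described above; the function name
is hypothetical, and statistics.NormalDist from the Python standard library is
used only to report how much of a normal distribution lies within z standard
deviations.

```python
import math
from statistics import NormalDist


def confidence_interval(p, n, z):
    """Range of p plus or minus z standard errors for a sample of size n."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


for z in (1, 2):
    low, high = confidence_interval(0.05, 100_000, z)
    # Share of a normal distribution within z standard deviations of the mean.
    coverage = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"z={z}: {low:.2%} to {high:.2%} (confidence {coverage:.1%})")
```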

Comparing Results Using Confidence Bounds
The previous section discussed confidence intervals as applied to the response
rate of one group who received the challenger offer. In this case, there are
actually two response rates, one for the champion and one for the challenger.
Are these response rates different? Notice that the observed rates could be
different (say 5.0 percent and 5.001 percent), but these could be
indistinguishable from each other. One way to answer the question is to look at
the confidence interval for each response rate and see whether they overlap. If
the intervals do not overlap, then the response rates are different.
This example investigates a range of response rates from 4.5 percent to 5.5
percent for the champion model. In practice, a single response rate would be
known. However, investigating a range makes it possible to understand what
happens as the rate varies from much lower (4.5 percent) to the same (5.0
percent) to much larger (5.5 percent).
The 95 percent confidence bounds lie 1.96 standard deviations from the mean, so
the lower bound is the mean minus 1.96 standard deviations and the upper bound
is the mean plus 1.96 standard deviations. Table 5.2 shows the lower and upper
bounds for a range of response rates for the champion model going from 4.5
percent to 5.5 percent.

[Figure 5.8: plot of probability density (vertical axis) against observed
response rate (horizontal axis, 0% to 10%).]

Figure 5.8 Statistics has proven that actual response rate on a population is very close to
a normal distribution whose mean is the measured response on a sample and whose
standard deviation is the standard error of proportion (SEP).

Table 5.2 The 95 Percent Confidence Interval Bounds for the Champion Group

RESPONSE  SIZE     SEP      95% CONF  95% CONF * SEP        LOWER  UPPER
4.5%      900,000  0.0219%  1.96      0.0219%*1.96=0.0429%  4.46%  4.54%
4.6%      900,000  0.0221%  1.96      0.0221%*1.96=0.0433%  4.56%  4.64%
4.7%      900,000  0.0223%  1.96      0.0223%*1.96=0.0437%  4.66%  4.74%
4.8%      900,000  0.0225%  1.96      0.0225%*1.96=0.0441%  4.76%  4.84%
4.9%      900,000  0.0228%  1.96      0.0228%*1.96=0.0447%  4.86%  4.94%
5.0%      900,000  0.0230%  1.96      0.0230%*1.96=0.0451%  4.95%  5.05%
5.1%      900,000  0.0232%  1.96      0.0232%*1.96=0.0455%  5.05%  5.15%
5.2%      900,000  0.0234%  1.96      0.0234%*1.96=0.0459%  5.15%  5.25%
5.3%      900,000  0.0236%  1.96      0.0236%*1.96=0.0463%  5.25%  5.35%
5.4%      900,000  0.0238%  1.96      0.0238%*1.96=0.0466%  5.35%  5.45%
5.5%      900,000  0.0240%  1.96      0.0240%*1.96=0.0470%  5.45%  5.55%

Response rates vary from 4.5% to 5.5%. The bounds for the 95% confidence level
are calculated using 1.96 standard deviations from the mean.
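
The rows of Table 5.2 can be recomputed with a short loop. This is a sketch
under the table's own assumptions (a champion group of 900,000 and a multiplier
of 1.96); the function and variable names are illustrative, and the last
printed digit may differ slightly from the table because the table rounds
intermediate values.

```python
import math

Z_95 = 1.96            # multiplier for the 95 percent confidence level
CHAMPION_SIZE = 900_000


def champion_bounds(p, n=CHAMPION_SIZE, z=Z_95):
    """Return (SEP, margin, lower, upper) for response rate p on a group of size n."""
    sep = math.sqrt(p * (1 - p) / n)
    margin = z * sep
    return sep, margin, p - margin, p + margin


# Step through 4.5%, 4.6%, ..., 5.5% using integers to avoid float drift.
for tenths in range(45, 56):
    p = tenths / 1000
    sep, margin, lower, upper = champion_bounds(p)
    print(f"{p:.1%}  {sep:.4%}  {margin:.4%}  {lower:.2%}  {upper:.2%}")
```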
Based on these possible response rates, it is possible to tell if the confidence
bounds overlap. The 95 percent confidence bounds for the challenger model
were from about 4.86 percent to 5.14 percent. These bounds overlap the
confidence bounds for the champion model when its response rates are 4.9 percent,
5.0 percent, or 5.1 percent. For instance, the confidence interval for a response
rate of 4.9 percent goes from 4.86 percent to 4.94 percent; this does overlap the
range of 4.86 percent to 5.14 percent. Using the overlapping bounds method, we
would consider these statistically the same.
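
The overlapping bounds method can be sketched as follows. The group sizes
(100,000 for the challenger, 900,000 for the champion) and the 1.96 multiplier
come from this example; the function names are hypothetical.

```python
import math

Z_95 = 1.96


def confidence_interval(p, n, z=Z_95):
    """The 95 percent confidence bounds for response rate p on a group of size n."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def intervals_overlap(a, b):
    """True when the two (lower, upper) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


challenger = confidence_interval(0.05, 100_000)

overlapping = []
for tenths in range(45, 56):          # champion rates 4.5% .. 5.5%
    champion = confidence_interval(tenths / 1000, 900_000)
    if intervals_overlap(challenger, champion):
        overlapping.append(tenths / 10)

print(overlapping)  # champion rates (in percent) treated as "the same"
```

Only the champion rates of 4.9, 5.0, and 5.1 percent produce intervals that
overlap the challenger's interval, matching the discussion above.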

Comparing Results Using Difference of Proportions
Overlapping bounds is easy, but its results are a bit pessimistic. That is, even
though the confidence intervals overlap, we might still be quite confident that
the difference is not due to chance.
Another approach is to look at the difference between response rates, rather
than the rates themselves. Just as there is a formula for the standard error of a
proportion, there is a formula for the standard error of a difference of
proportions (SEDP):

$$\mathrm{SEDP} = \sqrt{\frac{p_1\,(1 - p_1)}{N_1} + \frac{p_2\,(1 - p_2)}{N_2}}$$

This formula is a lot like the formula for the standard error of a proportion,
except the part in the square root is repeated for each group. Table 5.3 shows
this applied to the champion-challenger problem with response rates varying
between 4.5 percent and 5.5 percent for the champion group.
By the difference of proportions, three response rates on the champion have
a confidence under 95 percent (that is, the p-value exceeds 5 percent). If the
challenger response rate is 5.0 percent and the champion is 5.1 percent, then
the difference in response rates might be due to chance. However, if the
champion has a response rate of 5.2 percent, then the likelihood of the
difference being due to chance falls to under 1 percent.
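
The difference of proportions calculation can be sketched in Python. This
illustration assumes the normal approximation used throughout this section and
a two-sided p-value; the function names are hypothetical, and
statistics.NormalDist from the standard library supplies the normal cumulative
distribution.

```python
import math
from statistics import NormalDist


def sedp(p1, n1, p2, n2):
    """Standard error of the difference of two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def difference_test(p1, n1, p2, n2):
    """Return (z-value, two-sided p-value) for the difference p1 - p2."""
    z = (p1 - p2) / sedp(p1, n1, p2, n2)
    # Chance of a difference at least this large arising from sampling alone.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Challenger: 5.0% on 100,000. Champion: 4.5% .. 5.5% on 900,000.
for tenths in range(45, 56):
    z, p_value = difference_test(0.05, 100_000, tenths / 1000, 900_000)
    print(f"champion {tenths / 10:.1f}%  z = {z:5.1f}  p-value = {p_value:.1%}")
```

A p-value above 5 percent means the confidence that the two rates differ is
under 95 percent.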

WARNING Confidence intervals only measure the likelihood that sampling
affected the result. There may be many other factors that we need to take into
consideration to determine if two offers are significantly different. Each group
must be selected entirely randomly from the whole population for the
difference of proportions method to work.

Table 5.3 The 95 Percent Confidence Interval Bounds for the Difference between the Champion and Challenger Groups

CHALLENGER         CHAMPION           DIFFERENCE
RESPONSE  SIZE     RESPONSE  SIZE     VALUE   SEDP   Z-VALUE  P-VALUE
5.0%      100,000  4.5%      900,000   0.5%   0.07%    6.9      0.0%
5.0%      100,000  4.6%      900,000   0.4%   0.07%    5.5      0.0%
5.0%      100,000  4.7%      900,000   0.3%   0.07%    4.1      0.0%
5.0%      100,000  4.8%      900,000   0.2%   0.07%    2.8      0.6%
5.0%      100,000  4.9%      900,000   0.1%   0.07%    1.4     16.8%
5.0%      100,000  5.0%      900,000   0.0%   0.07%    0.0    100.0%
5.0%      100,000  5.1%      900,000  -0.1%   0.07%   -1.4     16.9%
5.0%      100,000  5.2%      900,000  -0.2%   0.07%   -2.7      0.6%
5.0%      100,000  5.3%      900,000  -0.3%   0.07%   -4.1      0.0%
5.0%      100,000  5.4%      900,000  -0.4%   0.07%   -5.5      0.0%
5.0%      100,000  5.5%      900,000  -0.5%   0.07%   -6.9      0.0%
Size of Sample
The formulas for the standard error of a proportion and for the standard error
of a difference of proportions both include the sample size. There is an inverse
relationship between the sample size and the size of the confidence interval:
the larger the size of the sample, the narrower the confidence interval. So, if
you want to have more confidence in results, it pays to use larger samples.
Table 5.4 shows the confidence interval for different sizes of the challenger
group, assuming the challenger response rate is observed to be 5 percent. For
very small sizes, the confidence interval is very wide, often too wide to be
useful. Earlier, we had said that the normal distribution is an approximation for
the estimate of the actual response rate; with small sample sizes, the estimation
is not a very good one. Statistics has several methods for handling such small
sample sizes. However, these are generally not of much interest to data miners
because our samples are much larger.

Table 5.4 The 95 Percent Confidence Interval for Different Sizes of the Challenger Group

RESPONSE  SIZE       SEP      95% CONF  LOWER  UPPER  WIDTH
5.0%      1,000      0.6892%  1.96      3.65%  6.35%  2.70%
5.0%      5,000      0.3082%  1.96      4.40%  5.60%  1.21%
5.0%      10,000     0.2179%  1.96      4.57%  5.43%  0.85%
5.0%      20,000     0.1541%  1.96      4.70%  5.30%  0.60%
5.0%      40,000     0.1090%  1.96      4.79%  5.21%  0.43%
5.0%      60,000     0.0890%  1.96      4.83%  5.17%  0.35%
5.0%      80,000     0.0771%  1.96      4.85%  5.15%  0.30%
5.0%      100,000    0.0689%  1.96      4.86%  5.14%  0.27%
5.0%      120,000    0.0629%  1.96      4.88%  5.12%  0.25%
5.0%      140,000    0.0582%  1.96      4.89%  5.11%  0.23%
5.0%      160,000    0.0545%  1.96      4.89%  5.11%  0.21%
5.0%      180,000    0.0514%  1.96      4.90%  5.10%  0.20%
5.0%      200,000    0.0487%  1.96      4.90%  5.10%  0.19%
5.0%      500,000    0.0308%  1.96      4.94%  5.06%  0.12%
5.0%      1,000,000  0.0218%  1.96      4.96%  5.04%  0.09%
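
The effect of sample size on the width of the confidence interval can be
sketched directly from the SEP formula. This is an illustration assuming an
observed response rate of 5 percent and the 1.96 multiplier used above; the
function name is hypothetical.

```python
import math

Z_95 = 1.96


def interval_width(p, n, z=Z_95):
    """Width of the confidence interval: twice the margin z * SEP."""
    return 2 * z * math.sqrt(p * (1 - p) / n)


for n in (1_000, 10_000, 40_000, 100_000, 1_000_000):
    print(f"{n:>9,}  width = {interval_width(0.05, n):.2%}")
```

Because the sample size sits under a square root, quadrupling the sample (for
instance, from 10,000 to 40,000) only halves the width of the interval.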
What the Confidence Interval Really Means
The confidence interval is a measure of only one thing, the statistical dispersion
of the result. Assuming that everything else remains the same, it measures the
amount of inaccuracy introduced by the process of sampling. It also assumes
that the sampling process itself is random; that is, that any of the one million
customers could have been offered the challenger offer with an equal
likelihood. Random means random. The following are examples of what not to do:
Use customers in California for the challenger and everyone else for the

