

the long form, which asks much more detailed questions about income,
occupation, commuting habits, spending patterns, and more. The responses to
these questionnaires provide the basis for demographic profiles.
The Census Bureau strives to keep this information up to date between
decennial censuses. The Census Bureau does not release information about
individuals. Instead, it aggregates the information by small geographic areas. The
most commonly used is the census tract, consisting of about 4,000 individuals.
Although census tracts do vary in size, they are much more consistent in
population than other geographic units, such as counties and postal codes.
The census does have smaller geographic units, blocks and block groups;
however, in order to protect the privacy of residents, some data is not made
available below the level of census tracts. From these units, it is possible to
aggregate information by county, state, metropolitan statistical area (MSA),
legislative districts, and so on. The following figure shows some census tracts in
the center of Manhattan:

Census Tract 189
Edu College+ 19.2%
Occ Prof+Exec 17.8%
HHI $75K+ 5.0%
HHI $100K+ 2.4%

Census Tract 122
Edu College+ 66.7%
Occ Prof+Exec 45.0%
HHI $75K+ 58.0%
HHI $100K+ 50.2%

Census Tract 129
Edu College+ 44.8%
Occ Prof+Exec 36.5%
HHI $75K+ 14.8%
HHI $100K+ 7.2%
Data Mining Applications 95

One philosophy of marketing is based on the old proverb “birds of a feather
flock together.” That is, people with similar interests and tastes live in similar
areas (whether voluntarily or because of historical patterns of discrimination).
According to this philosophy, it is a good idea to market to people where you
already have customers and in similar areas. Census information can be
valuable, both for understanding where concentrations of customers are
located and for determining the profile of similar areas.

                 Tract 189    Goal    Tract Fitness
Edu College+         19.2%   61.3%             0.31
Occ Prof+Exec        17.8%   45.5%             0.39
HHI $75K+             5.0%   22.6%             0.22
HHI $100K+            2.4%    7.4%             0.32
Overall Advertising Fitness                    0.31

                 Tract 122    Goal    Tract Fitness
Edu College+         66.7%   61.3%             1.00
Occ Prof+Exec        45.0%   45.5%             0.99
HHI $75K+            58.0%   22.6%             1.00
HHI $100K+           50.2%    7.4%             1.00
Overall Advertising Fitness                    1.00

                 Tract 129    Goal    Tract Fitness
Edu College+         44.8%   61.3%             0.73
Occ Prof+Exec        36.5%   45.5%             0.80
HHI $75K+            14.8%   22.6%             0.65
HHI $100K+            7.2%    7.4%             0.97
Overall Advertising Fitness                    0.79

Figure 4.1 Example of calculating readership fitness for three census tracts in Manhattan.
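
The numbers in Figure 4.1 appear consistent with a simple rule: divide each tract percentage by the goal percentage, cap the result at 1, and average the four values. The following Python sketch assumes that rule; the rule itself and the function names are inferred for illustration and are not stated in the text.

```python
# Goal profile and tract profiles (in percent), copied from Figure 4.1.
GOAL = {"Edu College+": 61.3, "Occ Prof+Exec": 45.5,
        "HHI $75K+": 22.6, "HHI $100K+": 7.4}

TRACTS = {
    "189": {"Edu College+": 19.2, "Occ Prof+Exec": 17.8,
            "HHI $75K+": 5.0, "HHI $100K+": 2.4},
    "122": {"Edu College+": 66.7, "Occ Prof+Exec": 45.0,
            "HHI $75K+": 58.0, "HHI $100K+": 50.2},
    "129": {"Edu College+": 44.8, "Occ Prof+Exec": 36.5,
            "HHI $75K+": 14.8, "HHI $100K+": 7.2},
}


def tract_fitness(tract_pct, goal_pct):
    """Assumed rule: ratio of tract value to goal value, capped at 1.0."""
    return min(1.0, tract_pct / goal_pct)


def overall_fitness(tract, goal=GOAL):
    """Assumed rule: unweighted mean of the per-attribute fitness values."""
    scores = [tract_fitness(tract[key], goal[key]) for key in goal]
    return sum(scores) / len(scores)


for name, profile in TRACTS.items():
    print(f"Tract {name}: overall fitness {overall_fitness(profile):.2f}")
```

Under these assumptions the sketch reproduces the overall fitness values shown in the figure (0.31, 1.00, and 0.79).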

Data Mining to Improve Direct Marketing Campaigns
Advertising can be used to reach prospects about whom nothing is known as
individuals. Direct marketing requires at least a tiny bit of additional
information, such as a name and address, a phone number, or an email address.
Where there is more information, there are also more opportunities for data
mining. At the most basic level, data mining can be used to improve targeting
by selecting which people to contact.
96 Chapter 4

Actually, the first level of targeting does not require data mining, only data.
In the United States, and to a lesser extent in many other countries, there is
quite a bit of data available about a large proportion of the population. In
many countries, there are companies that compile and sell household-level
data on all sorts of things including income, number of children, education
level, and even hobbies. Some of this data is collected from public records.
Home purchases, marriages, births, and deaths are matters of public record
that can be gathered from county courthouses and registries of deeds. Other
data is gathered from product registration forms. Some is imputed using models.
The rules governing the use of this data for marketing purposes vary from
country to country. In some, data can be sold by address, but not by name. In
others, data may be used only for certain approved purposes. In some countries,
data may be used with few restrictions, but only a limited number of
households are covered. In the United States, some data, such as medical
records, is completely off limits. Some data, such as credit history, can only be
used for certain approved purposes. Much of the rest is unrestricted.

WARNING The United States is unusual in both the extent of commercially
available household data and the relatively few restrictions on its use. Although
household data is available in many countries, the rules governing its use differ.
There are especially strict rules governing transborder transfers of personal
data. Before planning to use household data for marketing, look into its
availability in your market and the legal restrictions on making use of it.

Household-level data can be used directly for a first rough cut at segmentation
based on such things as income, car ownership, or presence of children.
The problem is that even after the obvious filters have been applied, the remaining
pool can be very large relative to the number of prospects likely to respond.
Thus, a principal application of data mining to prospects is targeting: finding
the prospects most likely to actually respond to an offer.

Response Modeling
Direct marketing campaigns typically have response rates measured in the
single digits. Response models are used to improve response rates by identifying
prospects who are more likely to respond to a direct solicitation. The most
useful response models provide an actual estimate of the likelihood of
response, but this is not a strict requirement. Any model that allows prospects
to be ranked by likelihood of response is sufficient. Given a ranked list, direct
marketers can increase the percentage of responders reached by campaigns by
mailing or calling people near the top of the list.
The following sections describe several ways that model scores can be
used to improve direct marketing. This discussion is independent of the data
mining techniques used to generate the scores. It is worth noting, however,
that many of the data mining techniques in this book can and have been
applied to response modeling.
According to the Direct Marketing Association, an industry group, a typical
mailing of 100,000 pieces costs about $100,000, although the price can
vary considerably depending on the complexity of the mailing. Some of that
cost, such as developing the creative content, preparing the artwork,
and initial setup for printing, is independent of the size of the mailing. The
rest of the cost varies directly with the number of pieces mailed. Mailing lists
of known mail order responders or active magazine subscribers can be purchased
at a price per thousand names. Mail shop production costs and
postage are charged on a similar basis. The larger the mailing, the less important
the fixed costs become. For ease of calculation, the examples in this book
assume that it costs one dollar to reach one person with a direct mail campaign.
This is not an unreasonable estimate, although simple mailings cost less
and very fancy mailings cost more.
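
To see why fixed costs matter less as a mailing grows, it can help to split the total into a fixed part and a per-piece part. The sketch below does that in Python. The split used here ($20,000 fixed plus $0.80 per piece) is a made-up assumption chosen only so that 100,000 pieces cost $100,000, as in the text; the text does not give the actual breakdown.

```python
# Hypothetical cost split, NOT from the text: chosen so that a mailing
# of 100,000 pieces totals $100,000.
ASSUMED_FIXED_COST = 20_000.0      # creative content, artwork, print setup
ASSUMED_COST_PER_PIECE = 0.80      # list rental, production, postage


def mailing_cost(pieces, fixed_cost=ASSUMED_FIXED_COST,
                 cost_per_piece=ASSUMED_COST_PER_PIECE):
    """Total cost of a mailing: fixed cost plus a cost per piece mailed."""
    return fixed_cost + cost_per_piece * pieces


def cost_per_contact(pieces, **kwargs):
    """Average cost of reaching one person in a mailing of this size."""
    return mailing_cost(pieces, **kwargs) / pieces


for pieces in (10_000, 100_000, 1_000_000):
    print(f"{pieces:>9,} pieces: ${cost_per_contact(pieces):.2f} per contact")
```

With these assumed figures, the average cost per contact falls toward the per-piece cost as the mailing grows, which is the sense in which fixed costs become less important.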

Optimizing Response for a Fixed Budget
The simplest way to make use of model scores is to use them to assign ranks.
Once prospects have been ranked by a propensity-to-respond score, the
prospect list can be sorted so that those most likely to respond are at the top of
the list and those least likely to respond are at the bottom. Many modeling
techniques can be used to generate response scores, including regression models,
decision trees, and neural networks.
Sorting a list makes sense whenever there is neither time nor budget to
reach all prospects. If some people must be left out, it makes sense to leave out
the ones who are least likely to respond. Not all businesses feel the need to
leave out prospects. A local cable company may consider every household in
its town to be a prospect and it may have the capacity to write or call every one
of those households several times a year. When the marketing plan calls for
making identical offers to every prospect, there is not much need for response
modeling! However, data mining may still be useful for selecting the proper
messages and for predicting how prospects are likely to behave as customers.
A more likely scenario is that the marketing budget does not allow the same
level of engagement with every prospect. Consider a company with 1 million
names on its prospect list and $300,000 to spend on a marketing campaign that
has a cost of one dollar per contact. This company, which we call the Simplifying
Assumptions Corporation (or SAC for short), can maximize the number of
responses it gets for its $300,000 expenditure by scoring the prospect list with
a response model and sending its offer to the prospects with the top 300,000
scores. The effect of this action is illustrated in Figure 4.2.
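
The selection step that SAC performs can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name is hypothetical, and the random numbers merely stand in for scores that a real response model would produce.

```python
import random


def select_for_campaign(scores, budget, cost_per_contact=1.0):
    """Return the indices of the highest-scoring prospects the budget can reach."""
    n_contacts = min(len(scores), int(budget // cost_per_contact))
    # Rank prospect indices from highest score to lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_contacts]


random.seed(0)
# Stand-in for model output: one score per name on the prospect list.
scores = [random.random() for _ in range(1_000_000)]

chosen = select_for_campaign(scores, budget=300_000, cost_per_contact=1.0)
print(len(chosen))  # number of prospects that will receive the offer
```

Because only the ordering of the scores is used, any model that ranks prospects by likelihood of response would work here, whether or not its scores are true probability estimates.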

Models are used to produce scores. When a cutoff score is used to decide
which customers to include in a campaign, the customers are, in effect, being
classified into two groups: those likely to respond and those not likely to
respond. One way of evaluating a classification rule is to examine its error
rates. In a binary classification task, the overall misclassification rate has two
components: the false positive rate and the false negative rate. Changing the
cutoff score changes the proportion of the two types of error. For a response
model where a higher score indicates a higher likelihood of responding, choosing a
high score as the cutoff means fewer false positives (people labeled as
responders who do not respond) and more false negatives (people labeled as
nonresponders who would respond).
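
The trade-off between the two kinds of error can be seen by counting them at different cutoffs. The Python sketch below uses a tiny invented data set; the scores, outcomes, and function name are hypothetical and serve only to illustrate the definitions above.

```python
def error_counts(scores, responded, cutoff):
    """Count false positives and false negatives for a given cutoff score.

    A prospect is labeled a responder when its score is at or above the cutoff.
    """
    false_pos = sum(1 for s, r in zip(scores, responded) if s >= cutoff and not r)
    false_neg = sum(1 for s, r in zip(scores, responded) if s < cutoff and r)
    return false_pos, false_neg


# Invented example: eight prospects with model scores and actual outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
responded = [True, True, False, True, False, False, True, False]

for cutoff in (0.75, 0.25):
    fp, fn = error_counts(scores, responded, cutoff)
    print(f"cutoff {cutoff}: {fp} false positives, {fn} false negatives")
```

On this toy data, the high cutoff gives no false positives but two false negatives, while the low cutoff gives three false positives and one false negative.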
An ROC curve is used to represent the relationship of the false positive rate
to the false negative rate of a test as the cutoff score varies. The letters ROC
stand for “Receiver Operating Characteristics,” a name that goes back to the
curve's origins in World War II, when it was developed to assess the ability of
radar operators to identify correctly whether a blip on the radar screen
was an enemy ship or something harmless. Today, ROC curves are more
likely to be used by medical researchers to evaluate medical tests. The false
positive rate is plotted on the X-axis and one minus the false negative rate is
plotted on the Y-axis. The ROC curve in the following figure reflects a test
with the error profile represented by the table beneath it.

ROC Chart (figure; axis scale runs from 0 to 100)

FN    0    2    4    8   12   22   32   46   60   80  100
FP  100   72   44   30   16   11    6    4    2    1    0

Choosing a cutoff for the model score such that there are very few false
positives leads to a high rate of false negatives, and vice versa. A good model
(or medical test) has some scores that are good at discriminating between
outcomes, thereby reducing both kinds of error. When this is true, the ROC
curve bulges towards the upper-left corner. The area under the ROC curve is a
measure of the model's ability to differentiate between two outcomes. This
measure is called discrimination. A perfect test has a discrimination of 1, and a
useless test for two outcomes has a discrimination of 0.5, since that is the area
under the diagonal line that represents no model.
ROC curves tend to be less useful for marketing applications than in some
other domains. One reason is that the false positive rates are so high and the
false negative rates so low that even a large change in the cutoff score does not
change the shape of the curve much.
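
As an illustration of discrimination, the area under the ROC curve for the error profile in the table can be approximated with the trapezoid rule. The Python sketch below assumes the FN and FP entries are percentages and that straight lines join neighboring points; both are assumptions, and the function names are illustrative.

```python
# Error profile from the table, read as percentages (an assumption).
FN = [0, 2, 4, 8, 12, 22, 32, 46, 60, 80, 100]
FP = [100, 72, 44, 30, 16, 11, 6, 4, 2, 1, 0]


def roc_points(fp_pct, fn_pct):
    """Convert the error profile to (x, y) points on the ROC curve.

    x is the false positive rate; y is one minus the false negative rate.
    Points are returned in order of increasing x.
    """
    return sorted((fp / 100.0, (100 - fn) / 100.0)
                  for fp, fn in zip(fp_pct, fn_pct))


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


points = roc_points(FP, FN)
print(f"discrimination = {area_under_curve(points):.2f}")
```

Under these assumptions the area comes to about 0.91, well above the 0.5 of the diagonal line that represents no model.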


