

advertising, rather than direct marketing, is the principal way of attracting
new customers. Applications for data mining in advertising are limited, and,
at this stage in their development, companies are not yet focused on customer
relationship management and customer retention. For the limited direct
marketing they do, outsourced modeling is often sufficient.
Wireless communications, cable television, and Internet service providers
all went through periods of exponential growth that have only recently come
to an end as these markets matured (and before them, wired telephones, life
insurance, catalogs, and credit cards went through similar cycles). During the
initial growth phases, understanding customers may not be a worthwhile
investment; an additional cell tower, switch, or whatever may provide a better
return. Eventually, though, the business and the customer base grow to a point
where understanding the customers takes on increased importance. In our
experience, it is better for companies to start early along the path of customer
insight, rather than waiting until the need becomes critical.

Outsourcing Ongoing Data Mining
Even when a company has recognized the need for data mining, there is still
the possibility of outsourcing. This is particularly true when the company is
built around customer acquisition. In the United States, credit bureaus and
household data suppliers are happy to provide modeling as a value-added
service with the data they sell. There are also direct marketing companies that
handle everything from mailing lists to fulfillment (the actual delivery of
products to customers). These companies often offer outsourced data mining.
Outsourcing arrangements have financial advantages for companies. The
problem is that customer insight is being outsourced as well. A company that
relies on outsourcing customer analytics runs the risk that customer
understanding will be lost between the company and the vendor.
For instance, one company used direct mail for a significant proportion of its
customer acquisition and outsourced the direct mail response modeling work
to the mailing list vendors. Over the course of about 2 years, there were several
direct mail managers in the company and the emphasis on this channel
decreased. What no one had realized was that direct mail was driving
acquisition that was being credited to other channels. Direct mail pieces could be
filled in and returned by mail, in which case the new acquisition was credited
to direct mail. However, the pieces also contained the company's URL and a
free phone number. Many prospects who received the direct mail found it
more convenient to respond by phone or on the Web, often forgetting to
provide the special code identifying them as direct mail prospects. Over time, the
response attributed to direct mail decreased, and consequently the budget for
524 Chapter 16

direct mail decreased as well. Only later, when decreased direct mail led to
decreased responses in other channels, did the company realize that
ignoring this echo effect had caused it to make a less-than-optimal business
decision.
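The echo effect can be checked for directly by matching new customers acquired through every channel against the list of prospects who were mailed. The Python sketch below is a minimal illustration, not a production attribution method: the record layouts, the assumption that mailed prospects and new accounts share a common match key, and the 60-day response window are all hypothetical choices.

```python
from datetime import date, timedelta


def attribute_to_mail(mailed, new_accounts, window_days=60):
    """Count new accounts by credited channel and flag likely mail echoes.

    mailed: dict mapping a (hypothetical) prospect match key to the mail date.
    new_accounts: list of (match_key, open_date, credited_channel) tuples.
    An account credited to some other channel is counted as a possible
    direct mail echo when that prospect was mailed within window_days
    before the account was opened. The 60-day window is an assumption.
    """
    window = timedelta(days=window_days)
    credited = {}
    echo = 0
    for key, opened, channel in new_accounts:
        credited[channel] = credited.get(channel, 0) + 1
        mail_date = mailed.get(key)
        if channel != "direct_mail" and mail_date is not None:
            if mail_date <= opened <= mail_date + window:
                echo += 1
    return credited, echo


mailed = {"A": date(2024, 1, 1), "B": date(2024, 1, 1), "C": date(2024, 1, 1)}
accounts = [
    ("A", date(2024, 1, 10), "direct_mail"),  # returned the mail piece
    ("B", date(2024, 1, 20), "web"),          # mailed, but responded on the Web
    ("C", date(2024, 6, 1), "phone"),         # mailed long before; outside window
    ("D", date(2024, 1, 15), "web"),          # never mailed
]
credited, echo = attribute_to_mail(mailed, accounts)
print(credited, echo)  # {'direct_mail': 1, 'web': 2, 'phone': 1} 1
```

Here the Web response from prospect B is the kind of acquisition that would otherwise be credited entirely to the Web channel. A fuller analysis would ideally compare against a group that was not mailed, since some of these customers might have responded anyway.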

Insourcing Data Mining
The modeling process creates more than models and scores; it also produces
insights. These insights often come during the process of data exploration and
data preparation that are an important part of the data mining process. For that
reason, we feel that any company with ongoing data mining needs should
develop an in-house data mining group to keep the learning in the company.

Building an Interdisciplinary Data Mining Group
Once the decision has been made to bring customer understanding in-house,
the question is where. In some companies, the data mining group has no
permanent home. It consists of a group of people seconded from their usual jobs
to come together to perform data mining. By its nature, such an arrangement
seems temporary and often it is the result of some urgent requirement such as
the need to understand a sudden upsurge in customer defaults. While it lasts,
such a group can be very effective, but it is unlikely to last very long because
the members will be recalled to their regular duties as soon as a new task
requires their attention.

Building a Data Mining Group in IT
A possible home is in the systems group, since this group is often responsible
for housing customer data and for running customer-facing operational
systems. Because the data mining group is technical and needs access to data and
powerful software and servers, the IT group seems like a natural location. In
fact, analysis can be seen as an extension of providing databases and access
tools and maintaining such systems.
Being part of IT gives the data mining group ready access to hardware and
data, since the IT group already manages these technical resources. In
addition, the IT group is a service organization with clients in many business
units. In fact, the business units that are the “customers” for data mining are
probably already used to relying on IT for data and reporting.
On the other hand, IT is sometimes a bit removed from the business
problems that motivate customer analytics. Since very slight misunderstandings of
the business problems can lead to useless results, it is very important that
people from the business units be very closely involved with any IT-based data
mining projects.
Building the Data Mining Environment 525

Building a Data Mining Group in the Business Units
The alternative to putting the data mining group where the data and
computers are is to put it close to the problems being addressed. That generally means
the marketing group, the customer relationship management group (where
such a thing exists), or the finance group. Sometimes there are several small
data mining groups, one in each of several business units: for example, a group
in finance building credit risk models and collections models, one in marketing
building response models, and one in CRM building cross-sell models and
voluntary churn models.
The advantages and disadvantages of this approach are the inverse of those
for putting data mining in IT. The business units have a great understanding
of their own business problems, but may still have to rely on IT for data and
computing resources. Although either approach can be successful, on balance
we prefer to see data mining centered in the business units.

What to Look for in Data Mining Staff
The best data mining groups are often eclectic mixes of people. Because data
mining has not existed very long as a separately named activity, there are few
people who can claim to be trained data miners. There are data miners who
used to be physicists, data miners who used to be geologists, data miners who
used to be computer scientists, data miners who used to be marketing
managers, data miners who used to be linguists, and data miners who are still
This makes lunchtime conversation in a data mining group fairly
interesting, but it doesn't offer much guidance for hiring managers. The things that
make good data miners better than mediocre ones are hard to teach and
impossible to automate: good intuition, a feel for how to coax information out
of data, and a natural curiosity.
No one individual is likely to have all the skills required for completing
a data mining project. Among them, the team members should cover the following:
- Database skills (SQL, if the data is stored in relational databases)
- Data transformation and programming skills (SAS, SPSS, S-Plus, Perl, other programming languages, ETL tools)
- Machine learning skills
- Industry knowledge in the relevant industry
- Data visualization skills
- Interviewing and requirements-gathering skills
- Presentation, writing, and communication skills


A new data mining group should include someone who has done commercial
data mining before, preferably in the same industry. If necessary, this
expertise can be provided by outside consultants.

Data Mining Infrastructure
In companies where data mining is merely an exploratory activity, useful data
mining can be accomplished with little infrastructure. A desktop workstation
with some data mining software and access to the corporate databases is likely
to be sufficient. However, when data mining is central to the business, the data
mining infrastructure must be considerably more robust. In these companies,
updating customer profiles with new model scores, either on a regular
schedule such as once a month or, in some cases, with each new transaction, is
part of the regular production process of the data warehouse. The data mining
infrastructure must provide a bridge between the exploratory world where
models are developed and the production world where models are scored and
marketing campaigns run.
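As a minimal sketch of what such a production step might look like, the Python function below scores every row of a customer table and appends the results to a score-history table stamped with the run date, using the standard-library sqlite3 module as a stand-in for the data warehouse. The table names, columns, and the linear scoring function are hypothetical placeholders for a real model taken from the model library.

```python
import sqlite3
from datetime import date


def churn_score(tenure_years, calls_per_month):
    # Hypothetical stand-in for a model from the model library.
    return 0.3 * tenure_years + 0.5 * calls_per_month


def run_scoring(conn, model_name, run_date):
    """Score every customer and append the results to score_history."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS score_history (
               customer_id INTEGER,
               model_name  TEXT,
               score_date  TEXT,
               score       REAL,
               PRIMARY KEY (customer_id, model_name, score_date))"""
    )
    rows = conn.execute(
        "SELECT customer_id, tenure_years, calls_per_month FROM customer"
    ).fetchall()
    scored = [
        (cid, model_name, run_date.isoformat(), churn_score(tenure, calls))
        for cid, tenure, calls in rows
    ]
    # Earlier dates are kept, so changes in scores can be tracked over time;
    # rerunning on the same date replaces only that date's rows.
    conn.executemany(
        "INSERT OR REPLACE INTO score_history VALUES (?, ?, ?, ?)", scored
    )
    conn.commit()
    return len(scored)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,"
    " tenure_years REAL, calls_per_month REAL)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, 2.0, 4.0), (2, 5.0, 0.0)])
run_scoring(conn, "churn_v1", date(2024, 1, 31))
conn.execute("UPDATE customer SET calls_per_month = 10.0 WHERE customer_id = 2")
run_scoring(conn, "churn_v1", date(2024, 2, 29))
history = conn.execute(
    "SELECT score_date, score FROM score_history"
    " WHERE customer_id = 2 ORDER BY score_date"
).fetchall()
```

Keying the history on customer, model, and score date is one possible design choice; it lets the same table hold many models and many monthly runs side by side.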
A production-ready data mining environment must be able to support the following:
- The ability to access data from many sources and bring the data together as customer signatures in a data mining model set.
- The ability to score customers on demand using already created models from the model library.
- The ability to manage hundreds of model scores over time.
- The ability to manage dozens or hundreds of models developed over time.
- The ability to reconstruct a customer signature for any point in a customer's tenure, such as immediately before a purchase or other interesting event.
- The ability to track changes in model scores over time.
- The ability to publish scores, rules, and other data mining results back to the data warehouse and to other applications that need them.
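Reconstructing a customer signature for a past point in time mostly comes down to discipline about dates: every feature must be computed only from events that occurred before the chosen moment. A minimal Python sketch, assuming a hypothetical transaction layout of (customer_id, txn_date, amount) tuples and a handful of illustrative features:

```python
from datetime import date


def signature_as_of(transactions, customer_id, as_of):
    """Build a customer signature from transactions dated strictly before as_of.

    transactions: iterable of (customer_id, txn_date, amount) tuples
    (a hypothetical layout). Returns None if the customer had no
    transactions before as_of.
    """
    history = [
        (txn_date, amount)
        for cid, txn_date, amount in transactions
        if cid == customer_id and txn_date < as_of  # nothing from the "future"
    ]
    if not history:
        return None
    dates = [d for d, _ in history]
    amounts = [a for _, a in history]
    return {
        "tenure_days": (as_of - min(dates)).days,
        "days_since_last": (as_of - max(dates)).days,
        "txn_count": len(amounts),
        "total_amount": sum(amounts),
        "avg_amount": sum(amounts) / len(amounts),
    }


txns = [
    (1, date(2023, 1, 1), 100.0),
    (1, date(2023, 3, 1), 50.0),
    (1, date(2023, 6, 1), 200.0),  # the purchase we want to predict
    (2, date(2023, 2, 1), 10.0),
]
before_purchase = signature_as_of(txns, 1, date(2023, 6, 1))
```

Calling signature_as_of with the date of a purchase yields the customer's profile as it looked immediately before that purchase, which is the form needed when building a model to predict such purchases.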
The data mining infrastructure is logically (and often physically) split into
two pieces supporting two quite different activities: mining and scoring. Each
task presents a different set of requirements.

The Mining Platform
The mining platform supports software for data manipulation along with data
mining software embodying the data mining techniques described in this
book, visualization and presentation software, and software to enable models
to be published to the scoring environment.
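Publishing a model to the scoring environment requires some format that both platforms agree on. The sketch below uses a deliberately simple, hypothetical JSON format for a logistic regression model, along with a scoring function that refuses to score a record lacking any of the model's input features. The field names and model are illustrative assumptions, not any vendor's format.

```python
import json
import math


def publish_model(name, version, intercept, coefficients):
    """Serialize a logistic regression model to a (hypothetical) JSON format."""
    return json.dumps(
        {
            "name": name,
            "version": version,
            "intercept": intercept,
            "coefficients": coefficients,  # feature name -> weight
        },
        sort_keys=True,
    )


def score_record(model_json, record):
    """Score one record in the scoring environment using a published model."""
    model = json.loads(model_json)
    missing = [f for f in model["coefficients"] if f not in record]
    if missing:
        # The scoring side must supply every feature the model was built on.
        raise KeyError(f"record lacks model features: {missing}")
    z = model["intercept"] + sum(
        weight * record[feature] for feature, weight in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function: score between 0 and 1


published = publish_model(
    "response", "2024-01", -1.0, {"recency_bin": 0.8, "num_orders": 0.25}
)
score = score_record(published, {"recency_bin": 1, "num_orders": 2})
```

Carrying a name and version in the published model is one way to support the later need to manage many models and know which one produced a given score.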
Although we have already touched on a few integration issues, others to
consider include:
- Where in the client/server hierarchy is the software to be installed?
- Will the data mining software require its own hardware platform? If so, will this introduce a new operating system into the mix?
- What software will have to be installed on users' desktops in order to communicate with the package?
- What additional networking, SQL gateways, and middleware will be required?
- Does the data mining software provide good interfaces to reporting and graphics packages?
The purpose of the mining platform is to support exploration of the data,
mining, and modeling. The system should be devised with these activities in
mind, including the fact that such work requires much processing and
computing power. The data mining software vendor should be able to provide
specifications for a data mining platform adequate for the anticipated dataset
sizes and expected usage patterns.

The Scoring Platform
The scoring platform is where models developed on the mining platform are
applied to customer records to create scores used to determine future
treatments. Often, the scoring platform is the customer database itself, which is
likely to be a relational database running on a parallel hardware platform.
In order to score a record, the record must contain, or the scoring platform
must be able to calculate, the same features that went into the model. These
features used by the model are rarely in the raw form in which they occur in
the data. Often, new features have been created by combining existing
variables in various ways, such as taking the ratio of one to another, and by
performing transformations such as binning, summing, and averaging. Whatever was
done to calculate the features used when the model was created must now be
done for every record to be scored. Since there may be hundreds of millions of
transactional records, it matters how this is done. When the volume of data is
large, so is the data processing challenge.
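One way to make sure the scoring platform repeats exactly what was done when the model was built is to keep the feature derivation in a single function that both environments call. A minimal Python sketch, assuming a hypothetical raw record with monthly charges and calling minutes; the field names and bin edges are illustrative, not taken from any particular system:

```python
import bisect

# Bin edges fixed when the model was built (hypothetical values). Scoring must
# reuse these saved edges rather than recompute them from the new data.
SPEND_BIN_EDGES = [50.0, 200.0, 1000.0]


def derive_features(raw):
    """Turn a raw customer record into model features.

    Called both when building the model set and when scoring, so the two
    environments apply identical transformations. Field names are hypothetical.
    """
    monthly = raw["monthly_charges"]  # list of monthly billed amounts
    total = sum(monthly)  # summing
    avg = total / len(monthly) if monthly else 0.0  # averaging
    all_minutes = raw["total_minutes"]
    return {
        "total_charges": total,
        "avg_monthly_charge": avg,
        # Ratio of one variable to another, guarding against division by zero.
        "intl_share": raw["international_minutes"] / all_minutes if all_minutes else 0.0,
        # Binning: index 0..3 from the fixed edges (edge values go to the upper bin).
        "spend_bin": bisect.bisect_right(SPEND_BIN_EDGES, avg),
    }


features = derive_features(
    {"monthly_charges": [100.0, 300.0], "international_minutes": 30, "total_minutes": 120}
)
```

Because the bin edges are saved constants rather than quantiles recomputed from each new batch, a customer at scoring time lands in the same bin as a customer with the same average charge did in the model set. At the volumes described above, the same logic might be better expressed in SQL inside the database than run record by record in Python.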

Scoring is not complete until the scores reside on a customer database
somewhere accessible to the software that will be used to select customers for
inclusion in marketing campaigns. If Web log or call detail or point-of-sale scanner

