

Extraction Tools
Extraction tools (often called ETL tools, for extract-transform-load) are
generally used for loading data warehouses and data marts. In most companies,
business users do not have ready access to these tools, and most of their
functionality can be found in other tools. Extraction tools are generally on the
expensive side because they are intended for large data warehousing projects.
In Mastering Data Mining (Wiley, 1999), we discuss a case study using a suite
of tools from Ab Initio, Inc., a company that specializes in parallel data
transformation software. This case study illustrates the power of such software
when working on very large volumes of data, something to consider in an
environment where such software might be available.


Special-Purpose Code
Coding is the tried-and-true way of implementing data transformations. The
choice of tool is really based on what the programmer is most familiar with
and what tools are available. For the transformations needed for a customer
signature, the main statistical tools all have sufficient functionality.
One downside of using special-purpose code is that it adds an extra layer to
the data transformation process. Data must still be extracted from source systems
(one possible source of error) and then passed through code (another source of
error). It is a good idea to write code that is well documented and reusable.


Data Mining Tools
Increasingly, data mining tools have the ability to transform data within the
tool. Most tools have the ability to extract features from fields and to combine
multiple fields in a row, although the support for non-numeric data types
varies from tool to tool and release to release. Some tools also support
summarizations within the customer signature, such as binning variables (where
the binning breakpoints are determined first by looking at the entire set of
data) and standardization.
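
As a rough illustration of these two summarizations, the following Python sketch computes binning breakpoints from the entire column before assigning any row to a bin, and then standardizes the same column. The function names, the choice of equal-frequency (quantile) bins, and the sample values are assumptions made for this example; they are not taken from any particular data mining tool.

```python
from statistics import mean, pstdev, quantiles

def quantile_breakpoints(values, n_bins=4):
    """Compute breakpoints from the whole column (equal-frequency bins)."""
    return quantiles(values, n=n_bins)  # yields n_bins - 1 cut points

def assign_bin(value, breakpoints):
    """Return the 0-based index of the bin that value falls into."""
    return sum(value > b for b in breakpoints)

def standardize(values):
    """Convert values to z-scores using the column mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly spending values for eight customers.
monthly_spend = [12.0, 15.5, 20.0, 22.5, 30.0, 41.0, 58.0, 120.0]

cuts = quantile_breakpoints(monthly_spend)            # pass 1: look at all the data
bins = [assign_bin(v, cuts) for v in monthly_spend]   # pass 2: bin each row
z_scores = standardize(monthly_spend)
```

The two-pass structure is the point: the breakpoints depend on the full set of data, so they cannot be computed one row at a time.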
However, data mining tools are generally weak on looking up values and
doing aggregations. For this reason, the customer signature is almost always
created elsewhere and then loaded into the tool. Tools from leading vendors
allow the embedding of programming code inside the tool and access to
databases using SQL. Using these features is a good idea because they
reduce the number of things to keep track of when transforming data.
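
To make the lookup and aggregation steps concrete, here is a minimal sketch that builds a tiny customer signature with SQL from Python, using the standard sqlite3 module and an in-memory database. The table names, column names, and sample rows are all hypothetical, invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, zip_code TEXT);
    CREATE TABLE zip_lookup (zip_code TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE transactions (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, '10001'), (2, '73301');
    INSERT INTO zip_lookup VALUES ('10001', 'Northeast'), ('73301', 'South');
    INSERT INTO transactions VALUES (1, 20.0), (1, 35.0), (2, 10.0);
""")

# One row per customer: a looked-up value (region) plus aggregated
# transaction fields (purchase count and total spend).
signature_sql = """
    SELECT c.customer_id,
           z.region,
           COUNT(t.amount)            AS num_purchases,
           COALESCE(SUM(t.amount), 0) AS total_spend
    FROM customers c
    LEFT JOIN zip_lookup z   ON z.zip_code = c.zip_code
    LEFT JOIN transactions t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, z.region
    ORDER BY c.customer_id
"""
signatures = conn.execute(signature_sql).fetchall()
```

The left joins keep customers who have no transactions or no matching lookup row, so every customer still gets exactly one signature row.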


Lessons Learned
Data is the gasoline that powers data mining. The goal of data preparation is to
provide a clean fuel, so the analytic engines work as efficiently as possible. For
most algorithms, the best input takes the form of customer signatures, each a
single row of data with fields describing various aspects of the customer. Many
of these fields are input fields; a few are targets used for predictive modeling.
Unfortunately, customer signatures are not the way data is found in available
systems, and for good reason, since the signatures change over time. In
fact, they are constantly being built and rebuilt, with newer data and newer
ideas on what constitutes useful information.
Source fields come in several different varieties, such as numbers, strings,
and dates. However, the most useful values are usually the derived ones that
are added in. Creating a derived value may be as simple as taking the sum of
two fields, or it may require much more sophisticated calculations on very
large amounts of data. This is particularly true when trying to capture customer
behavior over time, because time series, whether regular or irregular, must be
summarized for the signature.
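
As one possible sketch of such a summarization, the Python function below collapses an irregular series of purchases into a handful of signature fields. The field names (num_purchases, days_since_last, and so on), the as_of cutoff date, and the sample history are assumptions chosen for illustration; a real signature would carry whatever summaries suit the business problem.

```python
from datetime import date

def summarize_history(purchases, as_of):
    """Collapse (date, amount) pairs into one row of signature fields."""
    if not purchases:
        return {"num_purchases": 0, "total_spend": 0.0, "avg_spend": 0.0,
                "days_since_last": None, "tenure_days": None}
    dates = [d for d, _ in purchases]
    total = sum(amount for _, amount in purchases)
    return {
        "num_purchases": len(purchases),
        "total_spend": total,
        "avg_spend": total / len(purchases),
        "days_since_last": (as_of - max(dates)).days,  # recency
        "tenure_days": (as_of - min(dates)).days,      # time since first purchase
    }

# Hypothetical, irregularly spaced purchase history for one customer.
history = [
    (date(2003, 1, 15), 40.0),
    (date(2003, 3, 2), 25.0),
    (date(2003, 6, 20), 55.0),
]
row = summarize_history(history, as_of=date(2003, 7, 1))
```

Fixing the as_of date matters: every customer's history should be summarized as of the same cutoff so that the resulting signatures are comparable.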
Data also suffers (and causes us to suffer along with it) from problems:
missing values, incorrect values, and values from different sources that
disagree. Once such problems are identified, it is possible to work around them.
The biggest problems are the unknown ones: data that looks correct but is
wrong for some reason.
Many data mining efforts have to use data that is less than perfect. As with
old cars that spew blue smoke but still manage to chug along the street, these
efforts produce results that are good enough. Like the vagabonds in Samuel
Beckett's play Waiting for Godot, we can choose to wait until perfection arrives.
That is the path to doing nothing; the better choice is to plow ahead, to learn,
and to make incremental progress.
CHAPTER 18

Putting Data Mining to Work




You've reached the last chapter of this book, and you are ready to start putting
data mining to work for your company. You are convinced that when data
mining has been woven into the fabric of your organization, the whole
enterprise will benefit from an increased understanding of its customers and
market, from better-focused marketing, from more-efficient utilization of sales
resources, and from more-responsive customer support. You also know that
there is a big difference between understanding something you have read in a
book and actually putting it into practice. This chapter is about how to bridge
that gap.
At Data Miners, Inc., the consulting company founded by the authors of this
book, we have helped many companies through their first data mining
projects. Although this chapter focuses on a company's first foray into data
mining, it is really about how to increase the probability of success for any data
mining project, whether the first or the fiftieth. It brings together ideas from
earlier chapters and applies them to the design of a data mining pilot project.
The chapter begins with general advice about integrating data mining into the
enterprise. It then discusses how to select and implement a successful pilot
project. The chapter concludes with the story of one company's initial data
mining effort and its success.






Getting Started

The full integration of data mining into a company's customer relationship
management strategy is a large and daunting project. It is best approached
incrementally, with achievable goals and measurable results along the way. The
final goal is to have data mining so well integrated into the decision-making
process that business decisions use accurate and timely customer information
as a matter of course. The first step toward achieving this goal is demonstrating
the real business value of data mining by producing a measurable return on
investment from a manageable pilot or proof-of-concept project. The pilot
should be chosen to be valuable in itself and to provide a solid basis for the
business case needed to justify further investment in analytical CRM.
In fact, a pilot project is not that different from any other data mining
project. All four phases of the virtuous cycle of data mining are represented in a
pilot project, albeit with some changes in emphasis. The proof of concept is
limited in budget and timeframe. Some problems with data and procedures that
would ordinarily need to be fixed may only be documented in a pilot project.

TIP: A pilot project is a good first step in the incremental effort to
revolutionize a business using data mining.


Here are the topic sentences for a few of the data mining pilot projects that
we have collaborated on with our clients:

- Find 10,000 high-end mobile telephone customers who are most likely to
  churn in October, in time for us to start an outbound telemarketing
  campaign in September.
- Find differences in the shopping profiles of Hispanic and non-Hispanic
  shoppers in Texas with respect to ready-to-eat cereals, so we can better
  direct our Spanish-language advertising campaigns.
- Guide our expansion plans by discovering what our best customers have in
  common with one another and locate new markets where similar customers
  can be found.
- Build a model to identify market research segments among the customers
  in our corporate data warehouse, so we can target messages to the right
  customers.
- Forecast the expected level of debt collection for the next several
  months, so we can manage to a plan.

These examples show the diversity of problems that data mining can
address. In each case, the data mining challenge is to find and analyze the
appropriate data to solve the business problem. However, this process starts
by choosing the right demonstration project in the first place.


What to Expect from a Proof-of-Concept Project
When the proof-of-concept project is complete, the following are available:

- A prototype model development system (which might be outsourced or
  might be the kernel of the production system)
- An evaluation of several data mining techniques and tools (unless the
  choice of tool was foreordained)
- A plan for modifying business processes and systems to incorporate
  data mining
- A description of the production data mining environment
- A business case for investing in data mining and customer analytics

Even when the decision has already been made to invest in data mining, the
proof-of-concept project is an important way to step through the virtuous
cycle of data mining for the first time. You should expect challenges and
hiccups along the way, because such a project touches several different parts
of the organization, both technical and operational, and needs them to work
together in perhaps unfamiliar ways.


Identifying a Proof-of-Concept Project
The purpose of a proof-of-concept project is to validate the utility of data
mining while managing risk. The project should be small enough to be practical
and important enough to be interesting. A successful data mining
proof-of-concept project is one that leads to actions with measurable results. To
find candidates for a proof of concept, study the existing business processes to
identify areas where data mining could provide tangible benefits with results
that can be measured in dollars. That is, the proof of concept should create a
solid business case for further integration of data mining into the company's
marketing, sales, and customer-support operations.
A good way to attract attention and budget dollars to a project is to use data
mining to meet a real business need. The most convincing proof-of-concept
projects focus on areas that are already being measured and evaluated
analytically, and where there is already an acknowledged need for improvement.
Likely candidates include:

- Response models
- Default risk models
- Attrition models
- Usage models
- Profitability models



These are areas where there is a well-defined link between improved
accuracy of predictions and improved profitability. With some projects, it is easy
to act on the data mining results. This is not to say that pilot projects with a
focus on increased insight and understanding without any direct link to the
bottom line cannot be successful. They are, however, harder to build a business
case for.
Potential users of new information are often creative and have good
imaginations. During interviews, encourage them to imagine ways to develop true
learning relationships with customers. At the same time, make an inventory of
available data sources, identifying additional fields that may be desirable or
required. Where data is already being warehoused, study the data dictionaries
and database schemas. When the source systems are operational systems,
study the record layouts that will be supplying the data and get to know the
people who are familiar with how the systems process and store information.
As part of the proof-of-concept selection process, do some initial profiling of
the available records and fields to get a preliminary understanding of
relationships in the data and to get some early warnings of data problems that may
hinder the data mining process. This effort is likely to require some amount of
data cleansing, filtering, and transformation.
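
A first pass at this kind of profiling can be quite simple. The sketch below, in Python, counts missing values and distinct values for each field and reports the most common value. The function name, the convention that an empty string or None means "missing", and the sample records are all assumptions for the example; real source systems often use other codes for missing data.

```python
from collections import Counter

def profile_fields(records, missing=("", None)):
    """Summarize each field: missing count, distinct count, most common value."""
    fields = sorted({name for rec in records for name in rec})
    profile = {}
    for name in fields:
        values = [rec.get(name) for rec in records]  # absent field -> None
        present = [v for v in values if v not in missing]
        counts = Counter(present)
        profile[name] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return profile

# Hypothetical extract from a customer file.
records = [
    {"state": "TX", "age": "34"},
    {"state": "TX", "age": ""},
    {"state": "", "age": "51"},
    {"state": "NY"},
]
report = profile_fields(records)
```

Fields with many missing values, or with only one distinct value, are early warnings worth raising with the people who know the source systems.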
Once several candidate projects have been identified, evaluate them in
terms of the ability to act on the results, the usefulness of the potential results,
the availability of data, and the level of technical effort. One of the most
important questions to ask about each candidate project is “how will the results be
used?” As illustrated by the example in the sidebar “A Successful Proof of
Concept?”, a common fate of data mining pilot projects is to be technically
successful but underappreciated because no one can figure out what to do with
the results.
There are certainly many examples of successful data mining projects that
originated in IT. Nevertheless, when the people conducting the data mining
are not located in marketing or some other group that communicates directly
with customers, sponsorship or at least input from such a group is important
for a successful project. Although data mining requires interaction with
databases and analytic software, it is not primarily an IT project and should rarely
be attempted in isolation from the owners of the business problem being
addressed.

TIP: A data mining pilot project may be based in any of several groups within
the company, but it must always include active participation from the group
that feels ownership of the business problem to be addressed.
