

Applying Data Mining
BofA worked with data mining consultants from Hyperparallel (then a data
mining tool vendor that has since been absorbed into Yahoo!) to bring a range
of data mining techniques to bear on the problem. There was no shortage of
data. For many years, BofA had been storing data on its millions of retail
customers in a large relational database on a powerful parallel computer from
NCR/Teradata. Data from 42 systems of record was cleansed, transformed,
aligned, and then fed into the corporate data warehouse. With this system,
BofA could see all the relationships each customer maintained with the bank.
This historical database was truly worthy of the name, with some records dating
back to 1914! More recent customer records had about 250 fields, including
demographic fields such as income, number of children, and type of home, as
well as internal data. These customer attributes were combined into a customer
signature, which was then analyzed using Hyperparallel's data mining tools.
A decision tree derived rules to classify existing bank customers as likely or
unlikely to respond to a home equity loan offer. The decision tree, trained on
thousands of examples of customers who had obtained the product and
thousands who had not, eventually learned rules to tell the difference between
them. Once the rules were discovered, the resulting model was used to add yet
another attribute to each prospect's record. This attribute, the “good prospect”
flag, was generated by a data mining model.
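
The idea of learning a rule from examples and then writing a flag back to each
record can be sketched in a few lines. This is only an illustration under
stated assumptions: it learns a single-split rule (a decision "stump"), which
is far simpler than the tree produced by the tool used at BofA, and the field
name home_value, the training values, and the function names are all invented.

```python
# Hypothetical training examples: (home value in thousands, took the loan?).
# Both the field and the values are invented for illustration.
TRAINING = [(120, False), (150, False), (180, False), (210, True),
            (260, True), (300, True), (170, True), (230, False)]


def learn_threshold(examples):
    """Pick the rule 'value >= t' that misclassifies the fewest examples."""
    best_t, best_errors = None, len(examples) + 1
    for t in sorted({value for value, _ in examples}):
        # An example is an error when the rule's prediction differs from its label.
        errors = sum((value >= t) != label for value, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def flag_prospects(customers, threshold):
    """Return copies of the records with a 'good_prospect' attribute added."""
    return [dict(c, good_prospect=c["home_value"] >= threshold)
            for c in customers]


threshold = learn_threshold(TRAINING)
prospects = flag_prospects(
    [{"id": 1, "home_value": 140}, {"id": 2, "home_value": 280}], threshold)
```

A real decision tree repeats this kind of split on many fields and on the
subgroups each split creates. The point here is only that the model's output
becomes one more attribute on the customer record.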
Next, a sequential pattern-finding tool was used to determine when
customers were most likely to want a loan of this type. The goal of this analysis
was to discover a sequence of events that had frequently preceded successful
solicitations in the past.
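
One simple way to look for such sequences is to count which ordered pairs of
events appear just before an accepted offer. The sketch below assumes each
customer's history is a list of event names, oldest first. The event names,
the window size, and the function name are hypothetical, and the text does not
say how the actual tool worked.

```python
from collections import Counter

# Hypothetical event histories, oldest first. "loan_accepted" marks a
# successful solicitation.
HISTORIES = {
    "c1": ["address_change", "opened_business_account", "loan_accepted"],
    "c2": ["opened_business_account", "large_deposit", "loan_accepted"],
    "c3": ["address_change", "large_deposit"],
    "c4": ["address_change", "opened_business_account", "loan_accepted"],
}


def preceding_pairs(histories, target="loan_accepted", window=2):
    """Count ordered event pairs within the `window` events before the target."""
    counts = Counter()
    for events in histories.values():
        if target not in events:
            continue  # no successful solicitation for this customer
        end = events.index(target)
        recent = events[max(0, end - window):end]
        for i in range(len(recent)):
            for j in range(i + 1, len(recent)):
                counts[(recent[i], recent[j])] += 1
    return counts


pair_counts = preceding_pairs(HISTORIES)
most_common_pair = pair_counts.most_common(1)[0]
```

With this invented data, the pair (address change, then a new business
account) precedes two of the three accepted offers, which is the kind of
pattern an analyst would want to examine further.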
Finally, a clustering tool was used to automatically segment the customers
into groups with similar attributes. At one point, the tool found 14 clusters
of customers, many of which did not seem particularly interesting. One cluster,
however, was very interesting indeed. This cluster had two intriguing
properties:
- 39 percent of the people in the cluster had both business and personal
  accounts.
- This cluster accounted for over a quarter of the customers who had
  been classified by the decision tree as likely responders to a home
  equity loan offer.
This data suggested to inquisitive data miners that people might be using
home equity loans to start businesses.
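
The two statistics that made this cluster stand out are easy to compute once
each record carries a cluster label and the model's flag. The following is a
minimal sketch, assuming hypothetical field names (cluster, business_acct,
personal_acct, good_prospect) and a handful of invented records. It does not
reproduce the bank's figures.

```python
# Hypothetical customer records. The cluster ids would come from the
# clustering tool and the good_prospect flag from the decision tree model.
CUSTOMERS = [
    {"cluster": 7, "business_acct": True, "personal_acct": True, "good_prospect": True},
    {"cluster": 7, "business_acct": True, "personal_acct": True, "good_prospect": True},
    {"cluster": 7, "business_acct": False, "personal_acct": True, "good_prospect": False},
    {"cluster": 7, "business_acct": False, "personal_acct": True, "good_prospect": False},
    {"cluster": 3, "business_acct": False, "personal_acct": True, "good_prospect": True},
    {"cluster": 3, "business_acct": False, "personal_acct": True, "good_prospect": False},
    {"cluster": 5, "business_acct": True, "personal_acct": False, "good_prospect": True},
    {"cluster": 5, "business_acct": False, "personal_acct": True, "good_prospect": False},
]


def profile_cluster(customers, cluster_id):
    """Summarize one cluster. Assumes it has members and that prospects exist."""
    members = [c for c in customers if c["cluster"] == cluster_id]
    both = sum(c["business_acct"] and c["personal_acct"] for c in members)
    prospects = [c for c in customers if c["good_prospect"]]
    in_cluster = sum(c["cluster"] == cluster_id for c in prospects)
    return {
        # Percent of cluster members holding both kinds of account.
        "pct_both_accounts": 100 * both / len(members),
        # Percent of all flagged prospects who fall in this cluster.
        "share_of_prospects": 100 * in_cluster / len(prospects),
    }


profile = profile_cluster(CUSTOMERS, 7)
```

Running the same profile for every cluster and comparing the results is one
way to spot the cluster that deserves a closer look.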
The Virtuous Cycle of Data Mining 25

Acting on the Results
With this new understanding, NCAG teamed with the Retail Banking Division
and did what banks do in such circumstances: they sponsored market research
to talk to customers. Now, the bank had one more question to ask: “Will the
proceeds of the loan be used to start a business?” The results from the market
research confirmed the suspicions aroused by data mining, so NCAG changed
the message and targeting on their marketing of home equity loans.
Incidentally, market research and data mining are often used for similar
ends: to gain a better understanding of customers. Although powerful,
market research has some shortcomings:
- Responders may not be representative of the population as a whole.
  That is, the set of responders may be biased, particularly by where past
  marketing efforts were focused, and hence form what is called an
  opportunistic sample.
- Customers (particularly dissatisfied customers and former customers)
  have little reason to be helpful or honest.
- For any given action, there may be an accumulation of reasons. For
  instance, banking customers may leave because a branch closed, the
  bank bounced a check, and they had to wait too long at ATMs. Market
  research may pick up only the proximate cause, although the sequence
  is more significant.
Despite these shortcomings, talking to customers and former customers
provides insights that cannot be provided in any other way. This example with
BofA shows that the two methods are compatible.

TIP When doing market research on existing customers, it is a good idea to
use data mining to take into account what is already known about them.

Measuring the Effects
As a result of the new campaign, Bank of America saw the response rate for
home equity campaigns jump from 0.7 percent to 7 percent. According to Dave
McDonald, vice president of the group, the strategic implications of data mining
are nothing short of the transformation of the retail side of the bank from a mass-
marketing institution to a learning institution. “We want to get to the point
where we are constantly executing marketing programs, not just quarterly
mailings, but programs on a consistent basis.” He has a vision of a closed-loop
marketing process where operational data feeds a rapid analysis process that leads
to program creation for execution and testing, which in turn generates
additional data to rejuvenate the process. In short, the virtuous cycle of data mining.
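
Measuring the effect comes down to comparing response rates before and after
the change. As a small worked example, the sketch below uses hypothetical
campaign counts, chosen only so that the rates match the 0.7 percent and
7 percent quoted above. The actual mailing volumes are not given in the text.

```python
def response_rate(responses, mailed):
    """Response rate as a percentage of pieces mailed."""
    return 100.0 * responses / mailed


# Hypothetical counts chosen to reproduce the rates quoted in the text.
before = response_rate(70, 10_000)   # 0.7 percent
after = response_rate(700, 10_000)   # 7 percent

# Lift: how many times better the new campaign performed than the old one.
lift = after / before
```

A jump from 0.7 percent to 7 percent is a tenfold improvement in response
rate for the same number of pieces mailed.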
26 Chapter 2

What Is the Virtuous Cycle?

The BofA example shows the virtuous cycle of data mining in practice. Figure 2.1
shows the four stages:
1. Identifying the business problem.
2. Mining data to transform the data into actionable information.
3. Acting on the information.
4. Measuring the results.

[Figure: a cycle of four boxes. Identify business opportunities where analyzing
data can provide value. Transform data into actionable information using data
mining techniques. Act on the information. Measure the results of the efforts
to complete the learning cycle.]
Figure 2.1 The virtuous cycle of data mining focuses on business results, rather than just
exploiting advanced techniques.

As these steps suggest, the key to success is incorporating data mining into
business processes and being able to foster lines of communication between
the technical data miners and the business users of the results.

Identify the Business Opportunity
The virtuous cycle of data mining starts with identifying the right business
opportunities. Unfortunately, there are too many good statisticians and
competent analysts whose work is essentially wasted because they are solving
problems that don't help the business. Good data miners want to avoid this situation.
Avoiding wasted analytic effort starts with a willingness to act on the
results. Many normal business processes are good candidates for data mining:
- Planning for a new product introduction
- Planning direct marketing campaigns
- Understanding customer attrition/churn
- Evaluating results of a marketing test

These are examples of where data mining can enhance existing business
efforts, by allowing business managers to make more informed decisions: by
targeting a different group, by changing messaging, and so on.
To avoid wasting analytic effort, it is also important to measure the impact
of whatever actions are taken in order to judge the value of the data mining
effort itself. If we cannot measure the results of mining the data, then we
cannot learn from the effort and there is no virtuous cycle.
Measurements of past efforts and ad hoc questions about the business also
suggest data mining opportunities:
- What types of customers responded to the last campaign?
- Where do the best customers live?
- Are long waits at automated tellers a cause of customers' attrition?
- Do profitable customers use customer support?
- What products should be promoted with Clorox bleach?

Interviewing business experts is another good way to get started. Because
people on the business side may not be familiar with data mining, they
may not understand how to act on the results. By explaining the value of data
mining to an organization, such interviews provide a forum for two-way
communication.
We once participated in a series of interviews at a telecommunications
company to discuss the value of analyzing call detail records (records of completed
calls made by each customer). During one interview, the participants were
slow in understanding how this could be useful. Then, a colleague pointed out
that lurking inside their data was information on which customers used fax
machines at home (the details of this are discussed in Chapter 10 on Link
Analysis). Click! Fax machine usage would be a good indicator of who was
working from home. And to make use of that information, there was a specific
product bundle for the work-at-home crowd. Without our prodding, this
marketing group would never have considered searching through data to find
this information. Joining the technical and business perspectives highlighted a
very valuable opportunity.

TIP When talking to business users about data mining opportunities, make
sure they focus on the business problems and not technology and algorithms.
Let the technical experts focus on the technology and the business experts
focus on the business.

Mining Data
Data mining, the focus of this book, transforms data into actionable results.
Success is about making business sense of the data, not using particular
algorithms or tools. Numerous pitfalls interfere with the ability to use the results of
data mining:
- Bad data formats, such as not including the zip code in the customer
  address in the results
- Confusing data fields, such as a delivery date that means “planned
  delivery date” in one system and “actual delivery date” in another
- Lack of functionality, such as a call-center application that does not
  allow annotations on a per-customer basis
- Legal ramifications, such as having to provide a legal reason when
  rejecting a loan (and “my neural network told me so” is not acceptable)
- Organizational factors, since some operational groups are reluctant to
  change their operations, particularly without incentives
- Lack of timeliness, since results that come too late may no longer be
  actionable
Data comes in many forms, in many formats, and from multiple systems, as
shown in Figure 2.2. Identifying the right data sources and bringing them
together are critical success factors. Every data mining project has data issues:
inconsistent systems, table keys that don't match across databases, records
overwritten every few months, and so on. Complaints about data are the number one
excuse for not doing anything. The real question is “What can be done with
available data?” This is where the algorithms described later in this book come in.
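
Mismatched keys are a good example of a data issue that can often be worked
around rather than used as an excuse. The sketch below assumes two hypothetical
extracts, here called BILLING and SUPPORT, that format the same customer key
differently. The normalization rules are invented for illustration and would
have to be checked against the real systems, since overly aggressive cleaning
can merge customers who are actually distinct.

```python
def normalize_key(raw):
    """Remove surrounding spaces, case differences, and leading zeros."""
    return str(raw).strip().upper().lstrip("0")


# Hypothetical extracts from two systems that format the same key differently.
BILLING = {"000123": {"balance": 250.0}, "A-77 ": {"balance": 10.0}}
SUPPORT = {"123": {"calls": 4}, "a-77": {"calls": 1}, "999": {"calls": 2}}


def merge_sources(*sources):
    """Combine the fields from every source under one normalized key."""
    merged = {}
    for source in sources:
        for raw_key, fields in source.items():
            merged.setdefault(normalize_key(raw_key), {}).update(fields)
    return merged


customers = merge_sources(BILLING, SUPPORT)
```

Customers who appear in only one system, such as key 999 here, still get a
record. Deciding what to do about their missing fields is part of the
analysis rather than a reason to abandon it.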

[Figure 2.2: only a fragment of one label survives: “External sources of …
lifestyle, and credit …”]