

Data Mining Techniques
For Marketing, Sales, and
Customer Relationship Management
Second Edition

Michael J.A. Berry
Gordon S. Linoff
Vice President and Executive Group Publisher: Richard Swadley
Vice President and Executive Publisher: Bob Ipsen
Vice President and Publisher: Joseph B. Wikert
Executive Editorial Director: Mary Bednarek
Executive Editor: Robert M. Elliott
Editorial Manager: Kathryn A. Malm
Senior Production Editor: Fred Bernardi
Development Editor: Emilie Herman, Erica Weinstein
Production Editor: Felicia Robinson
Media Development Specialist: Laura Carpenter VanWinkle
Text Design & Composition: Wiley Composition Services

Copyright © 2004 by Wiley Publishing, Inc., Indianapolis, Indiana
All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted
under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission
of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8700. Requests to the
Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint
Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: permcoordinator@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or completeness
of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for
a particular purpose. No warranty may be created or extended by sales representatives or written sales
materials. The advice and strategies contained herein may not be suitable for your situation. You should consult
with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit
or any other commercial damages, including but not limited to special, incidental, consequential, or other

For general information on our other products and services please contact our Customer Care Department
within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Trademarks: Wiley and the Wiley Publishing logo are trademarks or registered trademarks of John Wiley & Sons,
Inc. and/or its affiliates in the United States and other countries. All other trademarks are the property of their
respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not
be available in electronic books.

Library of Congress Cataloging-in-Publication Data:

Berry, Michael J. A.
Data mining techniques : for marketing, sales, and customer
relationship management / Michael J.A. Berry, Gordon Linoff. -- 2nd ed.
p. cm.
Includes index.
ISBN 0-471-47064-3 (paper/website)
1. Data mining. 2. Marketing--Data processing. 3. Business--Data
processing. I. Linoff, Gordon. II. Title.
HF5415.125 .B47 2004

ISBN: 0-471-47064-3

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1
To Stephanie, Sasha, and Nathaniel. Without your patience and
understanding, this book would not have been possible.

-- Michael

To Puccio. Grazie per essere paziente con me.

Ti amo.

-- Gordon

Acknowledgments


We are fortunate to be surrounded by some of the most talented data miners
anywhere, so our first thanks go to our colleagues at Data Miners, Inc. from
whom we have learned so much: Will Potts, Dorian Pyle, and Brij Masand.
There are also clients with whom we work so closely that we consider them
our colleagues as well: Harrison Sohmer and Stuart E. Ward, III are in that
category. Our Editor, Bob Elliott, Editorial Assistant, Erica Weinstein, and
Development Editor, Emilie Herman, kept us (more or less) on schedule and helped
us maintain a consistent style. Lauren McCann, a graduate student at M.I.T.
and intern at Data Miners, prepared the census data used in some examples
and created some of the illustrations.
We would also like to acknowledge all of the people we have worked with
in scores of data mining engagements over the years. We have learned
something from every one of them. The many whose data mining projects have
influenced the second edition of this book include:
Al Fan                Herb Edelstein        Nick Gagliardo
Alan Parker           Jill Holtz            Nick Radcliffe
Anne Milley           Joan Forrester        Patrick Surry
Brian Guscott         John Wallace          Ronny Kohavi
Bruce Rylander        Josh Goff             Sheridan Young
Corina Cortes         Karen Kennedy         Susan Hunt Stevens
Daryl Berry           Kurt Thearling        Ted Browne
Daryl Pregibon        Lynne Brennen         Terri Kowalchuk
Doug Newell           Mark Smith            Victor Lo
Ed Freeman            Mateus Kehder         Yasmin Namini
Erin McCarthy         Michael Patrick       Zai Ying Huang

And, of course, all the people we thanked in the first edition are still
deserving of acknowledgement:

Bob Flynn             Jim Flynn             Paul Berry
Bryan McNeely         Kamran Parsaye        Rakesh Agrawal
Claire Budden         Karen Stewart         Ric Amari
David Isaac           Larry Bookman         Rich Cohen
David Waltz           Larry Scroggins       Robert Groth
Dena d'Ebin           Lars Rohrberg         Robert Utzschnieder
Diana Lin             Lounette Dyer         Roland Pesch
Don Peppers           Marc Goodman          Stephen Smith
Ed Horton             Marc Reifeis          Sue Osterfelt
Edward Ewen           Marge Sherold         Susan Buchanan
Fred Chapman          Mario Bourgoin        Syamala Srinivasan
Gary Drescher         Prof. Michael Jordan  Wei-Xing Ho
Gregory Lampshire     Patsy Campbell        William Petefish
Janet Smith           Paul Becker           Yvonne McCollin
Jerry Modes
About the Authors

Michael J. A. Berry and Gordon S. Linoff are well known in the data mining
field. They have jointly authored three influential and widely read books on
data mining that have been translated into many languages. They each have
close to two decades of experience applying data mining techniques to
business problems in marketing and customer relationship management.
Michael and Gordon first worked together during the 1980s at Thinking
Machines Corporation, which was a pioneer in mining large databases. In
1996, they collaborated on a data mining seminar, which soon evolved into the
first edition of this book. The success of that collaboration gave them the
courage to start Data Miners, Inc., a respected data mining consultancy, in
1998. As data mining consultants, they have worked with a wide variety of
major companies in North America, Europe, and Asia, turning customer
databases, call detail records, Web log entries, point-of-sale records, and billing
files into useful information that can be used to improve the customer
experience. The authors' years of hands-on data mining experience are reflected in
every chapter of this extensively updated and revised edition of their first
book, Data Mining Techniques.
When not mining data at some distant client site, Michael lives in
Cambridge, Massachusetts, and Gordon lives in New York City.

Introduction



The first edition of Data Mining Techniques for Marketing, Sales, and Customer
Support appeared on bookshelves in 1997. The book actually got its start in
1996 as Gordon and I were developing a one-day data mining seminar for
NationsBank (now Bank of America). Sue Osterfelt, a vice president at
NationsBank and the author of a book on database applications with Bill
Inmon, convinced us that our seminar material ought to be developed into a
book. She introduced us to Bob Elliott, her editor at John Wiley & Sons, and
before we had time to think better of it, we signed a contract.
Neither of us had written a book before, and drafts of early chapters clearly
showed this. Thanks to Bob's help, though, we made a lot of progress, and the
final product was a book we are still proud of. It is no exaggeration to say that
the experience changed our lives: first by taking over every waking hour
and some when we should have been sleeping; then, more positively, by
providing the basis for the consulting company we founded, Data Miners, Inc.
The first book, which has become a standard text in data mining, was followed
by two others, Mastering Data Mining and Mining the Web.
So, why a revised edition? The world of data mining has changed a lot since
we started writing in 1996. For instance, back then, Amazon.com was still
new; U.S. mobile phone calls cost on average 56 cents per minute, and fewer
than 25 percent of Americans even owned a mobile phone; and the KDD data
mining conference was in its second year. Our understanding has changed
even more. For the most part, the underlying algorithms remain the same,
although the software in which the algorithms are embedded, the data to
which they are applied, and the business problems they are used to solve have
all grown and evolved.


Even if the technological and business worlds had stood still, we would
have wanted to update Data Mining Techniques because we have learned so
much in the intervening years. One of the joys of consulting is the constant
exposure to new ideas, new problems, and new solutions. We may not be any
smarter than when we wrote the first edition, but we do have more experience
and that added experience has changed the way we approach the material. A
glance at the Table of Contents may suggest that we have reduced the amount
of business-related material and increased the amount of technical material.
Instead, we have folded some of the business material into the technical
chapters so that the data mining techniques are introduced in their business
context. We hope this makes it easier for readers to see how to apply the
techniques to their own business problems.
It has also come to our attention that a number of business school courses
have used this book as a text. Although we did not write the book as a text, in
the second edition we have tried to facilitate its use as one by using more
examples based on publicly available data, such as the U.S. census, and by
making some recommended reading and suggested exercises available at the
companion Web site, www.data-miners.com/companion.
The book is still divided into three parts. The first part talks about the
business context of data mining, starting with a chapter that introduces data
mining and explains what it is used for and why. The second chapter introduces
the virtuous cycle of data mining, the ongoing process by which data mining
is used to turn data into information that leads to actions, which in turn
create more data and more opportunities for learning. Chapter 3 is a
much-expanded discussion of data mining methodology and best practices. This
chapter benefits more than any other from our experience since writing the
