

Three Varieties of Cubes 498

Facts 501

Dimensions and Their Hierarchies 502

Conformed Dimensions 504

xvi Contents

Star Schema 505

OLAP and Data Mining 507

Where Data Mining Fits in with Data Warehousing 508

Lots of Data 509

Consistent, Clean Data 510

Hypothesis Testing and Measurement 510

Scalable Hardware and RDBMS Support 511

Lessons Learned 511

Chapter 16
Building the Data Mining Environment 513

A Customer-Centric Organization 514

An Ideal Data Mining Environment 515

The Power to Determine What Data Is Available 515

The Skills to Turn Data into Actionable Information 516

All the Necessary Tools 516

Back to Reality 516

Building a Customer-Centric Organization 516

Creating a Single Customer View 517

Defining Customer-Centric Metrics 519

Collecting the Right Data 520

From Customer Interactions to Learning Opportunities 520

Mining Customer Data 521

The Data Mining Group 521

Outsourcing Data Mining 522

Outsourcing Occasional Modeling 522

Outsourcing Ongoing Data Mining 523

Insourcing Data Mining 524

Building an Interdisciplinary Data Mining Group 524

Building a Data Mining Group in IT 524

Building a Data Mining Group in the Business Units 525

What to Look for in Data Mining Staff 525

Data Mining Infrastructure 526

The Mining Platform 527

The Scoring Platform 527

One Example of a Production Data Mining Architecture 528

Architectural Overview 528

Customer Interaction Module 529

Analysis Module 530

Data Mining Software 532

Range of Techniques 532

Scalability 533

Support for Scoring 534

Multiple Levels of User Interfaces 535

Comprehensible Output 536

Ability to Handle Diverse Data Types 536

Documentation and Ease of Use 536

Availability of Training for Both Novice and Advanced Users, Consulting, and Support 537

Vendor Credibility 537

Lessons Learned 537

Chapter 17
Preparing Data for Mining 539

What Data Should Look Like 540

The Customer Signature 540

The Columns 542

Columns with One Value 544

Columns with Almost Only One Value 544

Columns with Unique Values 546

Columns Correlated with Target 547

Model Roles in Modeling 547

Variable Measures 549

Numbers 550

Dates and Times 552

Fixed-Length Character Strings 552

IDs and Keys 554

Names 555

Addresses 555

Free Text 556

Binary Data (Audio, Image, Etc.) 557

Data for Data Mining 557

Constructing the Customer Signature 558

Cataloging the Data 559

Identifying the Customer 560

First Attempt 562

Identifying the Time Frames 562

Taking a Recent Snapshot 562

Pivoting Columns 563

Calculating the Target 563

Making Progress 564

Practical Issues 564

Exploring Variables 565

Distributions Are Histograms 565

Changes over Time 566

Crosstabulations 567

Deriving Variables 568

Extracting Features from a Single Value 569

Combining Values within a Record 569

Looking Up Auxiliary Information 569

Pivoting Regular Time Series 572

Summarizing Transactional Records 574

Summarizing Fields across the Model Set 574

Examples of Behavior-Based Variables 575

Frequency of Purchase 575

Declining Usage 577

Revolvers, Transactors, and Convenience Users: Defining Customer Behavior 580

Data 581

Segmenting by Estimating Revenue 581

Segmentation by Potential 583

Customer Behavior by Comparison to Ideals 585

The Ideal Convenience User 587

The Dark Side of Data 590

Missing Values 590

Dirty Data 592

Inconsistent Values 593

Computational Issues 594

Source Systems 594

Extraction Tools 595

Special-Purpose Code 595

Data Mining Tools 595

Lessons Learned 596

Chapter 18
Putting Data Mining to Work 597

Getting Started 598

What to Expect from a Proof-of-Concept Project 599

Identifying a Proof-of-Concept Project 599

Implementing the Proof-of-Concept Project 601

Act on Your Findings 602

Measure the Results of the Actions 603

Choosing a Data Mining Technique 605

Formulate the Business Goal as a Data Mining Task 605

Determine the Relevant Characteristics of the Data 606

Data Type 606

Number of Input Fields 607

Free-Form Text 607

Consider Hybrid Approaches 608

How One Company Began Data Mining 608
