

Assessing Directed Models 78

Assessing Classifiers and Predictors 79

Assessing Estimators 79

Comparing Models Using Lift 81

Problems with Lift 83

Step Nine: Deploy Models 84

Step Ten: Assess Results 85

Step Eleven: Begin Again 85

Lessons Learned 86

Chapter 4 Data Mining Applications in Marketing and Customer Relationship Management 87

Prospecting 87

Identifying Good Prospects 88

Choosing a Communication Channel 89

Picking Appropriate Messages 89

Data Mining to Choose the Right Place to Advertise 90

Who Fits the Profile? 90

Measuring Fitness for Groups of Readers 93

Data Mining to Improve Direct Marketing Campaigns 95

Response Modeling 96

Optimizing Response for a Fixed Budget 97

Optimizing Campaign Profitability 100

How the Model Affects Profitability 103

Reaching the People Most Influenced by the Message 106

Differential Response Analysis 107

Using Current Customers to Learn About Prospects 108

Start Tracking Customers before They Become Customers 109

Gather Information from New Customers 109

Acquisition-Time Variables Can Predict Future Outcomes 110

Data Mining for Customer Relationship Management 110

Matching Campaigns to Customers 110

Segmenting the Customer Base 111

Finding Behavioral Segments 111

Tying Market Research Segments to Behavioral Data 113

Reducing Exposure to Credit Risk 113

Predicting Who Will Default 113

Improving Collections 114

Determining Customer Value 114

Cross-selling, Up-selling, and Making Recommendations 115

Finding the Right Time for an Offer 115

Making Recommendations 116

Retention and Churn 116

Recognizing Churn 116

Why Churn Matters 117

Different Kinds of Churn 118

Contents ix

Different Kinds of Churn Model 119

Predicting Who Will Leave 119

Predicting How Long Customers Will Stay 119

Lessons Learned 120

Chapter 5 The Lure of Statistics: Data Mining Using Familiar Tools 123

Occam's Razor 124

The Null Hypothesis 125

P-Values 126

A Look at Data 126

Looking at Discrete Values 127

Histograms 127

Time Series 128

Standardized Values 129

From Standardized Values to Probabilities 133

Cross-Tabulations 136

Looking at Continuous Variables 136

Statistical Measures for Continuous Variables 137

Variance and Standard Deviation 138

A Couple More Statistical Ideas 139

Measuring Response 139

Standard Error of a Proportion 139

Comparing Results Using Confidence Bounds 141

Comparing Results Using Difference of Proportions 143

Size of Sample 145

What the Confidence Interval Really Means 146

Size of Test and Control for an Experiment 147

Multiple Comparisons 148

The Confidence Level with Multiple Comparisons 148

Bonferroni's Correction 149

Chi-Square Test 149

Expected Values 150

Chi-Square Value 151

Comparison of Chi-Square to Difference of Proportions 153

An Example: Chi-Square for Regions and Starts 155

Data Mining and Statistics 158

No Measurement Error in Basic Data 159

There Is a Lot of Data 160

Time Dependency Pops Up Everywhere 160

Experimentation Is Hard 160

Data Is Censored and Truncated 161

Lessons Learned 162

Chapter 6 Decision Trees 165

What Is a Decision Tree? 166

Classification 166

Scoring 169

Estimation 170

Trees Grow in Many Forms 170



How a Decision Tree Is Grown 171

Finding the Splits 172

Splitting on a Numeric Input Variable 173

Splitting on a Categorical Input Variable 174

Splitting in the Presence of Missing Values 174

Growing the Full Tree 175

Measuring the Effectiveness of a Decision Tree 176

Tests for Choosing the Best Split 176

Purity and Diversity 177

Gini or Population Diversity 178

Entropy Reduction or Information Gain 179

Information Gain Ratio 180

Chi-Square Test 180

Reduction in Variance 183

F Test 183

Pruning 184

The CART Pruning Algorithm 185

Creating the Candidate Subtrees 185

Picking the Best Subtree 189

Using the Test Set to Evaluate the Final Tree 189

The C5 Pruning Algorithm 190

Pessimistic Pruning 191

Stability-Based Pruning 191

Extracting Rules from Trees 193

Taking Cost into Account 195

Further Refinements to the Decision Tree Method 195

Using More Than One Field at a Time 195

Tilting the Hyperplane 197

Neural Trees 199

Piecewise Regression Using Trees 199

Alternate Representations for Decision Trees 199
