

automatic cluster detection, 359

new customer information, gathering, 109–110

data correction, 73

marriages, 239–240

people most influenced by, 106–107

measures of, 549

planning, 27

neural networks, 239–240

profitability, 100–104

propensity, 242

proof-of-concept projects, 600

splits, decision trees, 174

response modeling, 96–97

censored data

as statistical analysis

hazards, 399–403

acuity of testing, 147–148

statistics, 161

confidence intervals, 146

census data

proportion, standard error of, 139–141

proportional scoring, 94–95

useful data sources, 61

results, comparing, using confidence bounds, 141–143

Central Limit Theorem, statistics, 129–130

sample sizes, 145

central repository, 484, 488, 490

targeted acquisition campaigns, 31

centroid distance, automatic cluster detection, 369

types of, 111

up-selling, 115–116

C5 pruning algorithm, decision trees, 190–191

usage stimulation, 111

candidates, link analysis, 333

CHAID (Chi-square Automatic Interaction Detector), 182–183

canonical measurements, marketing campaigns, 31

challenges, business challenges, identifying, 23–24

capture trends, data transformation, 75

620 Index


champion-challenger approach, marketing campaigns, 139
change processes, feedback, 34
charts
  concentration, 101
  cumulative gains, 101
  lift charts, 82, 84
  time series, 128–129
CHIDIST function, 152
child nodes, classification, 167
children, number of, household level data, 96
chi-square tests
  case study, 155–158
  CHAID (Chi-square Automatic Interaction Detector), 182–183
  CHIDIST function, 152
  degrees of freedom values, 152–153
  difference of proportions versus, 153–154
  discussed, 149
  expected values, calculating, 150–151
  splits, decision trees, 180–183
churn
  as binary outcome, 119
  customer longevity, predicting, 119–120
  EBCF (existing base churn forecast), 469
  expected, 118
  forced attrition, 118
  importance of, 117–118
  involuntary, 118–119, 521
  recognizing, 116–117
  retention and, 116–120
  voluntary, 118–119, 521
class labels, probability, 85
classification
  accuracy, 79
  binary
    decision trees, 168
    misclassification rates, 98
  business goals, formulating, 605
  child nodes, 167
  correct classification matrix, 79
  data transformation, 57
  decision trees, 166–168
  directed data mining, 57
  discrete outcomes, 9
  estimation, 9
  leaf nodes, 167
  memory-based reasoning, 90–91
  overview, 8–9
  performance, 12
Classification and Regression Trees (CART) algorithm, decision trees, 185, 188–189
classification codes
  discussed, 266
  precision measurements, 273–274
  recall measurements, 273–274
clustering
  automatic cluster detection
    agglomerative clustering, 368–370
    case study, 374–378
    categorical variables, 359
    centroid distance, 369
    complete linkage, 369
    data preparation, 363–365
    dimension, 352
    directed clustering, 372
    discussed, 12, 91, 351
    distance and similarity, 359–363
    divisive clustering, 371–372
    evaluation, 372–373
    Gaussian mixture model, 366–367
    geometric distance, 360–361
    hard clustering, 367
    Hertzsprung-Russell diagram, 352–354
    luminosity, 351
    scaling, 363–364
    single linkage, 369
    soft clustering, 367
    SOM (self-organizing map), 372
    vectors, angles between, 361–362
    weighting, 363–365
    zone boundaries, adjusting, 380


  business goals, formulating, 605
  customer attributes, 11
  data transformation, 57
  overview, 11
  profiling tasks, 12
  undirected data mining, 57
coding, special-purpose code, 595
collaborative filtering
  estimated ratings, 284–285
  grouping customers, 90
  predictions, 284–285
  profiles, building and comparing, 283–284
  social information filtering, 282
  word-of-mouth advertising, 283
collections, credit risks, 114
columns, data
  cost, 548
  derived variables, 542
  discussed, 542
  identification, 548
  ignored, 547
  input, 547
  with one value, 544–546
  target, 547
  with unique values, 546–547
competitive advantage, information as, 14
complete linkage, automatic cluster detection, 369
computational issues, customer signatures, 594–596
concentration
  concentration charts, 101
  cumulative response, 82–83
confidence intervals
  hypothesis testing, 148
  statistical analysis, 146, 148–149
confusion
  aggregation and, 48
  confusion matrix, 79
  data transformation, 28
conjugate gradient, 230
constant hazards
  changing over time hazards versus, 416–417
  discussed, 397
continuous variables
  data preparation, 235–237
  neural networks, 235–237
  statistics, 137–138
control group response
  marketing campaigns, 106
