

minimum support pruning, 312
stability-based, 191–192
rectangular regions, 197
regression trees, 170
rules, extracting, 193–194
SAS Enterprise Miner Tree Viewer tool, 167–168
scoring, 169–170
splits
on categorical input variables, 174
chi-square testing, 180–183
discussed, 170
diversity measures, 177–178
entropy, 179
finding, 172
Gini splitting criterion, 178
information gain ratio, 178, 180
intrinsic information of, 180
missing values, 174–175
multiway, 171
on numeric input variables, 173
population diversity, 178
purity measures, 177–178
reduction in variance, 183
surrogate, 175
subtrees, selecting, 189
uses for, 166
declining usage, behavior-based variables, 577–579
deploying models, 84–85
derived variables, column data, 542
descriptions
comparing values with, 65
data transformation, 57
descriptive models, assessing, 78
descriptive profiling, 52
deviation. See standard deviation
difference of proportion
chi-square tests versus, 153–154
statistical analysis, 143–144
differential response analysis, marketing campaigns, 107–108
differentiation, market basket analysis, 289
dimension
automatic cluster detection, 352
dimension tables, OLAP, 502–503
directed clustering, automatic cluster detection, 372
directed data mining
classification, 57
discussed, 7
estimation, 57
prediction, 57
directed graphs, 330
directed models, assessing, 78–79
directed profiling, 52
dirty data, 592–593

626 Index


discrete outcomes, classification, 9
discrete values, statistics, 127–131
discrimination measures, ROC curves, 99
dissociation rules, 317
distance and similarity, automatic cluster detection, 359–363
distance function
defined, 271–272
discussed, 258, 265
hidden distance fields, 278
identity distance, 271
numeric fields, 275
triangle inequality, 272
zip codes, 276–277
distribution
data exploration, 65
one-tailed, 134
probability and, 135
statistics, 130–132
two-tailed, 134
diverse data types, 536
diversity measures, splitting criteria, decision trees, 177–178
divisive clustering, automatic cluster detection, 371–372
documentation
data mining, 536–537
historical data as, 61
dumping data, flat files, 594

E

EBCF (existing base churn forecast), 469
economic data, useful data sources, 61
edges, graphs, 322
education level, household-level data, 96
e-mail
as communication channel, 89
free text resources, 556–557
encoding, inconsistent, data correction, 74
enterprise-wide data, 33
entropy, information gain, 178–180
equal-height binning, 551
equal-width binning, 551
erroneous conclusions, 74
errors
countervailing, 81–82
error rates
adjusted, 185
establishing, 79
measurement, 159
operational, 159
predicting, 191
standard error of proportion, statistical analysis, 139–141
established customers, customer relationships, 457
estimation
accuracy, 79–81
averages, 81
business goals, formulating, 605
classification tasks, 9
collaborative filtering, 284–285
data transformation, 57
decision trees, 170
directed data mining, 57
estimation task examples, 10
examples of, 10
neural networks, 10, 215
regression models, 10
revenue, behavior-based variables, 581–583
standard deviation, 81
valued outcomes, 9
ETL (extraction, transformation, and load) tools, 487, 595
evaluation, automatic cluster detection, 372–373
event-based relationships, customer relationships, 458–459
existing base churn forecast (EBCF), 469
expectations
comparing to results, 31
expected values, chi-square tests, 150–151
proof-of-concept projects, 599



expected churn, 118
experimentation
hypothesis testing, 51

fraudulent insurance claims, classification, 9
free text response, memory-based reasoning, 285
