
deploying models, 84–85

minimum support pruning, 312

derived variables, column data, 542

stability-based, 191–192

descriptions

rectangular regions, 197

comparing values with, 65

regression trees, 170

data transformation, 57

rules, extracting, 193–194

descriptive models, assessing, 78

SAS Enterprise Miner Tree Viewer

descriptive profiling, 52

tool, 167–168

deviation. See standard deviation

scoring, 169–170

difference of proportion

splits

chi-square tests versus, 153–154

on categorical input variables, 174

statistical analysis, 143–144

chi-square testing, 180–183

differential response analysis,

discussed, 170

marketing campaigns, 107–108

diversity measures, 177–178

differentiation, market basket

entropy, 179

analysis, 289

finding, 172

dimension

Gini splitting criterion, 178

automatic cluster detection, 352

information gain ratio, 178, 180

dimension tables, OLAP, 502–503

intrinsic information of, 180

directed clustering, automatic cluster

missing values, 174–175

detection, 372

multiway, 171

directed data mining

on numeric input variables, 173

classification, 57

population diversity, 178

discussed, 7

purity measures, 177–178

estimation, 57

reduction in variance, 183

prediction, 57

surrogate, 175

directed graphs, 330

subtrees, selecting, 189

directed models, assessing, 78–79

uses for, 166

directed profiling, 52

declining usage, behavior-based

dirty data, 592–593

variables, 577–579

626 Index

discrete outcomes, classification, 9

equal-height binning, 551

discrete values, statistics, 127–131

equal-width binning, 551

discrimination measures, ROC

erroneous conclusions, 74

curves, 99

errors

dissociation rules, 317

countervailing, 81–82

distance and similarity, automatic

error rates

cluster detection, 359–363

adjusted, 185

distance function

establishing, 79

defined, 271–272

measurement, 159

discussed, 258, 265

operational, 159

hidden distance fields, 278

predicting, 191

identity distance, 271

standard error of proportion,

numeric fields, 275

statistical analysis, 139–141

triangle inequality, 272

established customers, customer

zip codes, 276–277

relationships, 457

distribution

estimation

data exploration, 65

accuracy, 79–81

one-tailed, 134

averages, 81

probability and, 135

business goals, formulating, 605

statistics, 130–132

classification tasks, 9

two-tailed, 134

collaborative filtering, 284–285

diverse data types, 536

data transformation, 57

diversity measures, splitting criteria,

decision trees, 170

decision trees, 177–178

directed data mining, 57

divisive clustering, automatic cluster

estimation task examples, 10

detection, 371–372

examples of, 10

documentation

neural networks, 10, 215

data mining, 536–537

regression models, 10

historical data as, 61

revenue, behavior-based variables,

dumping data, flat files, 594

581–583

standard deviation, 81

E

valued outcomes, 9

EBCF (existing base churn

ETL (extraction, transformation,

forecast), 469

and load) tools, 487, 595

economic data, useful data sources, 61

evaluation, automatic cluster

edges, graphs, 322

detection, 372–373

education level, household level

event-based relationships, customer

data, 96

relationships, 458–459

existing base churn forecast

as communication channel, 89

(EBCF), 469

free text resources, 556–557

expectations

encoding, inconsistent, data

comparing to results, 31

correction, 74

expected values, chi-square tests,

enterprise-wide data, 33

150–151

entropy, information gain, 178–180

proof-of-concept projects, 599


fraudulent insurance claims,

expected churn, 118

classification, 9

experimentation

free text response, memory-based

hypothesis testing, 51

reasoning, 285
