


SQL data, time series analysis, 572–573
stability-based pruning, decision trees, 191–192
staffing, data mining, 525–526
standard deviation
  estimation, 81
  statistics, 132, 138
  variance and, 138
standard error of proportion, statistical analysis, 139–141
standardization, numeric values, 551
standardized values, statistics, 129–133
star schema structure, relational databases, 505
statistical analysis
  business data versus scientific data, 159
  censored data, 161
  Central Limit Theorem, 129–130
  chi-square tests
    case study, 155–158
    degrees of freedom values, chi-square tests, 152–153
    difference of proportions versus, 153–154
    discussed, 149
    expected values, calculating, 150–151
  continuous variables, 137–138
  correlation ranges, 139
  cross-tabulations, 136
  density function, 133
  as disciplinary technique, 123
  discrete values, 127–131
  experimentation, 160–161
  field values, 128
  histograms and, 127
  marketing campaign approaches
    acuity of testing, 147–148
    confidence intervals, 146
    proportion, standard error of, 139–141
    sample sizes, 145
  mean values, 137
  median values, 137
  mode values, 137
  multiple comparisons, 148–149
  normal distribution, 130–132
  null hypothesis and, 125–126
  probabilities, 133–135
  p-values, 126
  q-values, 126
  range values, 137
  regression ranges, 139
  sample variation, 129
  standard deviation, 132, 138
  standardized values, 129–133
  sum of values, 137–138
  time series analysis, 128–129
  truncated data, 162
  variance, 138
  z-values, 131, 138
statistical regression techniques, generic algorithms, 423
status codes, as categorical value, 239
stemming, link analysis, 333
stock-keeping units (SKUs), 305
store comparisons, association rules for, 315–316
stratification
  customer relationships and, 469
  hazards, 410
strings, fixed-length characters, 552–554
subgroups
  automatic cluster detection
    agglomerative clustering, 368–370
    case study, 374–378
    categorical variables, 359
    centroid distance, 369
    complete linkage, 369
    data preparation, 363–365
    dimension, 352
    directed clustering, 372
    discussed, 12, 91, 351
    distance and similarity, 359–363
    divisive clustering, 371–372
    evaluation, 372–373



    Gaussian mixture model, 366–367
    geometric distance, 360–361
    hard clustering, 367
    Hertzsprung-Russell diagram, 352–354
    luminosity, 351
    scaling, 363–364
    single linkage, 369
    soft clustering, 367
    SOM (self-organizing map), 372
    vectors, angles between, 361–362
    weighting, 363–365
    zone boundaries, adjusting, 380
  business goals, formulating, 605
  customer attributes, 11
  data transformation, 57
  overview, 11
  profiling tasks, 12
  undirected data mining, 57
subscription-based relationships, customer relationships, 459–460
subtrees, decision trees, 189
sum of values, statistics, 137–138
summarization, data transformation, 44
summation function, 272
supermarket chains, as information brokers, 15–16
supervised learning, 57
support, market based analysis, 301
surrogate splits, decision trees, 175
survey responses
  customer classification, 91
  inconclusive, 46
  profiling, 53
  survey-based market research, 113
  useful data sources, 61
survival analysis
  attrition, handling different types of, 412–413
  customer relationships, 413–415
  estimation tasks, 10
  forecasting, 415–416
symmetric multiprocessor (SMP), 489–490

T

tables, lookup, auxiliary information, 570–571
tainted results, 72
tangent function, 223
target columns, 547
target fields, input variables, 37
target market versus control group response, 38
targeted acquisition campaigns, 31
targeting
  good prospects, identifying, 88–89
  prospecting, 88
taxonomy, products, 305
telecommunications customers, market based analysis, 288
telephone switches, transaction processing systems, 3
terabytes, 5
Teradata, relational database management software, 13
termination of services, 114
testing
  acuity of, statistical analysis, 147–148
  chi-square tests
    case study, 155–158
    CHIDIST function, 152
    degrees of freedom values, 152–153
    difference of proportions versus, 153–154
    discussed, 149
    expected values, calculating, 150–151
    splits, decision trees, 180–183
  F tests, 183–184
  hypothesis testing
    confidence levels, 148
    considerations, 51
    decision-making process, 50–51
    generating, 51
    market basket analysis, 51
    null hypothesis, statistics and, 125–126


testing (continued)
  KS (Kolmogorov-Smirnov) tests, 101
  preclassified tests, 79
  test groups, marketing campaigns, 106
  test sets
    out of time tests, 72
truncated mean lifetime value, retention, 389
truthful learning sources, 48–50
two-tailed distribution, 134

U

undirected data mining
  affinity grouping, 57
