
Box Diagrams 199

Tree Ring Diagrams 201

Decision Trees in Practice 203

Decision Trees as a Data Exploration Tool 203

Applying Decision-Tree Methods to Sequential Events 205

Simulating the Future 206

Case Study: Process Control in a Coffee-Roasting Plant 206

Lessons Learned 209

Chapter 7 Artificial Neural Networks 211

A Bit of History 212

Real Estate Appraisal 213

Neural Networks for Directed Data Mining 219

What Is a Neural Net? 220

What Is the Unit of a Neural Network? 222

Feed-Forward Neural Networks 226




Contents xi

How Does a Neural Network Learn Using Back Propagation? 228

Heuristics for Using Feed-Forward, Back Propagation Networks 231

Choosing the Training Set 232

Coverage of Values for All Features 232

Number of Features 233

Size of Training Set 234

Number of Outputs 234

Preparing the Data 235

Features with Continuous Values 235

Features with Ordered, Discrete (Integer) Values 238

Features with Categorical Values 239

Other Types of Features 241

Interpreting the Results 241

Neural Networks for Time Series 244

How to Know What Is Going on Inside a Neural Network 247

Self-Organizing Maps 249

What Is a Self-Organizing Map? 249

Example: Finding Clusters 252

Lessons Learned 254

Chapter 8 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 257

Memory-Based Reasoning 258

Example: Using MBR to Estimate Rents in Tuxedo, New York 259

Challenges of MBR 262

Choosing a Balanced Set of Historical Records 262

Representing the Training Data 263

Determining the Distance Function, Combination Function, and Number of Neighbors 265

Case Study: Classifying News Stories 265

What Are the Codes? 266

Applying MBR 267

Choosing the Training Set 267

Choosing the Distance Function 267

Choosing the Combination Function 267

Choosing the Number of Neighbors 270

The Results 270

Measuring Distance 271

What Is a Distance Function? 271

Building a Distance Function One Field at a Time 274

Distance Functions for Other Data Types 277

When a Distance Metric Already Exists 278

The Combination Function: Asking the Neighbors for the Answer 279

The Basic Approach: Democracy 279

Weighted Voting 281



Collaborative Filtering: A Nearest Neighbor Approach to Making Recommendations 282

Building Profiles 283

Comparing Profiles 284

Making Predictions 284

Lessons Learned 285

Chapter 9 Market Basket Analysis and Association Rules 287

Defining Market Basket Analysis 289

Three Levels of Market Basket Data 289

Order Characteristics 292

Item Popularity 293

Tracking Marketing Interventions 293

Clustering Products by Usage 294

Association Rules 296

Actionable Rules 296

Trivial Rules 297

Inexplicable Rules 297

How Good Is an Association Rule? 299

Building Association Rules 302

Choosing the Right Set of Items 303

Product Hierarchies Help to Generalize Items 305

Virtual Items Go beyond the Product Hierarchy 307

Data Quality 308

Anonymous versus Identified 308

Generating Rules from All This Data 308

Calculating Confidence 309

Calculating Lift 310

The Negative Rule 311

Overcoming Practical Limits 311

The Problem of Big Data 313

Extending the Ideas 315

Using Association Rules to Compare Stores 315

Dissociation Rules 317

Sequential Analysis Using Association Rules 318

Lessons Learned 319

Chapter 10 Link Analysis 321

Basic Graph Theory 322

Seven Bridges of Königsberg 325

Traveling Salesman Problem 327

Directed Graphs 330

Detecting Cycles in a Graph 330

A Familiar Application of Link Analysis 331

The Kleinberg Algorithm 332

The Details: Finding Hubs and Authorities 333

Creating the Root Set 333

Identifying the Candidates 334

Ranking Hubs and Authorities 334

Hubs and Authorities in Practice 336



Case Study: Who Is Using Fax Machines from Home? 336

Why Finding Fax Machines Is Useful 336

The Data as a Graph 337

The Approach 338

Some Results 340

Case Study: Segmenting Cellular Telephone Customers 343

The Data 343

Analyses without Graph Theory 343

A Comparison of Two Customers 344

The Power of Link Analysis 345

Lessons Learned 346

Chapter 11 Automatic Cluster Detection 349
