The data used for market basket analysis is generally not of very high quality.

It is gathered directly at the point of customer contact and used mainly for

operational purposes such as inventory control. The data is likely to have mul

tiple formats, corrections, incompatible code types, and so on. Much of the

explanation of various code values is likely to be buried deep in programming

code running in legacy systems and may be difficult to extract. Different stores

within a single chain sometimes have slightly different product hierarchies or

different ways of handling situations like discounts.

Here is an example. The authors were once curious about the approximately

80 department codes present in a large set of transaction data. The client

assured us that there were 40 departments and provided a nice description of

each of them. More careful inspection revealed the problem. Some stores had

IBM cash registers and others had NCR. The two types of equipment had dif

ferent ways of representing department codes”hence we saw many invalid

codes in the data.

These kinds of problems are typical when using any sort of data for data min

ing. However, they are exacerbated for market basket analysis because this type

of analysis depends heavily on the unsummarized point-of-sale transactions.

Anonymous versus Identified

Market basket analysis has proven useful for mass-market retail, such as

supermarkets, convenience stores, drug stores, and fast food chains, where

many of the purchases have traditionally been made with cash. Cash transac

tions are anonymous, meaning that the store has no knowledge about specific

customers because there is no information identifying the customer in the

transaction. For anonymous transactions, the only information is the date and

time, the location of the store, the cashier, the items purchased, any coupons

redeemed, and the amount of change. With market basket analysis, even this

limited data can yield interesting and actionable results.

The increasing prevalence of Web transactions, loyalty programs, and pur

chasing clubs is resulting in more and more identified transactions, providing

analysts with more possibilities for information about customers and their

behavior over time. Demographic and trending information is available on

individuals and households to further augment customer profiles. This addi

tional information can be incorporated into association rule analysis using vir

tual items.

Generating Rules from All This Data

Calculating the number of times that a given combination of items appears in

the transaction data is well and good, but a combination of items is not a rule.

Market Basket Analysis and Association Rules 309

Sometimes, just the combination is interesting in itself, as in the Barbie doll

and candy bar example. But in other circumstances, it makes more sense to

find an underlying rule of the form:

if condition, then result.

Notice that this is just shorthand. If the rule says,

if Barbie doll, then candy bar

then we read it as: “if a customer purchases a Barbie doll, then the customer is

also expected to purchase a candy bar.” The general practice is to consider

rules where there is just one item on the right-hand side.

Calculating Confidence

Constructs such as the co-occurrence table provide information about which

combinations of items occur most commonly in the transactions. For the sake

of illustration, let™s say that the most common combination has three items, A,

B, and C. Table 9.5 provides an example, showing the probabilities that items

and various combinations are purchased.

The only rules to consider are those with all three items in the rule and with

exactly one item in the result:

If A and B, then C

––

If A and C, then B

––

If B and C, then A

––

Because these three rules contain the same items, they have the same sup

port in the data, 5 percent. What about their confidence level? Confidence is

the ratio of the number of transactions with all the items in the rule to the num

ber of transactions with just the items in the condition. The confidence for the

three rules is shown in Table 9.6.

Table 9.5 Probabilities of Three Items and Their Combinations

COMBINATION PROBABILITY

A 45.0 %

B 42.5%

C 40.0%

A and B 25.0 %

A and C 20.0 %

B and C 15.0%

A and B and C 5.0%

310 Chapter 9

Table 9.6 Confidence in Rules

P(CONDITION

RULE P(CONDITION) AND RESULT) CONFIDENCE

If A 25% 5% 0.20

and B

then C

If A 20% 5% 0.25

and C

then B

If B 15% 5% 0.33

and C

then A

What is confidence really saying? Saying that the rule “if B and C then A” has

a confidence of 0.33 is equivalent to saying that when B and C appear in a

transaction, there is a 33 percent chance that A also appears in it. That is, one

time in three A occurs with B and C, and the other two times, B and C appear

without A. The most confident rule is the best rule, so the best rule is “if B and

C then A.”

Calculating Lift

As described earlier, lift is a good measure of how much better the rule is

doing. It is the ratio of the density of the target (using the left hand side of the

rule) to density of the target overall. So the formula is:

lift = (p(condition and result) / p (condition) ) / p(result)

= p(condition and result) / (p(condition) p(result))

When lift is greater than 1, then the resulting rule is better at predicting the

result than guessing whether the resultant item is present based on item fre

quencies in the data. When lift is less than 1, the rule is doing worse than

informed guessing. The following table (Table 9.7) shows the lift for the three

rules and for the rule with the best lift.

None of the rules with three items shows improved lift. The best rule in the

data actually only has two items. When “A” is purchased, then “B” is 31 per

cent more likely to be in the transaction than if “A” is not purchased. In this

case, as in many cases, the best rule actually contains fewer items than other

rules being considered.

Market Basket Analysis and Association Rules 311

Table 9.7 Lift Measurements for Four Rules

RULE SUPPORT CONFIDENCE P(RESULT) LIFT

If A 5% 0.20 40% 0.50

and B

then C

If A 5% 0.25 42.5% 0.59

and C

then B

If B 5% 0.33 45% 0.74

and C

then A

If A 25% 0.59 42.5% 1.31

then B

The Negative Rule

When lift is less than 1, negating the result produces a better rule. If the rule

if B and C then A

has a confidence of 0.33, then the rule

if B and C then NOT A

has a confidence of 0.67. Since A appears in 45 percent of the transactions, it

does NOT occur in 55 percent of them. Applying the same lift measure shows

that the lift of this new rule is 1.22 (0.67/0.55), resulting in a lift of 1.33, better

than any of the other rules.

Overcoming Practical Limits

Generating association rules is a multistep process. The general algorithm is:

1. Generate the co-occurrence matrix for single items.

2. Generate the co-occurrence matrix for two items. Use this to find rules

with two items.

3. Generate the co-occurrence matrix for three items. Use this to find rules

with three items.

4. And so on.

312 Chapter 9

For instance, in the grocery store that sells orange juice, milk, detergent,

soda, and window cleaner, the first step calculates the counts for each of these

items. During the second step, the following counts are created:

Milk and detergent, milk and soda, milk and cleaner

––

Detergent and soda, detergent and cleaner

––

Soda and cleaner

––

This is a total of 10 pairs of items. The third pass takes all combinations of

three items and so on. Of course, each of these stages may require a separate

pass through the data or multiple stages can be combined into a single pass by

considering different numbers of combinations at the same time.

Although it is not obvious when there are just five items, increasing the

Y

number of items in the combinations requires exponentially more computa

FL

tion. This results in exponentially growing run times”and long, long waits

when considering combinations with more than three or four items. The solu

tion is pruning. Pruning is a technique for reducing the number of items and

AM

combinations of items being considered at each step. At each stage, the algo

rithm throws out a certain number of combinations that do not meet some

threshold criterion.

TE

The most common pruning threshold is called minimum support pruning.

Support refers to the number of transactions in the database where the rule

holds. Minimum support pruning requires that a rule hold on a minimum

number of transactions. For instance, if there are one million transactions and

the minimum support is 1 percent, then only rules supported by 10,000 trans

actions are of interest. This makes sense, because the purpose of generating

these rules is to pursue some sort of action”such as striking a deal with

Mattel (the makers of Barbie dolls) to make a candy-bar-eating doll”and the

action must affect enough transactions to be worthwhile.

The minimum support constraint has a cascading effect. Consider a rule

with four items in it:

if A, B, and C, then D.

Using minimum support pruning, this rule has to be true on at least 10,000

transactions in the data. It follows that:

A must appear in at least 10,000 transactions, and,

B must appear in at least 10,000 transactions, and,

C must appear in at least 10,000 transactions, and,

D must appear in at least 10,000 transactions.

Team-Fly®

Market Basket Analysis and Association Rules 313

In other words, minimum support pruning eliminates items that do not

appear in enough transactions. The threshold criterion applies to each step in

the algorithm. The minimum threshold also implies that:

A and B must appear together in at least 10,000 transactions, and,

A and C must appear together in at least 10,000 transactions, and,

A and D must appear together in at least 10,000 transactions,

and so on.

Each step of the calculation of the co-occurrence table can eliminate combi

nations of items that do not meet the threshold, reducing its size and the num

ber of combinations to consider during the next pass.

Figure 9.11 is an example of how the calculation takes place. In this example,

choosing a minimum support level of 10 percent would eliminate all the com

binations with three items”and their associated rules”from consideration.

This is an example where pruning does not have an effect on the best rule since