


Figure 9.2 A data model for transaction-level market basket data typically has three
tables, one for the customer, one for the order, and one for the order line.
290 Chapter 9

The order is the fundamental data structure for market basket data. An
order represents a single purchase event by a customer. This might correspond
to a customer ordering several products on a Web site, purchasing a basket of
groceries, or buying several items from a catalog. The order record includes
the total amount of the purchase, additional shipping charges, payment type,
and whatever other data is relevant about the transaction. Sometimes the
transaction is given a unique identifier. Sometimes the unique identifier
needs to be cobbled together from other data. In one example, we needed to
combine four fields to get an identifier for purchases in a store: the
timestamp when the customer paid, chain ID, store ID, and lane ID.
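As a minimal sketch of cobbling together such an identifier, the four fields can be concatenated into one string key. The function name, the field names, and the key layout are our own assumptions for illustration; the sketch also assumes that no two customers pay in the same lane within the same second.

```python
from datetime import datetime


def make_order_id(paid_at: datetime, chain_id: int, store_id: int, lane_id: int) -> str:
    """Build a surrogate order key from the four fields named in the text."""
    # A fixed-width timestamp keeps the key unambiguous and sortable.
    stamp = paid_at.strftime("%Y%m%dT%H%M%S")
    return f"{chain_id}-{store_id}-{lane_id}-{stamp}"


# Hypothetical values, chosen only to show the shape of the key.
order_id = make_order_id(datetime(2004, 3, 15, 17, 42, 9), chain_id=12, store_id=345, lane_id=7)
print(order_id)  # 12-345-7-20040315T174209
```

If the timestamp is recorded at a coarser grain than seconds, the key may no longer be unique and another field would be needed.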
Individual items in the order are represented separately as line items. This
data includes the price paid for the item, the number of items, whether tax
should be charged, and perhaps the cost (which can be used for calculating
margin). The item table also typically has a link to a product reference table,
which provides more descriptive information about each product. This
descriptive information should include the product hierarchy and other information
that might prove valuable for analysis.
The customer table is an optional table and should be available when a
customer can be identified, for example, on a Web site that requires registration or
when the customer uses an affinity card during the transaction. Although the
customer table may have interesting fields, the most powerful element is the
ID itself, because this can tie transactions together over time.
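The three tables described above might be sketched as simple record types. All class and field names here are hypothetical, and we assume that the price and cost on a line item are per-unit values; real source systems will differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Customer:
    customer_id: str  # the key that ties transactions together over time


@dataclass
class Order:
    order_id: str
    customer_id: Optional[str]  # None when the customer cannot be identified
    order_date: date
    total_amount: float
    shipping_charge: float
    payment_type: str


@dataclass
class OrderLine:
    order_id: str
    product_id: str  # link to a product reference table
    unit_price: float
    quantity: int
    taxable: bool
    unit_cost: Optional[float] = None  # cost is not always available

    def margin(self) -> Optional[float]:
        """Margin for the line, or None when the cost is unknown."""
        if self.unit_cost is None:
            return None
        return (self.unit_price - self.unit_cost) * self.quantity


line = OrderLine("o1", "flour-5lb", unit_price=3.0, quantity=2, taxable=False, unit_cost=2.0)
print(line.margin())  # 2.0
```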
Tracking customers over time makes it possible to determine, for instance,
which grocery shoppers “bake from scratch,” something of keen interest to
the makers of flour as well as prepackaged cake mixes. Such customers might
be identified from the frequency of their purchases of flour, baking powder,
and similar ingredients, the proportion of such purchases to the customer's
total spending, and the lack of interest in prepackaged mixes and ready-to-eat
desserts. Of course, such ingredients may be purchased at different times and
in different quantities, making it necessary to tie together multiple
transactions over time.
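One way to sketch the three signals just described (frequency, share of spending, and absence of prepackaged purchases) is to aggregate line items by customer ID. The category sets, the thresholds, and the sample rows below are invented for illustration and are not taken from any real retailer.

```python
from collections import defaultdict

SCRATCH = {"flour", "baking powder", "sugar"}    # hypothetical ingredient categories
PREPARED = {"cake mix", "ready-to-eat dessert"}  # hypothetical prepackaged categories

# (customer_id, order_id, product category, amount spent); illustrative rows only
lines = [
    ("c1", "o1", "flour", 4.0),
    ("c1", "o1", "milk", 3.0),
    ("c1", "o2", "baking powder", 2.0),
    ("c1", "o3", "flour", 4.0),
    ("c2", "o4", "cake mix", 5.0),
    ("c2", "o4", "flour", 4.0),
    ("c2", "o5", "milk", 3.0),
]


def scratch_bakers(lines, min_orders=2, min_share=0.25):
    """Return customers who look like scratch bakers under the assumed thresholds."""
    scratch_orders = defaultdict(set)
    scratch_spend = defaultdict(float)
    prepared_spend = defaultdict(float)
    total_spend = defaultdict(float)
    for cust, order, category, amount in lines:
        total_spend[cust] += amount
        if category in SCRATCH:
            scratch_orders[cust].add(order)
            scratch_spend[cust] += amount
        elif category in PREPARED:
            prepared_spend[cust] += amount
    return sorted(
        cust
        for cust in total_spend
        if total_spend[cust] > 0
        and len(scratch_orders[cust]) >= min_orders               # frequency
        and scratch_spend[cust] / total_spend[cust] >= min_share  # share of spending
        and prepared_spend[cust] == 0                             # no prepackaged mixes
    )


bakers = scratch_bakers(lines)
print(bakers)  # ['c1']
```

Without the customer ID, the three orders of customer c1 could not be combined, and no single basket would show the pattern.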
All three levels of market basket data are important. For instance, to
understand orders, there are some basic measures:
What is the average number of orders per customer?

What is the average number of unique items per order?

What is the average number of items per order?

For a given product, what is the proportion of customers who have ever
purchased the product?
Market Basket Analysis and Association Rules 291

For a given product, what is the average number of orders per customer
that include the item?

For a given product, what is the average quantity purchased in an order
when the product is purchased?
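The six measures above can be sketched in one pass over the order lines. The function name, the row layout, and the sample rows are our own assumptions. In particular, we read "orders per customer that include the item" as an average over customers who bought the product at least once, and we assume a non-empty input in which each order belongs to one customer.

```python
from collections import defaultdict

# (customer_id, order_id, product_id, quantity); illustrative rows only
lines = [
    ("c1", "o1", "A", 2),
    ("c1", "o1", "B", 1),
    ("c1", "o2", "A", 1),
    ("c2", "o3", "A", 3),
    ("c3", "o4", "C", 1),
]


def basket_measures(lines, product):
    orders_by_customer = defaultdict(set)
    products_by_order = defaultdict(set)
    units_by_order = defaultdict(int)
    product_orders_by_customer = defaultdict(set)  # only customers who bought `product`
    product_units_by_order = defaultdict(int)      # only orders containing `product`
    for cust, order, prod, qty in lines:
        orders_by_customer[cust].add(order)
        products_by_order[order].add(prod)
        units_by_order[order] += qty
        if prod == product:
            product_orders_by_customer[cust].add(order)
            product_units_by_order[order] += qty

    n_customers = len(orders_by_customer)
    n_orders = len(products_by_order)
    n_buyers = len(product_orders_by_customer)
    n_product_orders = len(product_units_by_order)
    return {
        "orders_per_customer": n_orders / n_customers,
        "unique_items_per_order": sum(len(p) for p in products_by_order.values()) / n_orders,
        "items_per_order": sum(units_by_order.values()) / n_orders,
        "share_of_customers_buying": n_buyers / n_customers,
        "product_orders_per_buyer": (
            sum(len(o) for o in product_orders_by_customer.values()) / n_buyers
            if n_buyers else 0.0
        ),
        "product_units_per_order": (
            sum(product_units_by_order.values()) / n_product_orders
            if n_product_orders else 0.0
        ),
    }


measures = basket_measures(lines, "A")
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```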
These measures give broad insight into the business. In some cases, there are
few repeat customers, so the average number of orders per customer is close to
1; this suggests a business opportunity to increase the number of sales per
customer. Or, the number of products per order may be close to 1, suggesting an
opportunity for cross-selling during the process of making an order.
It can be useful to compare these measures to each other. We have found that
the number of orders is often a useful way of differentiating among customers;
good customers clearly order more often than not-so-good customers. Figure
9.3 attempts to look at the breadth of the customer relationship (the number of
unique items ever purchased) by the depth of the relationship (the number of
orders) for customers who purchased more than one item. This data is from a
small specialty retailer. The biggest bubble shows that many customers who
purchase two products do so at the same time. There is also a surprisingly
large bubble showing that a sizeable number of customers purchase the same
product in two orders. Better customers, at least those who returned multiple
times, tend to purchase a greater diversity of goods. However, some of them
are returning and buying the same thing they bought the first time. How can
the retailer encourage customers to come back and buy more and different
products? Market basket analysis cannot answer the question, but it can at
least motivate asking it and perhaps provide hints that might help.
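The counts behind a bubble plot of this kind can be sketched by tallying customers on two values: number of orders (depth) and number of distinct products (breadth). The sample rows are invented, and we interpret "purchased more than one item" as a total quantity greater than one, which is an assumption on our part.

```python
from collections import Counter, defaultdict

# (customer_id, order_id, product_id, quantity); illustrative rows only
lines = [
    ("c1", "o1", "A", 1),
    ("c1", "o1", "B", 1),
    ("c2", "o2", "A", 1),
    ("c2", "o3", "A", 1),
    ("c3", "o4", "A", 1),
    ("c4", "o5", "A", 1),
    ("c4", "o5", "C", 1),
]


def breadth_by_depth(lines):
    """Count customers for each (number of orders, number of distinct products) pair."""
    orders = defaultdict(set)
    products = defaultdict(set)
    units = defaultdict(int)
    for cust, order, prod, qty in lines:
        orders[cust].add(order)
        products[cust].add(prod)
        units[cust] += qty
    return Counter(
        (len(orders[cust]), len(products[cust]))
        for cust in orders
        if units[cust] > 1  # keep only customers who purchased more than one item
    )


bubbles = breadth_by_depth(lines)
print(bubbles[(1, 2)])  # customers buying two products in one order: 2
print(bubbles[(2, 1)])  # customers buying the same product in two orders: 1
```

Each count is the size of one bubble; the pair gives its position.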

[Figure 9.3: bubble plot. Axes: number of distinct products across all orders; number of orders.]
Figure 9.3 This bubble plot shows the breadth of customer relationships by the depth of
the relationship.

Order Characteristics
Customer purchases have additional interesting characteristics. For instance,
the average order size varies by time and region, and it is useful to keep track
of these to understand changes in the business environment. Such information
is often available in reporting systems, because it is easily summarized.
Some information, though, may need to be gleaned from transaction-level
data. Figure 9.4 breaks down transactions by the size of the order and the credit
card used for payment (Visa, MasterCard, or American Express) for another
retailer. The first thing to notice is that the larger the order, the larger the average
purchase amount, regardless of the credit card being used. This is reassuring.
Also, the use of one credit card type, American Express, is consistently
associated with larger orders, an interesting finding about these customers.
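A breakdown like this one might be computed from order-level rows as sketched below. The row layout and the amounts are invented for illustration and do not reproduce the retailer's data.

```python
from collections import defaultdict

# (card type, number of items in the order, order amount); illustrative rows only
orders = [
    ("Visa", 1, 20.0),
    ("Visa", 1, 30.0),
    ("Visa", 2, 45.0),
    ("American Express", 1, 40.0),
    ("American Express", 2, 90.0),
]


def average_amount_by_card_and_size(orders):
    """Average order amount for each (card type, number of items) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for card, n_items, amount in orders:
        totals[(card, n_items)] += amount
        counts[(card, n_items)] += 1
    return {key: totals[key] / counts[key] for key in totals}


averages = average_amount_by_card_and_size(orders)
for (card, n_items), avg in sorted(averages.items()):
    print(f"{card:17s} {n_items} item(s): {avg:.2f}")
```

Each card type then becomes one series in the chart, with the number of items along the horizontal axis.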

For Web purchases and mail-order transactions, additional information may
also be gathered at the point of sale:
Did the order use gift wrap?

Is the order going to the same address as the billing address?

Did the purchaser accept or decline a particular cross-sell offer?

Of course, gathering information at the point of sale and having it available
for analysis are two different things. However, gift giving and responsiveness
to cross-sell offers are two very useful things to know about customers.
Finding patterns with this information requires collecting the information in
the first place (at the call center or through the online interface) and then
moving it to a data mining environment.

[Figure 9.4: chart. Axes: average order amount; number of items purchased. Recovered series label: American Express.]
Figure 9.4 This chart shows the average amount spent by credit card type based on the
number of items in the order for one particular retailer.


Item Popularity
What are the most popular items? This is a question that can usually be
answered by looking at inventory curves, which can be generated without
having to work with transaction-level data. However, knowing the sales of an
individual item is only the beginning. There are related questions:
What is the most common item found in a one-item order?

What is the most common item found in a multi-item order?

What is the most common item found among repeat customers?

How has the popularity of particular items changed over time?

How does the popularity of an item vary regionally?
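The first two questions can be answered by simple counting before any association rules are involved. The sketch below uses invented orders and treats a "one-item order" as an order with a single distinct product, which is our assumption.

```python
from collections import Counter

# order_id -> products in the order; illustrative data only
orders = {
    "o1": ["A"],
    "o2": ["A", "B"],
    "o3": ["B", "C"],
    "o4": ["A"],
    "o5": ["C"],
}


def popularity_by_order_size(orders):
    """Count item occurrences separately for one-item and multi-item orders."""
    single, multi = Counter(), Counter()
    for items in orders.values():
        distinct = set(items)
        target = single if len(distinct) == 1 else multi
        target.update(distinct)
    return single, multi


single, multi = popularity_by_order_size(orders)
print(single.most_common(1)[0][0])  # A
print(multi.most_common(1)[0][0])   # B
```

In this toy data the most popular item differs between the two groups, which is the kind of difference the questions are meant to surface.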

The first three questions are particularly interesting because they may
suggest ideas for growing customer relationships. Association rules can
provide answers to these questions, particularly when used with virtual items to
represent the size of the order or the number of orders a customer has made.
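As a rough sketch of the virtual-item idea, each basket can be extended with made-up items that encode the order size and whether the customer has ordered before. The virtual item names (`SIZE=1`, `SIZE=2+`, `FIRST_ORDER`, `REPEAT`) and the sample baskets are hypothetical. The rule measure used is the standard confidence, the share of baskets containing the left-hand item that also contain the right-hand item.

```python
def add_virtual_items(items, n_prior_orders):
    """Return the basket as a set, extended with hypothetical virtual items."""
    basket = set(items)
    size = len(basket)  # measured before any virtual item is added
    basket.add("SIZE=1" if size == 1 else "SIZE=2+")
    basket.add("REPEAT" if n_prior_orders > 0 else "FIRST_ORDER")
    return basket


def confidence(baskets, lhs, rhs):
    """Confidence of the rule lhs -> rhs: P(rhs in basket | lhs in basket)."""
    with_lhs = [b for b in baskets if lhs in b]
    if not with_lhs:
        return 0.0
    return sum(1 for b in with_lhs if rhs in b) / len(with_lhs)


# (products, number of earlier orders by the same customer); illustrative data only
baskets = [
    add_virtual_items(["A"], 0),
    add_virtual_items(["A", "B"], 1),
    add_virtual_items(["B"], 2),
    add_virtual_items(["A"], 0),
]

print(confidence(baskets, "FIRST_ORDER", "A"))  # 1.0
print(confidence(baskets, "REPEAT", "B"))       # 1.0
```

Because the virtual items sit in the basket like any product, an ordinary rule generator can produce rules such as "first order implies product A" without special handling.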
The last two questions bring up the dimensions of time and geography,
which are very important for applications of market basket analysis.
Different products have different affinities in different regions, something that
retailers are very familiar with. It is also possible to use association rules to
start to understand these areas, by introducing virtual items for region and

TIP Time and geography are two of the most important attributes of market
basket data, because they often point to the exact marketing conditions at the
time of the sale.

Tracking Marketing Interventions
As discussed in Chapter 5, looking at individual products over time can
provide a good understanding of what is happening with the product. Including
marketing interventions along with the product sales over time, as in Figure
9.5, makes it possible to see the effect of the interventions. The chart shows a
sales curve for a particular product. Prior to the intervention, sales are
hovering at 50 units per week. After the intervention, they peak at about seven
or eight times that amount, before gently sliding down over the next six or
seven weeks. Using such charts, it is possible to measure the response to the
marketing effort.
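One simple way to put numbers on such a chart is to compare sales after the intervention with a baseline taken from the weeks before it. The weekly figures below are invented to resemble the shape described (a baseline near 50 units and a peak of seven to eight times that); they are not read off Figure 9.5. The sketch also assumes that sales would have stayed at the baseline without the intervention, which ignores seasonality and other influences.

```python
def intervention_response(weekly_sales, intervention_week):
    """Compare post-intervention sales with the pre-intervention weekly average."""
    before = weekly_sales[:intervention_week]
    after = weekly_sales[intervention_week:]
    baseline = sum(before) / len(before)
    return {
        "baseline": baseline,
        "peak_ratio": max(after) / baseline,
        # units sold above what the baseline alone would predict
        "incremental_units": sum(after) - baseline * len(after),
    }


# Four weeks before the intervention, seven weeks after; illustrative numbers only
weekly_sales = [50, 48, 52, 50, 380, 300, 220, 150, 100, 70, 50]
response = intervention_response(weekly_sales, intervention_week=4)
print(response)
```

The incremental units can then be set against the cost of the marketing effort to judge whether it paid for itself.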


[Figure 9.5: sales chart with the marketing intervention marked "Mail Drop."]

