. 114
( 137 .)


way of pivoting. For irregular time series, it is necessary to determine how best
to summarize the data.
One way is to transform the irregular time series into regular time series and
then to pivot the series. For instance, calculate the number of calls per month
or the amount withdrawn from ATMs each month, and then pivot the sums by
month. When working with transactions, these calculations can be more com­
plex, such as the number of calls longer than 10 minutes or the number of
withdrawals less than $50. These specialized summaries can be quite useful.
More complicated examples that describe customer behavior are provided just
after the next section.
Another approach is to define a set of data transformations that are run on
the transactional data as it is being collected. This is an approach taken in the
telecommunications industry, where the volume of data is vast. Some vari­
ables may be as simple as minutes of use, others may be a complex as a score
for whether the calling number is a business or residence. This approach hard-
codes the calculations, and such calculations are hard to change. Although
such variables can be useful, a more flexible environment for summarizing
transactional data is strategically more useful.

Summarizing Fields across the Model Set
The last method for deriving variables is summarizing values across fields in
the customer signature itself. There are several examples of such fields:
Binning values into equal sized bins requires calculating the breakpoints

for the bins.
Standardizing a value (subtracting the mean and dividing by the stan­

dard deviation) requires calculating the mean and standard deviation
for the field and then doing the calculation.
Preparing Data for Mining 575

Ranking a value (so the smallest value has a value of 1, the second

smallest 2, and so on) requires sorting all the values to get the ranking.
Although these are complicated operations, they are performed directly on
the model set. Data mining tools provide support for these operations, espe­
cially for binning numeric values, which is the most important of the three.
One type of binning that would be very useful is not readily available. This
is binning for codes based on frequency. That is, it would be useful to keep all
codes that have at least, say, 1,000 instances in the model set and to place all
other codes in a single “other” category. This is useful for working with out­
liers, such as the many old and unpopular handsets that show up in mobile
telephone data although few customers use them. One way to handle this is to
identify the handsets to keep and to add a new field “handset for analysis”
that keeps these handsets and places the rest into an “other” category. A more
automated way is to create a lookup table to map the handsets. However, per­
haps a better way is to replace the handset ID itself with information such as
the date the handset was released, its weight, and the features it uses”infor-
mation that is probably available in a lookup table already.

Examples of Behavior-Based Variables
The real power of derived variables comes from the ability to summarize cus­
tomer behaviors along known dimensions. This section builds on the ideas
already presented and gives three examples of useful behavior-based variables.

Frequency of Purchase
Once upon a time, catalogers devised a clever method for characterizing cus­
tomer behavior using three dimensions”recency, frequency, and monetary
value. RFM, which relies on these three variables, has been used at least since the
1970s. Of these three descriptions of customer behavior, recency is usually the
most predictive, but frequency is the most interesting. Recency simply means
the length of time since a customer made a purchase. Monetary value is tradi­
tionally the total amount purchased (although we have found the average pur­
chase value more useful since the total is highly correlated with frequency).
In traditional RFM analysis, frequency is just the number of purchases.
However, a simple count does not do a good job of characterizing customer
behavior. There are other approaches to determining frequency, and these can
be applied to other areas not related to catalog purchasing”frequency of com­
plaints, frequency of making international telephone calls, and so on. The
important point is that customers may perform an action at irregular intervals,
and we want to characterize this behavior pattern because it provides poten­
tially useful information about customers.
576 Chapter 17

One method of calculating frequency would be to take the length of time
indicated by the historical data and divide it by the number of times the cus­
tomer made a purchase. So, if the catalog data goes back 6 years and a cus­
tomer made a single purchase, then that frequency would be once every 6
Although simple, this approach misses an important point. Consider two
John made a purchase 6 years ago and has received every catalog since

Mary made a purchase last month when she first received the catalog.

Does it make sense that both these customers have the same frequency? No.
John more clearly has a frequency of no more than once every 6 years. Mary
only had the opportunity to make one purchase in the past month, so her fre­
quency would more accurately be described as once per month. The first point
about frequency is that it should be measured from the first point in time that
a customer had an opportunity to make a purchase.
There is another problem. What we really know about John and Mary is that
their frequencies are no more than once every 6 years and no more than once
per month, respectively. Historically, one observation does not contain enough
information to deduce a real frequency. This is really a time to event problem,
such as those discussed in Chapter 12.
Our goal here is to characterize frequency as a derived variable, rather than
predict the next event (which is best approached using survival analysis). To
do this, let™s assume that there are two or more events, so the average time
between events is the total span of time divided by the number of events
minus one, as shown in Figure 17.14. This provides the average time between
events for the period when the events occurred.
There is no perfect solution to the question of frequency, because customer
events occur irregularly and we do not know what will happen in the future”
the data is censored. Taking the time span from the first event to the most
recent event runs into the problem that customers whose events all took place
long ago may have a high frequency. The alternative is to take the time since
the first event, in essence pretending that the present is an event. This is unsat­
isfying, because the next event is not known, and care must be taken when
working with censored data. In practice, taking the number of events since the
first event could have happened and dividing by the total span of time (or the
span when the customer was active) is the best solution.
Preparing Data for Mining 577




Frequency is 2 / (C “ A), but does not include
time after C

Frequency is 3 / C, but does not include time after C

Frequency is 3 / (D - A), but data is censored

Frequency is 3/D, but data is censored
Figure 17.14 There is no perfect way to estimate frequency, but these four ways are all

Declining Usage
In telecommunications, one significant predictor of churn is declining usage”
customers who use services less and less over time are more likely to leave
than other customers. Customers who have declining usage are likely to have
many variables indicating this:
Billing measures, such as recent amounts spent are quite small.

Usage measures, such as recent amounts used are quite small or always

at monthly minimums.
Optional services recently have no usage.

Ratios of recent measures to older measures are less than 1, often signif­

icantly less than one, indicating recent usage is smaller than historical
The existence of so many different measures for the same underlying behav­
ior suggests a situation where a derived variable might be useful to capture the
behavior in a single variable. The goal is to incorporate as much information as
possible into a “declining usage” indicator.
578 Chapter 17

T I P When many different variables all suggest a single customer behavior,
then it is likely that a derived variable that incorporates this information will do
a better job for data mining.

Fortunately, mathematics provides an elegant solution, in the form of the
best fit line, as shown in Figure 17.15. The goodness of fit is described by the R2
statistic, which varies from 0 to 1, with values near 0 being poor fit and values
near 1 being very good. The slope of the line provides the average rate of
increase or decrease in some variable over time. In statistics, this slope is called
the beta function and is calculated according to the following formula:

Sum of (x-average(x))*(y-average(y)) / sum((x-average(x))2)

To give an example of how this might be used, consider the following data
for the customer shown in the previous figure. Table 17.4 walks through the
calculation for a typical customer.

Table 17.4 Example of Calculating the Slope for a Time Series

MONTH X “ AVG(X) (X “ AVG Y (FROM Y“ (X “ AVG(X)) *
(X “VALUE) (X))^2 CUST A) AVG(Y) (Y “ AVG( Y))

1 “5.5 30.25 53.47 3.19 “17.56

2 “4.5 20.25 46.61 “3.67 16.52
3 “3.5 12.25 47.18 “3.10 10.84

4 “2.5 6.25 49.54 “0.74 1.85

5 “1.5 2.25 48.71 “1.57 2.35

6 “0.5 0.25 52.04 1.76 “0.88

7 0.5 0.25 48.45 “1.83 “0.91

8 1.5 2.25 54.16 3.88 5.83

9 2.5 6.25 54.47 4.19 10.47

10 3.5 12.25 53.69 3.42 11.95

11 4.5 20.25 45.93 “4.35 “19.59

12 5.5 30.25 49.10 “1.18 “6.51

TOTAL 143 14.36

SLOPE 0.1004
Preparing Data for Mining 579



y = 0.1007x + 49.625

R2 = 0.0135



1 2 3 4 5 6 7 8 9 10 11 12
Figure 17.15 The slope of the line of best fit provides a good measure of
changes over time.

This example shows a very typical use for calculating the slope”finding the
slope over the previous year™s usage or billing patterns. The tabular format
shows the calculation in a way most suitable for a spreadsheet. However,
many data mining tools provide a function to calculate beta values directly
from a set of variables in a single row. When such a function is not available, it
is possible to express it using more basic arithmetic functions.
Although monthly data is often the most convenient for such calculations,
remember that different months have different numbers of days. This issue is
particularly significant for businesses that have strong weekly cycles. Some
months have five full weekends, for instance, while others only have four. Dif­
ferent months have between 20 and 23 working days (not including holidays).
These differences can account for up to 25 percent of the difference between


. 114
( 137 .)