. 69
( 137 .)



70 3

Figure 10.9 A call graph for 15 numbers and 19 calls.
Link Analysis 341

Figure 10.10 shows how the algorithm proceeds. First, the numbers that are
known to be fax machines are labeled “F,” and the numbers for directory assis­
tance are labeled “I.” Any edge for a call that lasted less than 10 seconds has
been dropped. The algorithm colors the graph by assigning labels to each node
using an iterative procedure:
Any “voice” node connected to a “fax” node is labeled “shared.”

Any “unknown” node connected mostly to “fax” nodes is labeled “fax.”

This procedure continues until all nodes connected to “fax” nodes have a
“fax” or “shared” label.


This is the initial call graph
with short calls removed and
with nodes labeled as “fax,”
“unknown,” and “information.”



F S U Nodes connected to the initial
fax machines are assigned
F I the “fax” label.

Those connected to
F V U “information” are assigned
the “voice” label.
Those connected to both, are
The rest are “unknown.”

Figure 10.10 Applying the graph-coloring algorithm to the call graph shows which
numbers are fax numbers and which are shared.
342 Chapter 10


Although the case study implemented the graph coloring using special-purpose
C++ code, these operations are suitable for data stored in a relational database.
Assume that there are three tables: call_detail, dedicated_fax, and shared_fax.
The query for finding the numbers that call a known fax number is:
SELECT originating_number
FROM call_detail
WHERE terminating_number IN (SELECT number FROM dedicated_fax)
AND duration >= 10
GROUP BY originating_number;
A similar query can be used to get the calls made by a known fax number.
However, this does not yet distinguish between dedicated fax lines and shared

fax lines. To do this, we have to know if any calls were made to information. For
efficiency reasons, it is best to keep this list in a separate table or view,

voice_numbers, defined by:
SELECT originating_number
FROM call_detail
WHERE terminating_number in (˜5551212™, ˜411™)
GROUP BY originating_number;

So the query to find dedicated fax lines is:
SELECT originating_number
FROM call_detail
WHERE terminating_number IN (SELECT number FROM dedicated_fax)
AND duration > 9
AND originating_number NOT IN (SELECT number FROM voice_numbers)
GROUP BY originating_number;

and for shared lines it is:
SELECT originating_number
FROM call_detail
WHERE terminating_number IN (SELECT number FROM dedicated_fax)
AND duration > 2
AND originating_number IN (SELECT number FROM voice_numbers)
GROUP BY originating_number;
These SQL queries are intended to show that finding fax machines is
possible on a relational database. They are probably not the most efficient SQL
statements for this purpose, depending on the layout of the data, the database
engine, and the hardware it is running on. Also, if there is a significant number
of calls in the database, any SQL queries for link analysis will require joins on
very large tables.

Link Analysis 343

Case Study: Segmenting Cellular
Telephone Customers
This case study applies link analysis to cellular telephone calls for the purpose
of segmenting existing customers for selling new services.1 Analyses similar to
those presented here were used with a leading cellular provider. The results
from the analysis were used for a direct mailing for a new product offering. On
such mailings, the cellular company typically measured a response rate of 2
percent to 3 percent. With some of the ideas presented here, it increased its
response rate to over 15 percent, a very significant improvement.

The Data
Cellular telephone data is similar to the call detail data seen in the previous case
study for finding fax machines. There is a record for each call that includes
fields such as:
Originating number

Terminating number

Location where the call was placed

Account number of the person who originated the call

Call duration

Time and date

Although the analysis did not use the account number, it plays an important
role in this data because the data did not otherwise distinguish between busi­
ness and residential accounts. Accounts for larger businesses have thousands
of phones, while most residential accounts have only a single phone.

Analyses without Graph Theory
Prior to using link analysis, the marketing department used a single measure­
ment for segmentation: minutes of use (MOU), which is the number of min­
utes each month that a customer uses on the cellular phone. MOU is a useful
measure, since there is a direct correlation between MOU and the amount
billed to the customer each month. This correlation is not exact, since it does
not take into account discount periods and calling plans that offer free nights
and weekends, but it is a good guide nonetheless.
The marketing group also had external demographic data for prospective
customers. They could also distinguish between individual customers and
business accounts. In addition to MOU, though, their only understanding of

The authors would like to thank their colleagues Alan Parker, William Crowder, and Ravi
Basawi for their contributions to this section.
344 Chapter 10

customer behavior was the total amount billed and whether customers paid
the bills in a timely matter. They were leaving a lot of information on the table.

A Comparison of Two Customers
Figure 10.11 illustrates two customers and their calling patterns during a
typical month. These two customers have similar MOU, yet the patterns are
strikingly different. John™s calls generate a small, tight graph, while Jane™s
explodes with many different calls. If Jane is happy with her wireless service,
her use will likely grow and she might even influence many of her friends and
colleagues to switch to the wireless provider.
Looking at these two customers more closely reveals important differences.
Although John racks up 150 to 200 MOU every month on his car phone, his use
of his mobile telephone consists almost exclusively of two types of calls:
On his way home from work, he calls his wife to let her know what

time to expect him. Sometimes they chat for three or four minutes.
Every Wednesday morning, he has a 45-minute conference call that he

takes in the car on his morning commute.
The only person who has John™s car phone number is his wife, and she
rarely calls him when he is driving. In fact, John has another mobile phone that
he carries with him for business purposes. When driving, he prefers his car
phone to his regular portable phone, although his car phone service provider
does not know this.
10 M


10 OU
20 M

John 150 MOU Jane 20 MOU

40 M
30 M
20 M


Figure 10.11 John and Jane have about the same minutes of use each month, but their
behavior is quite different.
Link Analysis 345

Jane also racks up about the same usage every month on her mobile phone.
She has four salespeople reporting to her that call her throughout the day,
often leaving messages on her mobile phone voice mail when they do not
reach her in the car. Her calls include calls to management, potential cus­
tomers, and other colleagues. Her calls, though, are always quite short”
almost always a minute or two, since she is usually scheduling meetings.
Working in a small business, she is sensitive to privacy and to the cost of the
calls so out of habit uses land lines for longer discussions.
Now, what happens if Jane and John both get an offer from a competitor?
Who is more likely to accept the competing offer (or churn in the vocabulary of
wireless telecommunications companies)? At first glance, we might suspect
that Jane is the more price-sensitive and therefore the more susceptible to


. 69
( 137 .)