3.68

2

7 .3

Kitchen

8

6.2

3. Salad Mix

4

Organic Broccoli

Body Care

Figure 10.3 This is an example of a weighted graph where the edge weights are the

number of transactions containing the items represented by the nodes at either end.

Seven Bridges of Königsberg

One of the earliest problems in graph theory originated with a simple chal

lenge posed in the eighteenth century by the Swiss mathematician Leonhard

Euler. As shown in the simple map in Figure 10.4, Königsberg had two islands

in the Pregel River connected to each other and to the rest of the city by a total

of seven bridges. On either side of the river or on the islands, it is possible to

get to any of the bridges. Figure 10.4 shows one path through the town that

crosses over five bridges exactly once. Euler posed the question: Is it possible

to walk over all seven bridges exactly once, starting from anywhere in the city,

without getting wet or using a boat? As an historical note, the problem has sur

vived longer than the name of the city. In the eighteenth century, Königsberg

was a prominent Prussian city on the Baltic Sea nestled between Lithuania and

Poland. Now, it is known as Kaliningrad, the westernmost Russian enclave,

separated from the rest of Russia by Lithuania and Belarus.

In order to solve this problem, Euler invented the notation of graphs. He rep

resented the map of Königsberg as the simple graph with four vertices and seven

edges in Figure 10.5. Some pairs of nodes are connected by more than one edge,

indicating that there is more than one bridge between them. Finding a route that

traverses all the bridges in Königsberg exactly one time is equivalent to finding a

path in the graph that visits every edge exactly once. Such a path is called an

Eulerian path in honor of the mathematician who posed and solved this problem.

326 Chapter 10

A

Pregel River

C

D

N

W E

B

S

Figure 10.4 The Pregel River in Königsberg has two islands connected by a total of seven

bridges.

A

AC

AD

1

AC

2

C D

CD

BC

2

BD

BC

1

B

Figure 10.5 This graph represents the layout of Königsberg. The edges are bridges and the

nodes are the riverbanks and islands.

Link Analysis 327

WHY DO THE DEGREES HAVE TO BE EVEN?

Showing that an Eulerian path exists only when the degrees on all nodes are

even (except at most two) rests on a simple observation. This observation is

about paths in the graph. Consider one path through the bridges:

A ’ C ’ B ’C ’D

The edges being used are:

AC1 ’ BC1 ’ BC2 ’ CD

The edges connecting the intermediate nodes in the path come in pairs. That

is, there is an outgoing edge for every incoming edge. For instance, node C has

four edges visiting it, and node B has two. Since the edges come in pairs, each

intermediate node has an even number of edges in the path. Since an Eulerian

path contains all edges in the graph and visits all the nodes, such a path exists

only when all the nodes in the graph (minus the two end nodes) can serve as

intermediate nodes for the path. This is another way of saying that the degree

of those nodes is even.

Euler also showed that the opposite is true. When all the nodes in a graph

(save at most two) have an even degree, then an Eulerian path exists. This

proof is a bit more complicated, but the idea is rather simple. To construct an

Eulerian path, start at any node (even one with an odd degree) and move to

any other connected node which has an even degree. Remove the edge just

traversed from the graph and make it the first edge in the Eulerian path. Now,

the problem is to find an Eulerian path starting at the second node in the

graph. By keeping track of the degrees of the nodes, it is possible to construct

such a path when there are at most two nodes whose degree is odd.

Euler devised a solution based on the number of edges going into or out of

each node in the graph. The number of such edges is called the degree of a

node. For instance, in the graph representing the seven bridges of Königsberg,

the nodes representing the shores both have a degree of three”corresponding

to the fact that there are three bridges connecting each shore to the islands. The

other two nodes, representing the islands, have degrees of 5 and 3. Euler

showed that an Eulerian path exists only when the degrees of all the nodes in

a graph are even, except at most two (see technical aside). So, there is no way

to walk over the seven bridges of Königsberg without traversing a bridge

more than once, since there are four nodes whose degrees are odd.

Traveling Salesman Problem

A more recent problem in graph theory is the “Traveling Salesman Problem.”

In this problem, a salesman needs to visit customers in a set of cities. He plans

on flying to one of the cities, renting a car, visiting the customer there, then

driving to each of other cities to visit each of the rest of his customers. He

328 Chapter 10

leaves the car in the last city and flies home. There are many possible routes

that the salesman can take. What route minimizes the total distance that he

travels while still allowing him to visit each city exactly one time?

The Traveling Salesman Problem is easily reformulated using graphs, since

graphs are a natural representation of cities connected by roads. In the graph

representing this problem, the nodes are cities and each edge has a weight cor

responding to the distance between the two cities connected by the edge. The

Traveling Salesman Problem therefore is asking: “What is the shortest path

that visits all the nodes in a graph exactly one time?” Notice that this problem

is different from the Seven Bridges of Königsberg. We are not interested in sim

ply finding a path that visits all nodes exactly once, but of all possible paths we

want the shortest one. Notice that all Eulerian paths have exactly the same

length, since they contain exactly the same edges. Asking for the shortest

Eulerian path does not make sense.

Solving the Traveling Salesman Problem for three or four cities is not diffi

cult. The most complicated graph with four nodes is a completely connected

graph where every node in the graph is connected to every other node. In this

graph, 24 different paths visit each node exactly once. To count the number of

paths, start at any of nodes (there are four possibilities), then go to any of the

other three remaining ones, then to any of the other two, and finally to the last

node (4 * 3 * 2 * 1 = 4! = 24). A completely connected graph with n nodes has n!

(n factorial) distinct paths that contain all nodes. Each path has a slightly dif

ferent collection of edges, so their lengths are usually different. Since listing

the 24 possible paths is not that hard, finding the shortest path is not particu

larly difficult for this simple case.

The problem of finding the shortest path connecting nodes was first investi

gated by the Irish mathematician Sir William Rowan Hamilton. His study of

minimizing energy in physical systems led him to investigate minimizing

energy in certain discrete systems that he represented as graphs. In honor of

him, a path that visits all nodes in a graph exactly once is called a Hamiltonian

path.

The Traveling Salesman Problem is difficult to solve. Any solution must con

sider all of the possible paths through the graph in order to determine which

one is the shortest. The number of paths in a completely connected graph grows

very fast”as a factorial. What is true for completely connected graphs is true

for graphs in general: The number of possible paths visiting all the nodes grows

like an exponential function of the number of nodes (although there are a few

simple graphs where this is not true). So, as the number of cities increases, the

effort required to find the shortest path grows exponentially. Adding just one

more city (with associated roads) can result in a solution that takes twice as

long”or more”to find.

Link Analysis 329

This lack of scalability is so important that mathematicians have given it a

name: NP”where NP means that all known algorithms used to solve the

problem scale exponentially”not like a polynomial. These problems are con

sidered difficult. In fact, the Traveling Salesman Problem is so difficult that it is

used for evaluating parallel computers and exotic computing methods”such

as using DNA or the mysteries of quantum physics as the basis of computers

instead of the more familiar computer chips made of silicon.

All of this graph theory aside, there are pretty good heuristic algorithms for

computers that provide reasonable solutions to the Traveling Salesman

Problem. The resulting paths are relatively short paths, although they are not

guaranteed to be as short as the shortest possible one. This is a useful fact if

you have a similar problem. One common algorithm is the greedy algorithm:

start the path with the shortest edge in the graph, then lengthen the path

with the shortest edge available at either end that visits a new node. The result

ing path is generally pretty short, although not necessarily the shortest (see

Figure 10.6).

T I P Often it is better to use an algorithm that yields good, but not perfect

results, instead of trying to analyze the difficulty of arriving at the ideal solution

or giving up because there is no guarantee of finding an optimal solution. As

Voltaire remarked, “Le mieux est l™ennemi du bien.” (The best is the enemy of

the good.)

A

18

12

2

B 2 C 1 D 9 E

11

Figure 10.6 In this graph, the shortest path (ABCDE) has a length of 24, but the greedy

algorithm finds a much longer path (CDBEA).

330 Chapter 10

Directed Graphs

The graphs discussed so far are undirected. In undirected graphs, the edges

are like expressways between nodes: they go in both directions. In a directed

graph, the edges are like one-way roads. An edge going from A to B is distinct

from an edge going from B to A. A directed edge from A to B is an outgoing edge

of A and an incoming edge of B.

Directed graphs are a powerful way of representing data:

Flight segments that connect a set of cities

––

Hyperlinks between Web pages