

With these simple techniques, it is possible to map all the fields for the
sample house record shown earlier (see Table 7.2) and train the network. Training
is a process of iterating through the training set to adjust the weights. Each
iteration is sometimes called a generation.
After training, the performance of each generation is measured on the
validation set. Typically, earlier generations of the network perform better on
the validation set than the final network (which was optimized for the training
set). This is due to overfitting (which was discussed in Chapter 3), a
consequence of neural networks being so powerful. In fact, neural networks are
an example of a universal approximator. That is, any continuous function can be
approximated as closely as desired by an appropriately complex neural
network. Neural networks and decision trees have this property; linear and
logistic regression do not, since they assume particular shapes for the
underlying function.
As with other modeling approaches, neural networks can learn patterns that
exist only in the training set, resulting in overfitting. To find the best network
for unseen data, the training process remembers each set of weights calculated
during each generation. The final network comes from the generation that
works best on the validation set, rather than the one that works best on the
training set.
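Keeping the weights from the best generation, rather than the last one, can be sketched as follows. This is a minimal illustration, assuming the training process has already recorded each generation's weights and validation error; the `best_generation` helper and the history values are hypothetical, not taken from any particular tool.

```python
def best_generation(history):
    """Return (index, weights) for the generation with the lowest validation error."""
    best = min(range(len(history)), key=lambda i: history[i]["validation_error"])
    return best, history[best]["weights"]


# Hypothetical, illustrative history: training error keeps falling, but
# validation error starts rising once the network begins to overfit.
history = [
    {"weights": [0.10, -0.20], "training_error": 0.30, "validation_error": 0.31},
    {"weights": [0.40, -0.35], "training_error": 0.20, "validation_error": 0.22},
    {"weights": [0.75, -0.60], "training_error": 0.15, "validation_error": 0.19},
    {"weights": [1.30, -1.10], "training_error": 0.09, "validation_error": 0.24},
]

index, weights = best_generation(history)
print(index, weights)  # 2 [0.75, -0.6]
```

The last generation has the lowest training error, but generation 2 is the one kept, because it generalizes best to the validation set.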
When the model's performance on the validation set is satisfactory, the
neural network model is ready for use. It has learned from the training
examples and figured out how to calculate the sales price from all the inputs. The
model takes descriptive information about a house, suitably mapped, and
produces an output. There is one caveat: the output is itself a number between
0 and 1 (for a logistic activation function) or -1 and 1 (for the hyperbolic
tangent), which needs to be remapped to the range of sale prices. For example,
the value 0.75 could be multiplied by the size of the range ($147,000) and
then added to the base number in the range ($103,000) to get an appraisal
value of $213,250.
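The remapping arithmetic can be written as a small function. This is a sketch, assuming the sale-price range runs from the base of $103,000 to $250,000 (the base plus the $147,000 range size); `remap_output` is a hypothetical name, and the optional arguments cover the -1 to 1 output of the hyperbolic tangent.

```python
def remap_output(value, low, high, act_min=0.0, act_max=1.0):
    """Map a network output from [act_min, act_max] back to [low, high]."""
    fraction = (value - act_min) / (act_max - act_min)  # position within the output range
    return low + fraction * (high - low)


# Base $103,000 plus range size $147,000 gives a top of $250,000.
print(remap_output(0.75, 103_000, 250_000))            # 213250.0 (logistic output)
print(remap_output(0.5, 103_000, 250_000, -1.0, 1.0))  # 213250.0 (tanh output)
```

A tanh output of 0.5 sits three quarters of the way from -1 to 1, so it maps to the same appraisal as a logistic output of 0.75.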
Artificial Neural Networks 219

Neural Networks for Directed Data Mining

The previous example illustrates the most common use of neural networks:
building a model for classification or prediction. The steps in this process are:
1. Identify the input and output features.
2. Transform the inputs and outputs so they are in a small range (-1 to 1).
3. Set up a network with an appropriate topology.
4. Train the network on a representative set of training examples.
5. Use the validation set to choose the set of weights that minimizes the error.
6. Evaluate the network using the test set to see how well it performs.
7. Apply the model generated by the network to predict outcomes for
unknown inputs.
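Step 2 is often done with a min-max transformation. The sketch below assumes that transformation (other mappings are possible) and uses a hypothetical `scale_to_range` helper and hypothetical living-area values. In practice the minimum and maximum should be taken from the training set and reused, unchanged, for the validation set, the test set, and new data.

```python
def scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Min-max scale a list of numbers into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map every value to the midpoint
        return [(new_min + new_max) / 2.0 for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


living_area = [1200, 1800, 2400]  # hypothetical square-footage values
print(scale_to_range(living_area))  # [-1.0, 0.0, 1.0]
```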
Fortunately, data mining software now performs most of these steps
automatically. Although an intimate knowledge of the internal workings is not
necessary, there are some keys to using networks successfully. As with all
predictive modeling tools, the most important issue is choosing the right
training set. The second is representing the data in such a way as to maximize
the ability of the network to recognize patterns in it. The third is interpreting
the results from the network. Finally, understanding some specific details
about how networks work, such as network topology and the parameters that
control training, can help produce better-performing networks.
One of the dangers with any model used for prediction or classification is
that the model becomes stale as it gets older, and neural network models are
no exception to this rule. For the appraisal example, the neural network has
learned historical patterns, based on the contents of the training set, that
allow it to predict the appraised value from descriptions of houses. There is
no guarantee that current market conditions match those of last week, last
month, or six months ago, when the training set might have been assembled. New
homes are bought and sold every day, creating and responding to market
forces that are not present in the training set. A rise or drop in interest rates, or
an increase in inflation, may rapidly change appraisal values. The problem of
keeping a neural network model up to date is made more difficult by two
factors. First, the model does not readily express itself in the form of rules, so it
may not be obvious when it has grown stale. Second, when neural networks
degrade, they tend to degrade gracefully, making the reduction in
performance less obvious. In short, the model gradually expires, and it is not always
clear exactly when to update it.
220 Chapter 7

The solution is to incorporate more recent data into the neural network. One
way is to take the same neural network back to training mode and start feeding
it new values. This is a good approach if the network only needs to tweak its
results, such as when the network is already close to being accurate but you
think you can improve its accuracy even more by giving it more recent
examples. Another approach is to start over again by adding new examples into the
training set (perhaps removing older examples) and training an entirely new
network, perhaps even with a different topology (there is further discussion of
network topologies later). This is appropriate when market conditions may
have changed drastically and the patterns found in the original training set are
no longer applicable.
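The two retraining strategies differ mainly in where the weights start. The sketch below uses a single logistic unit trained by stochastic gradient descent as a stand-in for a full network (an assumption made to keep it short). Passing existing weights continues training the deployed model; passing `None` starts a fresh one. The names `train` and `predict` and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Single logistic unit: weighted sum of the inputs plus a bias (last weight)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + weights[-1])


def train(examples, weights=None, epochs=200, rate=0.5, seed=0):
    """Stochastic gradient descent on (inputs, target) pairs, target in [0, 1].

    weights=None starts a fresh model from small random weights; passing
    existing weights continues training from where the old model left off.
    """
    n = len(examples[0][0])
    if weights is None:
        rng = random.Random(seed)
        w = [rng.uniform(-0.1, 0.1) for _ in range(n + 1)]
    else:
        w = list(weights)  # copy so the deployed model is not modified
    for _ in range(epochs):
        for x, target in examples:
            error = predict(w, x) - target  # cross-entropy gradient w.r.t. the net input
            for i in range(n):
                w[i] -= rate * error * x[i]
            w[-1] -= rate * error
    return w


old_examples = [([-1.0], 0.0), ([1.0], 1.0)]      # hypothetical historical data
recent_examples = [([-0.5], 0.0), ([0.5], 1.0)]   # hypothetical recent data

model = train(old_examples)
tuned = train(recent_examples, weights=model, epochs=20)  # tweak the existing model
rebuilt = train(recent_examples + old_examples)           # start over with all data
```

Warm-starting is cheap and preserves what the model already knows; rebuilding costs more but does not carry over patterns that may no longer hold.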
The virtuous cycle of data mining described in Chapter 2 puts a premium on
measuring the results from data mining activities. These measurements help
in understanding how susceptible a given model is to aging and when a neural
network model should be retrained.
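One simple way to turn these measurements into a retraining signal is to compare recent prediction errors with the errors measured when the model was deployed. The sketch below assumes errors are absolute appraisal errors in dollars and uses an arbitrary 10 percent tolerance; both the `needs_retraining` helper and the threshold are hypothetical choices, not a standard rule.

```python
def needs_retraining(baseline_errors, recent_errors, tolerance=0.10):
    """Flag a model whose mean recent error exceeds the deployment-time
    mean error by more than `tolerance` (as a fraction)."""
    baseline = sum(baseline_errors) / len(baseline_errors)
    recent = sum(recent_errors) / len(recent_errors)
    return recent > baseline * (1.0 + tolerance)


# Hypothetical absolute appraisal errors, in dollars.
at_deployment = [10_000, 12_000, 11_000]
this_month = [13_000, 14_000, 12_500]
print(needs_retraining(at_deployment, this_month))  # True
```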

WARNING A neural network is only as good as the training set used to
generate it. The model is static and must be explicitly updated by adding more
recent examples into the training set and retraining the network (or training a
new network) in order to keep it up-to-date and useful.

What Is a Neural Net?
Neural networks consist of basic units that mimic, in a simplified fashion, the
behavior of biological neurons found in nature, whether in the brain
of a human or of a frog. It has been claimed, for example, that there is a unit
within the visual system of a frog that fires in response to fly-like movements,
and that there is another unit that fires in response to things about the size of a
fly. These two units are connected to a neuron that fires when the combined
value of these two inputs is high. This neuron is an input into yet another,
which triggers tongue-flicking behavior.
The basic idea is that each neural unit, whether in a frog or a computer, has
many inputs that the unit combines into a single output value. In brains, these
units may be connected to specialized nerves. Computers, though, are a bit
simpler; the units are simply connected together, as shown in Figure 7.3, so the
outputs from some units are used as inputs into others. All the examples in
Figure 7.3 are examples of feed-forward neural networks, meaning there is a
one-way flow through the network from the inputs to the outputs and there
are no cycles in the network.
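A feed-forward pass can be sketched as a loop over layers, where the outputs of one layer become the inputs of the next and nothing flows backward. This sketch assumes a weighted-sum combination and a tanh transfer in every unit; the weights and the names `layer_forward` and `feed_forward` are hypothetical.

```python
import math


def layer_forward(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of all inputs plus its bias,
    then applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass values through each layer in order; there are no cycles."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases)
    return values


# Hypothetical weights: four inputs, a hidden layer of two units, one output.
hidden = ([[0.2, -0.1, 0.4, 0.3], [-0.5, 0.2, 0.1, 0.0]], [0.0, 0.1])
output = ([[0.7, -0.6]], [0.05])
print(feed_forward([0.5, -0.2, 0.1, 0.9], [hidden, output]))
```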

[Figure panels, each drawn with four inputs: (1) A network with no hidden layer: "This simple neural network takes four inputs and produces an output. The result of training this network is equivalent to the statistical technique called logistic regression." (2) A network with a hidden layer: "This network has a middle layer called the hidden layer, which makes the network more powerful by enabling it to recognize more patterns." (3) A network with a larger hidden layer: "Increasing the size of the hidden layer makes the network more powerful but introduces the risk of overfitting. Usually, only one hidden layer is needed." (4) A network with three outputs: "A neural network can produce multiple output values."]
Figure 7.3 Feed-forward neural networks take inputs on one end and transform them into outputs.

Feed-forward networks are the simplest and most useful type of network
for directed modeling. There are three basic questions to ask about them:
What are units and how do they behave? That is, what is the activation function?
How are the units connected together? That is, what is the topology of a network?
How does the network learn to recognize patterns? That is, what is
back propagation, and more generally, how is the network trained?
The answers to these questions provide the background for understanding
basic neural networks, an understanding that provides guidance for getting
the best results from this powerful data mining technique.

What Is the Unit of a Neural Network?
Figure 7.4 shows the important features of the artificial neuron. The unit
combines its inputs into a single value, which it then transforms to produce the
output; these together are called the activation function. The most common
activation functions are based on the biological model where the output remains
very low until the combined inputs reach a threshold value. When the
combined inputs reach the threshold, the unit is activated and the output is high.
Like its biological counterpart, the unit in a neural network has the property
that small changes in the inputs, when the combined values are within some
middle range, can have relatively large effects on the output. Conversely, large
changes in the inputs may have little effect on the output, when the combined
inputs are far from the middle range. This property, where sometimes small
changes matter and sometimes they do not, is an example of nonlinear behavior.
The power and complexity of neural networks arise from their nonlinear
behavior, which in turn arises from the particular activation function used by
the constituent neural units.
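The nonlinear behavior described above can be checked numerically with the logistic function. This sketch applies the same change of 0.5 to the combined input twice, once near the middle of the range and once far from it (the specific input values are arbitrary choices for illustration).

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# The same change in the combined input (0.5), applied near the middle of the range...
middle_change = logistic(0.5) - logistic(0.0)
# ...and far from the middle, where the curve has flattened out.
far_change = logistic(8.5) - logistic(8.0)
print(round(middle_change, 3), far_change)
```

Near the middle, the output moves by roughly 0.12; far from the middle, it moves by roughly 0.0001, about a thousand times less.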
The activation function has two parts. The first part is the combination
function that merges all the inputs into a single value. As shown in Figure 7.4, each
input into the unit has its own weight. The most common combination
function is the weighted sum, where each input is multiplied by its weight and
these products are added together. Other combination functions are
sometimes useful and include the maximum of the weighted inputs, the minimum,
and the logical AND or OR of the values. Although there is a lot of flexibility
in the choice of combination functions, the standard weighted sum works well
in many situations. This element of choice is a common trait of neural
networks. Their basic structure is quite flexible, but the defaults that correspond
to the original biological models, such as the weighted sum for the
combination function, work well in practice.
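The combination functions named above can be sketched side by side. Treating an input as "true" for AND and OR when it exceeds a threshold of 0.5 is an assumption made for this illustration, as are the function names and the sample values.

```python
def weighted_sum(inputs, weights):
    return sum(x * w for x, w in zip(inputs, weights))


def weighted_max(inputs, weights):
    return max(x * w for x, w in zip(inputs, weights))


def weighted_min(inputs, weights):
    return min(x * w for x, w in zip(inputs, weights))


def logical_and(inputs, threshold=0.5):
    """1.0 if every input exceeds the (assumed) threshold, else 0.0."""
    return 1.0 if all(x > threshold for x in inputs) else 0.0


def logical_or(inputs, threshold=0.5):
    """1.0 if any input exceeds the (assumed) threshold, else 0.0."""
    return 1.0 if any(x > threshold for x in inputs) else 0.0


inputs, weights = [2.0, 1.0, -1.0], [0.5, 0.25, 0.5]
print(weighted_sum(inputs, weights))  # 0.75
print(weighted_max(inputs, weights))  # 1.0
print(weighted_min(inputs, weights))  # -0.5
```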


[Figure 7.4 annotations: Each input has its own weight (w1, w2, ...), plus there is an additional weight called the bias. The combination function combines all the inputs into a single value, usually as a weighted summation. The transfer function calculates the output value from the result of the combination function. The combination function and transfer function together constitute the activation function. The result is one output value, usually between -1 and 1.]
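A single unit, as annotated in Figure 7.4, can be sketched in a few lines: a weighted-sum combination function that includes the bias, followed by a transfer function. The sketch assumes tanh as the default transfer function (giving outputs between -1 and 1), with the logistic function as an alternative; `unit_output` is a hypothetical name. The bias behaves like a weight on an extra input that is always 1.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def unit_output(inputs, weights, bias, transfer=math.tanh):
    """Activation function = combination function followed by transfer function."""
    combined = sum(w * x for w, x in zip(weights, inputs)) + bias  # combination: weighted sum plus bias
    return transfer(combined)                                       # transfer: squash into a small range


print(unit_output([0.5, -0.25], [0.8, 0.4], 0.1))            # tanh output, between -1 and 1
print(unit_output([0.5, -0.25], [0.8, 0.4], 0.1, logistic))  # logistic output, between 0 and 1
```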

