



Figure 7.11 Running a neural network on 10 examples from the validation set can help
determine how to interpret results.

Neural Networks for Time Series
In many business problems, the data naturally falls into a time series. Examples
of such series are the closing price of IBM stock, the daily value of the Swiss
franc to U.S. dollar exchange rate, or a forecast of the number of customers who
will be active on any given date in the future. For financial time series, someone
who is able to predict the next value, or even whether the series is heading up
Artificial Neural Networks 245

or down, has a tremendous advantage over other investors. Although predominant
in the financial industry, time series appear in other areas, such as
forecasting and process control. Financial time series, though, are the most
studied, since a small advantage in predictive power translates into big profits.
Neural networks are easily adapted for time-series analysis, as shown in
Figure 7.12. The network is trained on the time-series data, starting at the
oldest point in the data. The training then moves to the second oldest point,
and the oldest point shifts to the next set of units in the input layer (the
time-lagged units), and so on. The network trains like a feed-forward, back
propagation network, trying to predict the next value in the series at each step.

[Figure 7.12 diagram: historical input units, offset by a time lag (value 1 at
times t, t-1, t-2 and value 2 at times t, t-1, t-2), feed a hidden layer, whose
output is value 1 at time t+1.]

Figure 7.12 A time-delay neural network remembers the previous few training examples
and uses them as input into the network. The network then works like a feed-forward, back
propagation network.
246 Chapter 7

Notice that the time-series network is not limited to data from just a single
time series. It can take multiple inputs. For instance, to predict the value of the
Swiss franc to U.S. dollar exchange rate, other time-series information might be
included, such as the volume of the previous day's transactions, the U.S. dollar
to Japanese yen exchange rate, the closing value of the stock exchange, and the
day of the week. In addition, non-time-series data, such as the reported
inflation rate in the countries over the period of time under investigation, might
also be candidate features.
The number of historical units controls the length of the patterns that the
network can recognize. For instance, keeping 10 historical units on a network
predicting the closing price of a favorite stock will allow the network to
recognize patterns that occur within 2-week time periods (since prices are
set only on weekdays). Relying on such a network to predict the value 3
months in the future is not recommended.
Actually, by modifying the input, a feed-forward network can be made to
work like a time-delay neural network. Consider the time series with 10 days
of history, shown in Table 7.5. The network will include two features: the day
of the week and the closing price.
Creating a time series with a time lag of three requires adding new features for
the historical, lagged values. (Day-of-the-week does not need to be copied,
since its pattern does not really change.) The result is Table 7.6. This data can
now be input into a feed-forward, back propagation network without any
special support for time series.

Table 7.5 Time Series

DAY   DAY-OF-WEEK   CLOSING PRICE
 1         1            $40.25
 2         2            $41.00
 3         3            $39.25
 4         4            $39.75
 5         5            $40.50
 6         1            $40.50
 7         2            $40.75
 8         3            $41.25
 9         4            $42.00
10         5            $41.50

Table 7.6 Time Series with Time Lag

DAY   DAY-OF-WEEK   CLOSING PRICE   PREVIOUS CLOSE   CLOSE 2 DAYS AGO
 1         1            $40.25
 2         2            $41.00          $40.25
 3         3            $39.25          $41.00            $40.25
 4         4            $39.75          $39.25            $41.00
 5         5            $40.50          $39.75            $39.25
 6         1            $40.50          $40.50            $39.75
 7         2            $40.75          $40.50            $40.50
 8         3            $41.25          $40.75            $40.50
 9         4            $42.00          $41.25            $40.75
10         5            $41.50          $42.00            $41.25
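The lagging step just described is easy to sketch in code. The example below is a minimal sketch assuming the pandas library; the column names are illustrative, not from the text.

```python
import pandas as pd

# Table 7.5 as a data frame (column names are illustrative)
data = pd.DataFrame({
    "day_of_week": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "close": [40.25, 41.00, 39.25, 39.75, 40.50,
              40.50, 40.75, 41.25, 42.00, 41.50],
})

# Add lagged closing prices for a time lag of three (the current value
# plus two historical copies). Day-of-week is not lagged because its
# weekly pattern repeats and does not need to be copied.
for lag in (1, 2):
    data["close_lag%d" % lag] = data["close"].shift(lag)

print(data)
```

The first rows contain missing values where no history exists, matching the blank cells in Table 7.6; in practice those rows would be dropped or imputed before training.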

How to Know What Is Going on
Inside a Neural Network
Neural networks are opaque. Even knowing all the weights on all the nodes
throughout the network does not give much insight into why the network
produces the results that it produces. This lack of understanding has some
philosophical appeal; after all, we do not understand how human consciousness
arises from the neurons in our brains. As a practical matter, though, opaqueness
impairs our ability to understand the results produced by a network.
If only we could ask it to tell us how it is making its decision in the form of
rules. Unfortunately, the same nonlinear characteristics of neural network
nodes that make them so powerful also make them unable to produce simple
rules. Eventually, research into rule extraction from networks may bring
unequivocally good results. Until then, the trained network itself is the rule,
and other methods are needed to peer inside to understand what is going on.
A technique called sensitivity analysis can be used to get an idea of how
opaque models work. Sensitivity analysis does not provide explicit rules, but
it does indicate the relative importance of the inputs to the result of the
network. Sensitivity analysis uses the test set to determine how sensitive the
output of the network is to each input. The following are the basic steps:
1. Find the average value for each input. We can think of this average
value as the center of the test set.

2. Measure the output of the network when all inputs are at their average
values.

3. Measure the output of the network when each input is modified, one at
a time, to be at its minimum and maximum values (usually -1 and 1).
For some inputs, the output of the network changes very little for the three
values (minimum, average, and maximum). The network is not sensitive to
these inputs (at least when all other inputs are at their average value). Other
inputs have a large effect on the output of the network. The network is
sensitive to these inputs. The amount of change in the output measures the
sensitivity of the network for each input. Using these measures for all the inputs
creates a relative measure of the importance of each feature. Of course, this
method is entirely empirical and is looking only at each variable
independently. Neural networks are interesting precisely because they can take
interactions between variables into account.
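The one-at-a-time procedure can be sketched as a short function. This is a minimal empirical sketch: the function name `sensitivity_analysis` is illustrative, and the toy `predict` function stands in for any trained network's prediction function.

```python
import numpy as np

def sensitivity_analysis(predict, X):
    """One-at-a-time sensitivity: vary each input between its minimum and
    maximum while holding all other inputs at their average values, and
    record how much the output changes."""
    center = X.mean(axis=0)          # step 1: the center of the test set
    sensitivities = []
    for i in range(X.shape[1]):
        x_lo, x_hi = center.copy(), center.copy()
        x_lo[i] = X[:, i].min()      # step 3: one input at its minimum...
        x_hi[i] = X[:, i].max()      # ...and at its maximum
        sensitivities.append(abs(predict(x_hi) - predict(x_lo)))
    return np.array(sensitivities)

# Demo with a toy "network" whose output depends strongly on input 0
predict = lambda x: 3.0 * x[0] + 0.1 * x[1]
X = np.array([[-1.0, -1.0], [1.0, 1.0], [0.0, 0.5]])
print(sensitivity_analysis(predict, X))
```

In the demo, the first input produces a much larger output swing than the second, so it is ranked as the more important feature.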
There are variations on this procedure. It is possible to modify the values of
two or three features at the same time to see if combinations of features have a
particular importance. Sometimes, it is useful to start from a location other
than the center of the test set. For instance, the analysis might be repeated for
the minimum and maximum values of the features to see how sensitive the
network is at the extremes. If sensitivity analysis produces significantly
different results for these three situations, then there are higher-order effects in
the network that are taking advantage of combinations of features.
When using a feed-forward, back propagation network, sensitivity analysis
can take advantage of the error measures calculated during the learning phase
instead of having to test each feature independently. The validation set is fed
into the network to produce the output, and the output is compared to the
actual value to calculate the error. The network then propagates the error
back through the units, not to adjust any weights but to keep track of the
sensitivity of each input. The error is a proxy for the sensitivity, determining how
much each input affects the output in the network. Accumulating these
sensitivities over the entire test set determines which inputs have the larger effect
on the output. In our experience, though, the values produced in this fashion
are not particularly useful for understanding the network.
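One common way to realize this idea is to back propagate gradients rather than errors: compute how strongly the output responds to each input. The sketch below assumes a one-hidden-layer network with tanh hidden units and a linear output; the helper name and the weight shapes are illustrative.

```python
import numpy as np

def input_gradients(x, W1, b1, W2, b2):
    """Back propagate from the output to the inputs of a one-hidden-layer
    network (tanh hidden units, linear output unit), without adjusting
    any weights. Returns d(output)/d(input) for each input."""
    h = np.tanh(W1 @ x + b1)        # forward pass through the hidden layer
    # output = W2 @ h + b2; back propagate through the tanh units
    dz = W2 * (1.0 - h ** 2)        # d(output)/d(hidden pre-activation)
    return dz @ W1                  # d(output)/d(x), one value per input

# Accumulating the absolute gradients over a validation set gives a
# per-input sensitivity score, analogous to accumulating the error.
```

Whether these per-input scores are interpretable in practice is exactly the caveat raised above; they measure local slope, not overall importance.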

T I P Neural networks do not produce easily understood rules that explain how
they arrive at a given result. Even so, it is possible to understand the relative
importance of inputs into the network by using sensitivity analysis. Sensitivity
can be a manual process where each feature is tested one at a time relative to
the other features. It can also be more automated by using the sensitivity
information generated by back propagation. In many situations, understanding
the relative importance of inputs is almost as good as having explicit rules.

Self-Organizing Maps

Self-organizing maps (SOMs) are a variant of neural networks used for undirected
data mining tasks such as cluster detection. The Finnish researcher Dr. Teuvo
Kohonen invented self-organizing maps, which are also called Kohonen
networks. Although used originally for images and sounds, these networks can also
recognize clusters in data. They are based on the same underlying units as feed-
forward, back propagation networks, but SOMs differ in two respects: they have
a different topology, and because back propagation does not apply to them, they
use an entirely different method of training.

What Is a Self-Organizing Map?
The self-organizing map (SOM), an example of which is shown in Figure 7.13, is
a neural network that can recognize unknown patterns in the data. Like the
networks we've already looked at, the basic SOM has an input layer and an
output layer. Each unit in the input layer is connected to one source, just as in
the networks for predictive modeling. Also, like those networks, each unit in
the SOM has an independent weight associated with each incoming
connection (this is actually a property of all neural networks). However, the
similarity between SOMs and feed-forward, back propagation networks ends here.
The output layer consists of many units instead of just a handful. Each of the
units in the output layer is connected to all of the units in the input layer. The
output layer is arranged in a grid, as if the units were in the squares on a
checkerboard. Even though the units are not connected to each other in this
layer, the grid-like structure plays an important role in the training of the
SOM, as we will see shortly.
How does an SOM recognize patterns? Imagine one of the booths at a
carnival where you throw balls at a wall filled with holes. If the ball lands in one of
the holes, then you have your choice of prizes. Training an SOM is like being
at the booth blindfolded, with a wall that initially has no holes, very similar to the
situation when you start looking for patterns in large amounts of data and
don't know where to start. Each time you throw the ball, it dents the wall a little
bit. Eventually, when enough balls land in the same vicinity, the indentation
breaks through the wall, forming a hole. Now, when another ball lands at that
location, it goes through the hole. You get a prize: at the carnival, a
cheap stuffed animal; with an SOM, an identifiable cluster.
Figure 7.14 shows how this works for a simple SOM. When a member of the
training set is presented to the network, the values flow forward through the
network to the units in the output layer. The units in the output layer compete
with each other, and the one with the highest value "wins." The reward is to
adjust the weights leading up to the winning unit, strengthening its response
to the input pattern. This is like making a little dent in the network.
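A minimal training loop makes the competition concrete. This sketch uses the common formulation in which the unit whose weight vector is closest to the input wins (equivalent in spirit to the highest-value competition described above), and for brevity it omits the neighborhood update that a full SOM also applies to the winner's grid neighbors; the function name, grid size, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid_h=4, grid_w=4, epochs=20, lr=0.5):
    """Minimal SOM training sketch. Each output unit on the grid holds a
    weight vector; the unit closest to the input 'wins', and the winner's
    weights are pulled toward the input ('denting the wall')."""
    n_features = data.shape[1]
    weights = rng.random((grid_h, grid_w, n_features))
    for _ in range(epochs):
        for x in data:
            # Competition: find the winning unit for this input
            dists = np.linalg.norm(weights - x, axis=2)
            wy, wx = np.unravel_index(dists.argmin(), dists.shape)
            # Reward: strengthen the winner's response to the input
            weights[wy, wx] += lr * (x - weights[wy, wx])
    return weights
```

After training, each input maps to its winning unit, and inputs that land on the same unit (or nearby units on the grid) form an identifiable cluster.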

