

pointing as a general method for solving problems.
One reason for the limited usefulness of early neural networks is that even the
most powerful computers of that era were less powerful than inexpensive desktop
computers today. Another reason was that these simple networks had theoretical
deficiencies, as shown by Seymour Papert and Marvin Minsky (two professors
at the Massachusetts Institute of Technology) in 1969. Because of these
deficiencies, the study of neural network implementations on computers
slowed down drastically during the 1970s. Then, in 1982, John Hopfield of the
California Institute of Technology introduced an influential new network design,
and in 1986 researchers popularized back propagation, a way of training
neural networks that sidestepped the theoretical pitfalls of earlier approaches.

Artificial Neural Networks 213

This development sparked a renaissance in neural network research. Through
the 1980s, research moved from the labs into the commercial world, where it
has since been applied both to operational problems (such as detecting
fraudulent credit card transactions as they occur and recognizing numeric
amounts written on checks) and to data mining challenges.
At the same time that researchers in artificial intelligence were developing
neural networks as a model of biological activity, statisticians were taking
advantage of computers to extend the capabilities of statistical methods. A
technique called logistic regression proved particularly valuable for many
types of statistical analysis. Like linear regression, logistic regression tries to fit
a curve to observed data. Instead of a line, though, it uses a function called the
logistic function. Logistic regression, and even its more familiar cousin linear
regression, can be represented as special cases of neural networks. In fact, the
entire theory of neural networks can be explained using statistical methods,
such as probability distributions, likelihoods, and so on. For expository pur­
poses, though, this chapter leans more heavily toward the biological model
than toward theoretical statistics.
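The claim that logistic regression is a special case of a neural network can be made concrete with a minimal sketch. The code below is illustrative, not from the chapter: a single neuron that forms a weighted sum of its inputs and passes it through the logistic function computes exactly what a logistic regression model computes. The weights shown are made up for the example.

```python
import math

def logistic(x):
    """The logistic function: an S-shaped curve from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    """A single neuron with a logistic activation.

    The neuron forms a weighted sum of its inputs and squashes it with
    the logistic function -- the same calculation logistic regression
    performs, which is why logistic regression can be viewed as a
    one-neuron neural network.
    """
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return logistic(weighted_sum)

# Illustrative (made-up) weights for a neuron with two inputs.
print(neuron_output([0.5, -0.2], weights=[1.2, 0.7], bias=-0.3))
```

Dropping the logistic function and keeping only the weighted sum yields linear regression, the other statistical cousin mentioned above.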
Neural networks became popular in the 1980s because of a convergence of
several factors. First, computing power was readily available, especially in the
business community where data was available. Second, analysts became more
comfortable with neural networks by realizing that they are closely related to
known statistical methods. Third, there was relevant data since operational
systems in most companies had already been automated. Fourth, useful appli­
cations became more important than the holy grails of artificial intelligence.
Building tools to help people superseded the goal of building artificial people.
Because of their proven utility, neural networks are, and will continue to be,
popular tools for data mining.

Real Estate Appraisal
Neural networks have the ability to learn by example in much the same way
that human experts gain from experience. The following example applies
neural networks to a problem familiar to most readers: real estate appraisal.
Why would we want to automate appraisals? Clearly, automated appraisals
could help real estate agents better match prospective buyers to prospective
homes, improving the productivity of even inexperienced agents. Another use
would be to set up kiosks or Web pages where prospective buyers could
describe the homes that they wanted, and get immediate feedback on how
much their dream homes would cost.
Perhaps an unexpected application is in the secondary mortgage market.
Good, consistent appraisals are critical to assessing the risk of individual loans
and loan portfolios, because one major factor affecting default is the proportion
of the value of the property at risk. If the loan value is more than 100 percent of
the market value, the risk of default goes up considerably. Once the loan has
been made, how can the market value be calculated? For this purpose, Freddie
Mac, the Federal Home Loan Mortgage Corporation, developed a product
called Loan Prospector that does these appraisals automatically for homes
throughout the United States. Loan Prospector was originally based on neural
network technology developed by HNC, a San Diego company that has since
been merged into Fair Isaac.
Back to the example. This neural network mimics an appraiser who
estimates the market value of a house based on features of the property (see
Figure 7.1). She knows that houses in one part of town are worth more than
those in other areas. Additional bedrooms, a larger garage, the style of the
house, and the size of the lot are other factors that figure into her mental cal-
culation. She is not applying some set formula, but balancing her experience
and knowledge of the sales prices of similar homes. And, her knowledge about
housing prices is not static. She is aware of recent sale prices for homes
throughout the region and can recognize trends in prices over time,
fine-tuning her calculation to fit the latest data.



Figure 7.1 Real estate agents and appraisers combine the features of a house to come up
with a valuation, an example of biological neural networks at work.

The appraiser or real estate agent is a good example of a human expert in a well-
defined domain. Houses are described by a fixed set of standard features taken
into account by the expert and turned into an appraised value. In 1992, researchers
at IBM recognized this as a good problem for neural networks. Figure 7.2 illustrates
why. A neural network takes specific inputs (in this case, the information
from the housing sheet) and turns them into a specific output, an appraised value
for the house. The list of inputs is well defined because of two factors: extensive
use of the multiple listing service (MLS) to share information about the housing
market among different real estate agents, and standardization of housing
descriptions for mortgages sold on secondary markets. The desired output is well
defined as well: a specific dollar amount. In addition, there is a wealth of experience,
in the form of previous sales, for teaching the network how to value a house.

T I P Neural networks are good for prediction and estimation problems. A
good problem has the following three characteristics:

The inputs are well understood. You have a good idea of which features of
the data are important, but not necessarily how to combine them.

The output is well understood. You know what you are trying to model.

Experience is available. You have plenty of examples where both the inputs
and the output are known. These known cases are used to train the network.

The first step in setting up a neural network to calculate estimated housing
values is determining a set of features that affect the sales price. Some possible
common features are shown in Table 7.1. In practice, these features work for
homes in a single geographical area. To extend the appraisal example to han­
dle homes in many neighborhoods, the input data would include zip code
information, neighborhood demographics, and other neighborhood quality-
of-life indicators, such as ratings of schools and proximity to transportation. To
simplify the example, these additional features are not included here.

[Figure 7.2 shows inputs such as living space, size of garage, and age of house
flowing into a neural network model, which produces the appraised value as output.]
Figure 7.2 A neural network is like a black box that knows how to process inputs to create
an output. The calculation is quite complex and difficult to understand, yet the results are
often useful.

Table 7.1 Common Features Describing a House

FEATURE              DESCRIPTION                               VALUES
Num_Apartments       Number of dwelling units                  Integer: 1-3
Year_Built           Year built                                Integer: 1850-1986
Plumbing_Fixtures    Number of plumbing fixtures               Integer: 5-17
Heating_Type         Heating system type                       Coded as A or B
Basement_Garage      Basement garage (number of cars)          Integer: 0-2
Attached_Garage      Attached frame garage area (square feet)  Integer: 0-228
Living_Area          Total living area (square feet)           Integer: 714-4185
Deck_Area            Deck / open porch area (square feet)      Integer: 0-738
Porch_Area           Enclosed porch area (square feet)         Integer: 0-452
Recroom_Area         Recreation room area (square feet)        Integer: 0-672
Basement_Area        Finished basement area (square feet)      Integer: 0-810

Training the network builds a model that can then be used to estimate the
target value for unknown examples. Training presents known examples (data
from previous sales) to the network so that it can learn how to calculate the
sales price. The training examples need two additional features: the sales
price of the home and the sales date. The sales price is needed as the target
variable. The date is used to separate the examples into training, validation,
and test sets. Table 7.2 shows an example from the training set.
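A date-based split like the one just described can be sketched in a few lines. This is a hypothetical illustration, not code from the chapter; the record layout and field names (`sale_date`, `sales_price`) are invented for the example.

```python
def split_by_date(sales, train_before, validate_before):
    """Split historical sales into training, validation, and test sets
    by sale date, so the model is validated and tested on sales more
    recent than the ones it was trained on."""
    train, validation, test = [], [], []
    for sale in sales:
        if sale["sale_date"] < train_before:
            train.append(sale)
        elif sale["sale_date"] < validate_before:
            validation.append(sale)
        else:
            test.append(sale)
    return train, validation, test

# Hypothetical records: ISO-style year-month strings sort chronologically.
sales = [
    {"sale_date": "1985-03", "sales_price": 171000},
    {"sale_date": "1986-01", "sales_price": 158000},
    {"sale_date": "1986-06", "sales_price": 203000},
]
train, validation, test = split_by_date(sales, "1986-01", "1986-06")
```

Splitting by date, rather than randomly, mimics how the model will actually be used: predicting prices for sales that happen after the training data was collected.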
The process of training the network is actually the process of adjusting
weights inside it to arrive at the best combination of weights for making the
desired predictions. The network starts with a random set of weights, so it ini­
tially performs very poorly. However, by reprocessing the training set over
and over and adjusting the internal weights each time to reduce the overall
error, the network gradually does a better and better job of approximating the
target values in the training set. When the approximations no longer improve,
the network stops training.
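The cycle of starting with random weights and repeatedly adjusting them to reduce the error can be sketched with a single linear unit. This is a simplified illustration of the idea, not the multi-unit network the chapter describes: real networks have many units and nonlinear activations, but the training rhythm is the same.

```python
import random

def train(examples, targets, learning_rate=0.1, epochs=200):
    """Sketch of iterative weight adjustment on a single linear unit.

    Start with a random set of weights (so the unit initially performs
    poorly), then reprocess the training set over and over, nudging
    each weight in the direction that shrinks the prediction error.
    """
    n = len(examples[0])
    weights = [random.uniform(-0.1, 0.1) for _ in range(n)]
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(examples, targets):
            prediction = bias + sum(w * xi for w, xi in zip(weights, x))
            error = prediction - target
            # Adjust each weight a little to reduce the overall error.
            for i in range(n):
                weights[i] -= learning_rate * error * x[i]
            bias -= learning_rate * error
    return weights, bias
```

After enough passes over the training set, the adjustments stop improving the approximations and training can stop, just as described above.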

Table 7.2 Sample Record from Training Set with Values Scaled to Range -1 to 1

FEATURE              RANGE OF VALUES     ORIGINAL VALUE   SCALED VALUE
Sales_Price          $103,000-$250,000   $171,000         -0.0748
Months_Ago           0-23                4                -0.6522
Num_Apartments       1-3                 1                -1.0000
Year_Built           1850-1986           1923             +0.0730
Plumbing_Fixtures    5-17                9                -0.3077
Heating_Type         Coded as A or B     B                +1.0000
Basement_Garage      0-2                 0                -1.0000
Attached_Garage      0-228               120              +0.0524
Living_Area          714-4185            1,614            -0.4813
Deck_Area            0-738               0                -1.0000
Porch_Area           0-452               210              -0.0706
Recroom_Area         0-672               0                -1.0000
Basement_Area        0-810               175              -0.5672

This process of adjusting weights is sensitive to the representation of the
data going in. For instance, consider a field in the data that measures lot size.
If lot size is measured in acres, then the values might reasonably go from about
1/8 to 1 acre. If measured in square feet, the same values would be 5,445 square
feet to 43,560 square feet. However, for technical reasons, neural networks
restrict their inputs to small numbers, say between -1 and 1. When an input
variable takes on very large values relative to other inputs, that variable
dominates the calculation of the target. The neural network wastes valuable
iterations reducing the weights on this input to lessen its effect on the output.
That is, the first "pattern" that the network finds is that the lot size variable
has much larger values than the other variables. Since this is not particularly
interesting, it would be better to use the lot size as measured in acres rather
than square feet.
This idea generalizes. Usually, the inputs to the neural network should be
smallish numbers. It is a good idea to limit them to some small range, such as
-1 to 1, which requires mapping all the values, both continuous and categorical,
prior to training the network.
One way to map continuous values is to turn them into fractions by sub­
tracting the middle value of the range from the value, dividing the result by the
size of the range, and multiplying by 2. For instance, to get a mapped value for

Year_Built (1923), subtract (1850 + 1986)/2 = 1918 (the middle value) from 1923
(the year the house was built) and get 5. Dividing by the number of years in
the range (1986 - 1850 + 1 = 137) and multiplying by 2 yields a value of 0.0730.
This basic procedure can be applied to any continuous feature to get a value
between -1 and 1. One way to map categorical features is to assign fractions
between -1 and 1 to each of the categories. The only categorical variable in this
data is Heating_Type, so we can arbitrarily map B to 1 and A to -1. If we had
three values, we could assign one to -1, another to 0, and the third to 1, although
this approach does have the drawback that the three heating types will seem to
have an order: type -1 will appear closer to type 0 than to type 1. Chapter 17
contains further discussion of ways to convert categorical variables to numeric
variables without adding spurious information.
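The mapping recipe above can be written out as a short function. The sketch below follows the chapter's arithmetic, including its convention of counting the range of an integer feature as max - min + 1 (which is how 1986 - 1850 gives 137 years); the categorical codes for Heating_Type are the arbitrary choices described above.

```python
def scale_continuous(value, low, high):
    """Map a continuous value into roughly the -1 to 1 range:
    subtract the middle of the range from the value, divide by
    the size of the range, and multiply by 2."""
    middle = (low + high) / 2.0
    size = high - low + 1   # count of integer values in the range
    return 2.0 * (value - middle) / size

# Year_Built = 1923 maps to about +0.0730, matching Table 7.2.
print(round(scale_continuous(1923, 1850, 1986), 4))

# Categorical features get arbitrary codes in the same range.
heating_type_code = {"A": -1.0, "B": +1.0}
print(heating_type_code["B"])
```

Applying the same function to each feature of a sales record produces a scaled record like the one in Table 7.2, ready to be fed to the network.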

