One reason for the limited usefulness of early neural networks was that the most powerful computers of that era were less powerful than inexpensive desktop computers today. Another reason was that these simple networks had theoretical deficiencies, as shown by Seymour Papert and Marvin Minsky (two professors at the Massachusetts Institute of Technology) in 1968. Because of these deficiencies, the study of neural network implementations on computers slowed down drastically during the 1970s. Then, in 1982, John Hopfield of the California Institute of Technology invented back propagation, a way of training neural networks that sidestepped the theoretical pitfalls of earlier approaches.

Artificial Neural Networks 213

This development sparked a renaissance in neural network research. Through the 1980s, research moved from the labs into the commercial world, where it has since been applied to solve both operational problems (such as detecting fraudulent credit card transactions as they occur and recognizing numeric amounts written on checks) and data mining challenges.

At the same time that researchers in artificial intelligence were developing neural networks as a model of biological activity, statisticians were taking advantage of computers to extend the capabilities of statistical methods. A technique called logistic regression proved particularly valuable for many types of statistical analysis. Like linear regression, logistic regression tries to fit a curve to observed data. Instead of a line, though, it uses a function called the logistic function. Logistic regression, and even its more familiar cousin linear regression, can be represented as special cases of neural networks. In fact, the entire theory of neural networks can be explained using statistical methods, such as probability distributions, likelihoods, and so on. For expository purposes, though, this chapter leans more heavily toward the biological model than toward theoretical statistics.
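A minimal sketch of the connection: the logistic function squashes any number into the range 0 to 1, and a logistic regression model behaves like a single neuron that applies that function to a weighted sum of its inputs. The weights and inputs below are made-up illustrative values, not fitted coefficients.

```python
import math

def logistic(x):
    # The logistic (sigmoid) function maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def single_neuron(inputs, weights, bias):
    # Logistic regression as a one-neuron "network":
    # a weighted sum of the inputs, passed through the logistic function
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(weighted_sum)

print(logistic(0))                                  # 0.5, the curve's midpoint
print(single_neuron([0.2, -0.5], [1.5, 0.8], 0.1))  # some value between 0 and 1
```

Swapping the logistic function for the identity function turns the same neuron into ordinary linear regression.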

Neural networks became popular in the 1980s because of a convergence of several factors. First, computing power was readily available, especially in the business community, where data was available. Second, analysts became more comfortable with neural networks by realizing that they are closely related to known statistical methods. Third, there was relevant data, since operational systems in most companies had already been automated. Fourth, useful applications became more important than the holy grails of artificial intelligence. Building tools to help people superseded the goal of building artificial people. Because of their proven utility, neural networks are, and will continue to be, popular tools for data mining.

Real Estate Appraisal

Neural networks have the ability to learn by example in much the same way that human experts gain from experience. The following example applies neural networks to solve a problem familiar to most readers: real estate appraisal.

Why would we want to automate appraisals? Clearly, automated appraisals could help real estate agents better match prospective buyers to prospective homes, improving the productivity of even inexperienced agents. Another use would be to set up kiosks or Web pages where prospective buyers could describe the homes that they wanted, and get immediate feedback on how much their dream homes cost.

Perhaps an unexpected application is in the secondary mortgage market. Good, consistent appraisals are critical to assessing the risk of individual loans and loan portfolios, because one major factor affecting default is the proportion of the value of the property at risk. If the loan value is more than 100 percent of the market value, the risk of default goes up considerably. Once the loan has been made, how can the market value be calculated? For this purpose, Freddie Mac, the Federal Home Loan Mortgage Corporation, developed a product called Loan Prospector that does these appraisals automatically for homes throughout the United States. Loan Prospector was originally based on neural network technology developed by HNC, a San Diego company that has since been merged into Fair Isaac.

Back to the example. This neural network mimics an appraiser who estimates the market value of a house based on features of the property (see Figure 7.1). She knows that houses in one part of town are worth more than those in other areas. Additional bedrooms, a larger garage, the style of the house, and the size of the lot are other factors that figure into her mental calculation. She is not applying some set formula, but balancing her experience and knowledge of the sales prices of similar homes. And her knowledge about housing prices is not static. She is aware of recent sale prices for homes throughout the region and can recognize trends in prices over time, fine-tuning her calculation to fit the latest data.


Figure 7.1 Real estate agents and appraisers combine the features of a house to come up with a valuation, an example of biological neural networks at work.

The appraiser or real estate agent is a good example of a human expert in a well-defined domain. Houses are described by a fixed set of standard features taken into account by the expert and turned into an appraised value. In 1992, researchers at IBM recognized this as a good problem for neural networks. Figure 7.2 illustrates why. A neural network takes specific inputs (in this case, the information from the housing sheet) and turns them into a specific output, an appraised value for the house. The list of inputs is well defined because of two factors: extensive use of the multiple listing service (MLS) to share information about the housing market among different real estate agents, and standardization of housing descriptions for mortgages sold on secondary markets. The desired output is well defined as well: a specific dollar amount. In addition, there is a wealth of experience in the form of previous sales for teaching the network how to value a house.

TIP Neural networks are good for prediction and estimation problems. A good problem has the following three characteristics:

– The inputs are well understood. You have a good idea of which features of the data are important, but not necessarily how to combine them.

– The output is well understood. You know what you are trying to model.

– Experience is available. You have plenty of examples where both the inputs and the output are known. These known cases are used to train the network.

The first step in setting up a neural network to calculate estimated housing values is determining a set of features that affect the sales price. Some possible common features are shown in Table 7.1. In practice, these features work for homes in a single geographical area. To extend the appraisal example to handle homes in many neighborhoods, the input data would include zip code information, neighborhood demographics, and other neighborhood quality-of-life indicators, such as ratings of schools and proximity to transportation. To simplify the example, these additional features are not included here.

(Figure 7.2 diagram: inputs such as living space, size of garage, and age of house feed into a neural network model, which produces a single output, the appraised value.)

Figure 7.2 A neural network is like a black box that knows how to process inputs to create an output. The calculation is quite complex and difficult to understand, yet the results are often useful.

Table 7.1 Common Features Describing a House

FEATURE            DESCRIPTION                                RANGE OF VALUES
Num_Apartments     Number of dwelling units                   Integer: 1-3
Year_Built         Year built                                 Integer: 1850-1986
Plumbing_Fixtures  Number of plumbing fixtures                Integer: 5-17
Heating_Type       Heating system type                        Coded as A or B
Basement_Garage    Basement garage (number of cars)           Integer: 0-2
Attached_Garage    Attached frame garage area (square feet)   Integer: 0-228
Living_Area        Total living area (square feet)            Integer: 714-4185
Deck_Area          Deck / open porch area (square feet)       Integer: 0-738
Porch_Area         Enclosed porch area (square feet)          Integer: 0-452
Recroom_Area       Recreation room area (square feet)         Integer: 0-672
Basement_Area      Finished basement area (square feet)       Integer: 0-810

Training the network builds a model that can then be used to estimate the target value for unknown examples. Training presents known examples (data from previous sales) to the network so that it can learn how to calculate the sales price. The training examples need two additional features: the sales price of the home and the sales date. The sales price is needed as the target variable. The date is used to separate the examples into training, validation, and test sets. Table 7.2 shows an example from the training set.
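One simple way to use the sales date for this split is to hold out the most recent sales, a sketch assuming a Months_Ago field like the one in Table 7.2 (the records below are invented, apart from the $171,000 sale four months ago):

```python
# Hypothetical (months_ago, sales_price) records
sales = [(22, 110000), (15, 125000), (9, 140000), (4, 171000), (1, 182000)]

# Older sales train the network; more recent ones validate and test it
train_examples      = [s for s in sales if s[0] >= 12]
validation_examples = [s for s in sales if 6 <= s[0] < 12]
test_examples       = [s for s in sales if s[0] < 6]
```

The cutoffs (12 and 6 months) are arbitrary illustrative choices; the point is that the split follows time rather than random assignment.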

The process of training the network is actually the process of adjusting the weights inside it to arrive at the best combination of weights for making the desired predictions. The network starts with a random set of weights, so it initially performs very poorly. However, by reprocessing the training set over and over and adjusting the internal weights each time to reduce the overall error, the network gradually does a better and better job of approximating the target values in the training set. When the approximations no longer improve, the network stops training.
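The loop just described can be sketched for the simplest possible case: a single neuron with no squashing function, trained by repeated small weight adjustments. The toy data, learning rate, and stopping threshold are all illustrative choices, not the procedure any particular product uses.

```python
import random

# Toy training set: (scaled inputs, scaled target) pairs, invented for illustration
examples = [([0.5, -1.0], 0.2), ([-0.3, 0.8], -0.1), ([1.0, 0.4], 0.6)]

random.seed(1)
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]  # random starting weights
bias = random.uniform(-1, 1)
learning_rate = 0.1

def predict(inputs):
    # A single neuron with no squashing function: a weighted sum plus a bias
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def total_error():
    # Overall squared error across the training set
    return sum((predict(inputs) - target) ** 2 for inputs, target in examples)

previous_error = float("inf")
while True:
    for inputs, target in examples:
        delta = predict(inputs) - target
        # Nudge each weight a little to reduce this example's error
        for i, x in enumerate(inputs):
            weights[i] -= learning_rate * delta * x
        bias -= learning_rate * delta
    current_error = total_error()
    if previous_error - current_error < 1e-9:  # approximations no longer improve
        break
    previous_error = current_error
```

Starting from random weights, each pass over the training set shrinks the overall error until the improvement becomes negligible, which is exactly the stopping rule described above.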

Table 7.2 Sample Record from Training Set with Values Scaled to Range -1 to 1

FEATURE            RANGE OF VALUES      ORIGINAL VALUE   SCALED VALUE
Sales_Price        $103,000-$250,000    $171,000         -0.0748
Months_Ago         0-23                 4                -0.6522
Num_Apartments     1-3                  1                -1.0000
Year_Built         1850-1986            1923             +0.0730
Plumbing_Fixtures  5-17                 9                -0.3077
Heating_Type       Coded as A or B      B                +1.0000
Basement_Garage    0-2                  0                -1.0000
Attached_Garage    0-228                120              +0.0524
Living_Area        714-4185             1,614            -0.4813
Deck_Area          0-738                0                -1.0000
Porch_Area         0-452                210              -0.0706
Recroom_Area       0-672                0                -1.0000
Basement_Area      0-810                175              -0.5672

This process of adjusting weights is sensitive to the representation of the data going in. For instance, consider a field in the data that measures lot size. If lot size is measured in acres, then the values might reasonably go from about 1/8 acre to 1 acre. If measured in square feet, the same values would be 5,445 square feet to 43,560 square feet. However, for technical reasons, neural networks restrict their inputs to small numbers, say between -1 and 1. When an input variable takes on very large values relative to other inputs, this variable dominates the calculation of the target. The neural network wastes valuable iterations by reducing the weights on this input to lessen its effect on the output. That is, the first "pattern" that the network will find is that the lot size variable has much larger values than other variables. Since this is not particularly interesting, it would be better to use the lot size as measured in acres rather than square feet.

This idea generalizes. Usually, the inputs to the neural network should be smallish numbers. It is a good idea to limit them to some small range, such as -1 to 1, which requires mapping all the values, both continuous and categorical, prior to training the network.

One way to map continuous values is to turn them into fractions by subtracting the middle value of the range from the value, dividing the result by the size of the range, and multiplying by 2. For instance, to get a mapped value for Year_Built (1923), subtract (1850 + 1986)/2 = 1918 (the middle value) from 1923 (the year the house was built) and get 5. Dividing by the number of years in the range (1986 - 1850 + 1 = 137) yields a scaled value, and multiplying by 2 yields a value of 0.0730. This basic procedure can be applied to any continuous feature to get a value between -1 and 1. One way to map categorical features is to assign fractions between -1 and 1 to each of the categories. The only categorical variable in this data is Heating_Type, so we can arbitrarily map B to 1 and A to -1. If we had three values, we could assign one to -1, another to 0, and the third to 1, although this approach does have the drawback that the three heating types will seem to have an order. Type -1 will appear closer to type 0 than to type 1. Chapter 17 contains further discussion of ways to convert categorical variables to numeric variables without adding spurious information.
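The two mappings can be sketched as follows. The continuous scaling follows the Year_Built worked example (dividing by the count of values in the range, 137 years); the categorical mapping spreads the codes evenly between -1 and 1, with the same ordering caveat noted above.

```python
def scale_continuous(value, low, high):
    # Subtract the middle of the range, divide by the size of the range,
    # and multiply by 2 to land in the interval -1 to 1
    middle = (low + high) / 2
    size = high - low + 1          # e.g., 1986 - 1850 + 1 = 137 years
    return 2 * (value - middle) / size

def scale_categorical(value, categories):
    # Assign each category a fraction between -1 and 1; note this makes
    # the categories look ordered (see Chapter 17 for alternatives)
    if len(categories) == 1:
        return 0.0
    index = categories.index(value)
    return -1 + 2 * index / (len(categories) - 1)

print(round(scale_continuous(1923, 1850, 1986), 4))   # 0.073, as computed above
print(scale_categorical("A", ["A", "B"]))             # -1.0
print(scale_categorical("B", ["A", "B"]))             # 1.0
```

With three categories, scale_categorical maps them to -1, 0, and 1, reproducing the drawback discussed in the text: the middle category appears equally "close" to the other two.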