in this context would imply negative fixed costs, which makes no sense whatsoever (although in regressions involving other variables it may well make sense). Normally one should not use a result like that, despite otherwise impressive regression statistics.

If the regression forecasts variable costs above $1.00, one should be suspicious. If true, either the Company must anticipate a significant decrease in its cost structure in the near future (which would invalidate applicability of the regression analysis to the future) or the Company will be out of business soon. The analyst should also consider the possibility that the regression failed, perhaps because of either insufficient or incorrect data, and it may be unwise to use the results in the valuation.
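The two plausibility screens just described, a negative intercept and variable costs above $1.00 per dollar of sales, can be sketched as a small helper. The function name and the sample coefficients below are illustrative assumptions, not from the text:

```python
def sanity_check(a, b):
    """Flag economically implausible coefficients from a regression of
    adjusted costs on sales (a = fixed costs, b = variable cost per $1 of sales)."""
    warnings = []
    if a < 0:
        warnings.append("negative fixed costs (a < 0)")
    if b > 1.0:
        warnings.append("variable costs above $1.00 per dollar of sales (b > 1)")
    return warnings

# A hypothetical regression exhibiting both red flags at once
print(sanity_check(-5_000, 1.02))
```

A clean result, such as `sanity_check(50_000, 0.80)`, returns an empty list, and the analyst can proceed to examine the regression statistics.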

PART 1 Forecasting Cash Flows

28

USE OF REGRESSION STATISTICS TO TEST THE ROBUSTNESS OF THE RELATIONSHIP

Having determined the equation of the line, we use regression statistics to determine the strength of the relationship between the dependent and independent variable(s). We give only a brief verbal description of regression statistics below. For a more in-depth explanation, the reader should refer to a book on statistics.

In an OLS regression, the "goodness of fit" of the line is measured by the degree of correlation between the dependent and independent variable, referred to as the r value. An r value of 1 indicates a perfect direct relationship, where the independent variable explains all of the variation of the dependent variable. A value of −1 indicates a perfect inverse relationship. Most r values fall between −1 and 1, and the closer to 1 (or −1), the better the relationship. An r value of zero indicates no relationship between the variables.

In a multivariable regression equation, the multiple R measures how well the dependent variable is correlated to all of the independent variables in the regression equation. Multiple R measures the total amount of variation in the dependent variable that is explained by the independent variables. In our case, the value of 99.88% (B20) is very close to 1, indicating that almost all of the variation in adjusted costs is explained by sales.6

The square of the single or multiple R value, referred to as R-square (or R²), measures the percentage of the variation in the dependent variable explained by the independent variable. It is the main measure of the goodness of fit. We obtain an R² of 99.75% (B21), which means that sales explains 99.75% of the variation in adjusted costs.

Adding more independent variables to the regression equation usually adds to R², even when there is no true causality. In statistics, this is called "spurious correlation." The adjusted R², which is 99.72% in our example (B22), removes the expected spurious correlation in the "gross" R².

Adj R² = [R² − k/(n − 1)] × [(n − 1)/(n − k − 1)]

where n is the number of observations and k is the number of independent variables (also known as regressors).
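The adjustment formula can be checked numerically. The values n = 10 and k = 1 below are assumptions, chosen because they reproduce the 99.72% adjusted figure from the 99.75% R²; this passage does not state the observation count explicitly:

```python
def adjusted_r_squared(r2, n, k):
    # Adj R^2 = [R^2 - k/(n - 1)] * (n - 1)/(n - k - 1),
    # algebraically equal to the common form 1 - (1 - R^2)*(n - 1)/(n - k - 1)
    return (r2 - k / (n - 1)) * (n - 1) / (n - k - 1)

# R^2 of 99.75% with an assumed n = 10 observations and k = 1 regressor (sales)
print(round(adjusted_r_squared(0.9975, 10, 1), 4))  # 0.9972
```

Note that the adjusted value is always at or below the gross R², since the penalty grows with the number of regressors k.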

Although the data in Table 2-1A are fictitious, in practice I have found that regressions of adjusted costs versus sales usually give rise to R² values of 98% or better.7

6. Although the spreadsheet labels this statistic Multiple R, because our example is an OLS regression, it is simply R.

7. This obviously does not apply to start-ups.

CHAPTER 2 Using Regression Analysis 29

Standard Error of the y-Estimate

The standard error of the y-estimate is another important regression statistic that gives us information about the reliability of the regression estimate. We can multiply the standard error of $16,014 (B23) by two to calculate an approximate 95% confidence interval for the regression estimate. Thus, we are 95% sure that the true adjusted costs are within $32,028 of the regression estimate of total adjusted costs.8 Dividing the total interval width of approximately $64,000 by the mean of adjusted costs (approximately $1 million) leads to a 95% confidence interval that varies by about 3% in each direction, or 6% total. Later in the chapter we will calculate precise confidence intervals.
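The two-standard-error approximation can be sketched as follows; the $1,000,000 estimate stands in for the approximate mean of adjusted costs mentioned above and is used here only for illustration:

```python
def approx_95_ci(estimate, std_error):
    # Approximate 95% confidence interval: estimate +/- 2 standard errors
    half_width = 2 * std_error
    return estimate - half_width, estimate + half_width

# Standard error of $16,014 around an assumed ~$1,000,000 cost estimate
low, high = approx_95_ci(1_000_000, 16_014)
print(low, high)                           # 967972 1032028
print(round((high - low) / 1_000_000, 3))  # total width as a fraction of the mean
```

This is only a rough interval; as the chapter notes, the precise interval uses the t-distribution and widens away from the sample mean of X.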

The Mean of a and b

Because a and b are specific numbers that we calculate in a regression analysis, it is easy to lose sight of the fact that they are not simply numbers, but rather random variables. Remember that we are trying to estimate α and β, the true fixed and variable costs, which we will never know. If we had 20 years of financial history for our Subject Company, we could take any number of combinations of years for our regression analysis. Suppose we had data for 1978–1997. We could use only the last five years, 1993–1997, or choose 1992–1995 and 1997, still keeping five years of data but excluding 1996, although there is no good reason to do so. We could use 5, 6, 7, or more years of data. There are a large number of different samples we can draw out of 20 years of data. Each different sample would lead to a different calculation of a and b in our attempt to estimate α and β, which is why a and b are random variables. Of course, we will never be exactly correct in our estimate, and even if we were, there would be no way to know it!

Equations (2-1) and (2-2) state that a and b are unbiased estimators of α and β, which means that their expected values equal α and β. The capital E is the expected value operator.

E(a) = α    the mean of a is alpha    (2-1)

E(b) = β    the mean of b is beta    (2-2)
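Unbiasedness can be illustrated with a small simulation: repeatedly draw noisy cost data from known α and β, estimate b each time, and average the estimates. All numbers here are invented for the demonstration:

```python
import random

random.seed(0)
alpha, beta, sigma = 100_000.0, 0.8, 5_000.0    # assumed "true" fixed cost, variable cost, noise
xs = [100_000 + 20_000 * i for i in range(10)]  # ten years of hypothetical sales

b_estimates = []
for _ in range(2_000):
    ys = [alpha + beta * x + random.gauss(0, sigma) for x in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b_estimates.append(b)

mean_b = sum(b_estimates) / len(b_estimates)
print(round(mean_b, 3))  # hovers near the true beta of 0.8
```

Any single b estimate misses β, but the average across many samples converges on it, which is exactly what E(b) = β asserts.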

The Variance of a and b

We want to do everything we can to minimize the variances of a and b in order to improve their reliability as estimators of α and β. If their variances are high, we cannot place much reliability on our regression estimate of costs, something we would like to avoid.

Equations (2-3) and (2-4) below for the variance of a and b give us important insights into deciding how many years of financial data to gather and analyze. Common practice is that an appraisal should encompass five years of data. Most appraisers consider anything older than five years to be stale data, and anything less than five years insufficient. You will see that the common practice may be wrong.

The mathematical definition for the variance of a is:

8. This is true at the sample mean of X, and the confidence interval widens as we move away from that.


Var(a) = σ²/n    (2-3)

where σ² is the true and unobservable population variance around the true regression line and n = the number of observations.9 Therefore, the variance of our estimate of fixed costs decreases with n, the number of years of data. If n = 10, the variance of our estimate of α is one-half of its variance if we use a sample of five years of data. The standard deviation of a, which is the square root of its variance, decreases somewhat less dramatically than the variance, but significantly nonetheless. Having 10 years of data reduces the standard deviation of our estimate of fixed costs by 29% vis-à-vis five years of data. Thus, having more years of data may increase the reliability of our statistical estimate of fixed costs if the data are not "stale," that is, out of date due to changes in the business, all else being constant.
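The 29% figure follows directly from equation (2-3), since the standard deviation of a is σ/√n; a two-line check, with an arbitrary placeholder value for the unobservable σ²:

```python
import math

def std_of_a(sigma2, n):
    # Standard deviation of a, from Var(a) = sigma^2 / n  (equation 2-3)
    return math.sqrt(sigma2 / n)

sigma2 = 1.0  # arbitrary; only the ratio between sample sizes matters
reduction = 1 - std_of_a(sigma2, 10) / std_of_a(sigma2, 5)
print(round(reduction, 3))  # about 0.293, i.e., the ~29% reduction from 5 to 10 years
```

The reduction equals 1 − 1/√2 regardless of σ², which is why the unknowability of the population variance does not affect this comparison.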

The variance of b is equal to the population variance divided by the sum of the squared deviations from the mean of the independent variable, or:

Var(b) = σ² / Σᵢ xᵢ²,  summing i = 1 to n    (2-4)

where xᵢ = Xᵢ − X̄, the deviation of the independent variable of each observation, Xᵢ, from the mean, X̄, of all its observations. In this context, it is each year's sales minus the average of sales in the period of analysis. Since we have no control over the numerator (indeed, we cannot even know it), the denominator is the only portion where we can affect the variance of b. Let's take a further look at the denominator.

Table 2-2 is a simple example to illustrate the meaning of x versus X̄. Expenses (Column C) is our Y (dependent) variable, and sales (Column D) is our X (independent) variable.

T A B L E 2-2

OLS Regression: Example of Deviation from Mean

     A            B      C          D          E            F
5                                   Variable
6                        Y          X          x            x²
7                                              Deviation    Squared Dev.
8    Observation  Year   Expenses   Sales      From Mean    From Mean
9    1            1995   $ 80,000   $100,000   $(66,667)     4,444,444,444
10   2            1996   $115,000   $150,000   $(16,667)       277,777,778
11   3            1997   $195,000   $250,000   $ 83,333      6,944,444,444
12   Total                          $500,000   $       -    11,666,666,667
13   Average                        $166,667

9. Technically this is true only when the y-axis is placed through the mean of x. The following arguments are valid, however, in either case.

The three years' sales total $500,000 (cell D12), which averages to $166,667 (D13) per year, which is X̄. Column E shows x, the deviation of each X observation from the sample mean, X̄, of $166,667. In 1995, x₁ = $100,000 − $166,667 = −$66,667. In 1996, x₂ = $150,000 − $166,667 = −$16,667. Finally, in 1997, x₃ = $250,000 − $166,667 = $83,333. The sum of all deviations is always zero, or

x₁ + x₂ + x₃ = 0

Finally, Column F shows x², the square of Column E. The sum of the squared deviations is

x₁² + x₂² + x₃² = $11,666,666,667

This squared term appears in several OLS formulas and is particularly important in calculating the variance of b.
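The Table 2-2 arithmetic can be reproduced in a few lines:

```python
# Sales figures from Table 2-2 (Column D)
sales = [100_000, 150_000, 250_000]

mean = sum(sales) / len(sales)     # X-bar: 166,666.67, shown rounded as $166,667
devs = [x - mean for x in sales]   # Column E: deviations from the mean
sum_sq = sum(d * d for d in devs)  # total of Column F

print(round(sum(devs), 6))  # deviations always sum to (essentially) zero
print(round(sum_sq))        # 11,666,666,667, matching the Column F total
```

The table's printed cells round the deviations to whole dollars, which is why, for example, $(66,667)² appears as 4,444,444,444 rather than the square of the rounded figure.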

When we use relatively fewer years of data, there tends to be less variation in sales. If sales are confined to a fairly narrow range, the squared deviations in the denominator are relatively small, which makes the variance of b large. The opposite is true when we use more years of data. A countervailing consideration is that using more years of data may lead to a higher sample variance, which is the regression estimate of σ². Thus, it is difficult to say in advance how many years of data are optimal.

This means that the common practice in the industry of using only five years of data so as not to corrupt our analysis with stale data may be incorrect if there are no significant structural changes in the competitive environment. The number of years of available data that gives the best overall statistical output for the regression equation is the most desirable. Ideally, the analyst should experiment with different numbers of years of data and let the regression statistics (the adjusted R², t-statistics, and standard error of the y-estimate) provide the feedback for making the optimal choice of how many years of data to use.
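The suggestion to experiment with different windows can be sketched as follows; the OLS helper and the eight years of figures are invented for illustration, and in practice one would compare adjusted R², t-statistics, and the standard error as the text recommends:

```python
def ols(xs, ys):
    """One-variable OLS: returns intercept a, slope b, and R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Hypothetical sales and adjusted costs (in $000) for 1990-1997
sales = [600, 650, 700, 780, 850, 900, 960, 1000]
costs = [500, 540, 575, 640, 700, 735, 780, 815]

# Compare the full eight-year window against the customary five years
for window in (8, 5):
    a, b, r2 = ols(sales[-window:], costs[-window:])
    print(window, round(a, 1), round(b, 3), round(r2, 4))
```

With stable data like this, both windows fit well and the longer window mainly tightens the coefficient estimates; with a structural break in the early years, the shorter window would win.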

Sometimes prior data can truly be stale. For example, if the number of competitors in the Company's geographic area doubled, this would tend to drive down prices relative to costs, resulting in a decreased contribution margin and an increase in variable costs per dollar of sales. In this case, using the old data without adjustment would distort the regression results. Nevertheless, it may be advisable in some circumstances to use some of the old data, with adjustments, in order to have enough data points for analysis. In the example of more competition in later years, it is possible to reduce the sales in the years prior to the competitive change on a pro forma basis, keeping the costs the same. The regression on this adjusted data is often likely to be more accurate than "winging it" with only two or three years of fresh data.
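One mechanical way to apply the pro forma adjustment described above; the 0.85 scaling factor, the change year, and the sales figures are all made up for illustration:

```python
years = [1993, 1994, 1995, 1996, 1997]
sales = [900, 950, 1000, 1040, 1080]  # in $000, invented
COMPETITION_YEAR = 1996               # hypothetical year the new competitors arrived
SALES_FACTOR = 0.85                   # assumed pro forma scaling for pre-change years

# Scale pre-change sales down; costs are deliberately left unchanged
adj_sales = [round(s * SALES_FACTOR, 1) if y < COMPETITION_YEAR else s
             for y, s in zip(years, sales)]
print(adj_sales)  # [765.0, 807.5, 850.0, 1040, 1080]
```

The appropriate factor is a judgment call grounded in the contribution-margin change, not a statistical output, which is why management's view of the shift matters.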

Of course, the company's management has its view of the future. It is important for the appraiser to understand that view and consider it in his or her statistical work.


Confidence Intervals

Constructing confidence intervals around the regression estimates a and b is another important step in using regression analysis. We would like to be able to make a statement that we are 95% sure that the true variable (either α or β) is within a specific range of numbers, with our regression estimate (a or b) at the midpoint. To calculate the range, we must use the Student's t-distribution, which we define in equation (2-6).

We begin with a standardized normal (Z) distribution. A standardized normal distribution of b (our estimate of β) is constructed by subtracting the mean of b, which is β, and dividing by its standard deviation.

Z = (b − β) / (σ / √(Σᵢ xᵢ²))    (2-5)

Since we do not know σ, the population standard deviation, the best we can do is estimate it with s, the sample standard deviation. The result is the Student's t-distribution, or simply the t-distribution. Figure 2-1