

equation should make common sense. For example, a negative y-intercept
in this context would imply negative fixed costs, which makes no sense
whatsoever (although in regressions involving other variables it may well
make sense). Normally one should not use a result like that, despite
otherwise impressive regression statistics.
If the regression forecasts variable costs above $1.00 per dollar of
sales, one should be suspicious. If true, either the Company must
anticipate a significant decrease in its cost structure in the near future
(which would invalidate applicability of the regression analysis to the
future) or the Company will be out of business soon. The analyst should
also consider the possibility that the regression failed, perhaps because
of either insufficient or incorrect data, and it may be unwise to use the
results in the valuation.
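
The two common-sense checks above can be sketched as a small helper. This is only an illustration: the function name and the sample coefficients are hypothetical, and a and b stand for the regression intercept (fixed costs) and slope (variable costs per dollar of sales).

```python
def sanity_check(a: float, b: float) -> list[str]:
    """Return plain-language warnings for the cost equation: costs = a + b * sales."""
    warnings = []
    if a < 0:
        # A negative intercept would mean negative fixed costs.
        warnings.append("negative intercept implies negative fixed costs")
    if b > 1.0:
        # A slope above 1.00 means each dollar of sales costs more than a dollar.
        warnings.append("variable costs exceed $1.00 per dollar of sales")
    return warnings

# Hypothetical coefficients, for illustration only.
print(sanity_check(a=56_770, b=0.80))
print(sanity_check(a=-12_000, b=1.05))
```

An empty list means neither check was triggered. It does not by itself mean the regression is reliable.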

PART 1 Forecasting Cash Flows
Having determined the equation of the line, we use regression statistics
to determine the strength of the relationship between the dependent and
independent variable(s). We give only a brief verbal description of
regression statistics below. For a more in-depth explanation, the reader
should refer to a book on statistics.
In an OLS regression, the "goodness of fit" of the line is measured
by the degree of correlation between the dependent and independent
variable, referred to as the r value. An r value of 1 indicates a perfect
direct relationship, where the independent variable explains all of the
variation of the dependent variable. A value of -1 indicates a perfect
inverse relationship. Most r values fall between -1 and 1, but the closer
to 1 (or -1), the better the relationship. An r value of zero indicates no
linear relationship between the variables.
In a multivariable regression equation, the multiple R measures how
well the dependent variable is correlated to all of the independent
variables in the regression equation. Multiple R measures the total amount
of variation in the dependent variable that is explained by the
independent variables. In our case, the value of 99.88% (B20) is very close
to 1, indicating that almost all of the variation in adjusted costs is
explained by sales.6
The square of the single or multiple R value, referred to as R-square
(or R²), measures the percentage of the variation in the dependent
variable explained by the independent variable. It is the main measure of
the goodness of fit. We obtain an R² of 99.75% (B21), which means that
sales explain 99.75% of the variation in adjusted costs.
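
As a rough illustration of how r and R² relate, the sketch below fits a one-variable OLS line and computes r from the deviations and R² separately as one minus the unexplained share of variation. The sales and cost figures are hypothetical, not the chapter's spreadsheet data, and the variable names are mine.

```python
from math import sqrt

# Hypothetical sales (X) and adjusted costs (Y); not the chapter's data.
sales = [250_000, 500_000, 750_000, 1_000_000, 1_250_000]
costs = [262_000, 455_000, 661_000, 847_000, 1_062_000]

n = len(sales)
x_bar = sum(sales) / n
y_bar = sum(costs) / n

# Sums of squared deviations and cross-products around the means.
sxx = sum((x - x_bar) ** 2 for x in sales)
syy = sum((y - y_bar) ** 2 for y in costs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sales, costs))

# OLS slope and intercept.
b = sxy / sxx
a = y_bar - b * x_bar

# Correlation coefficient r, and R-square computed independently as 1 - SSE/SST.
r = sxy / sqrt(sxx * syy)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sales, costs))
r_square = 1 - sse / syy

print(f"r = {r:.4f}, r squared = {r * r:.4f}, R-square = {r_square:.4f}")
```

With a single independent variable, the square of r and the R-square computed from the residuals agree, which is why the text can speak of "the square of the single or multiple R value."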
Adding more independent variables to the regression equation usually
adds to R², even when there is no true causality. In statistics, this is
called "spurious correlation." The adjusted R², which is 99.72% in our
example (B22), removes the expected spurious correlation in the "gross"
R²:

\[
\text{Adj } R^2 = \left( R^2 - \frac{k}{n-1} \right) \frac{n-1}{n-k-1}
\]

where n is the number of observations and k is the number of independent
variables (also known as regressors).
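
The adjustment can be checked with a few lines of code. The R² of 99.75% and the single regressor (k = 1) come from the text. The sample size n = 10 is my assumption, since the number of observations is not stated in this passage, and the function name is hypothetical.

```python
def adjusted_r_square(r_square: float, n: int, k: int) -> float:
    """Adj R^2 = (R^2 - k/(n-1)) * (n-1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than regressors plus one")
    return (r_square - k / (n - 1)) * (n - 1) / (n - k - 1)

# R^2 and k are from the text; n = 10 is an assumption.
adj = adjusted_r_square(0.9975, n=10, k=1)
print(f"adjusted R-square = {adj:.4f}")
```

Under that assumption the result rounds to about 99.72%, in line with the figure quoted for cell B22.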
Although the data in Table 2-1A are fictitious, in practice I have
found that regressions of adjusted costs versus sales usually give rise to
R² values of 98% or better.7

6. Although the spreadsheet labels this statistic Multiple R, because our example is an OLS
regression, it is simply R.
7. This obviously does not apply to start-ups.

CHAPTER 2 Using Regression Analysis 29

Standard Error of the y-Estimate
The standard error of the y-estimate is another important regression
statistic that gives us information about the reliability of the regression
estimate. We can multiply the standard error of $16,014 (B23) by two to
calculate an approximate 95% confidence interval for the regression
estimate. Thus, we are 95% sure that the true adjusted costs are within
$32,028 of the regression estimate of total adjusted costs.8 Dividing the
full width of that interval, about $64,000, by the mean of adjusted costs
(approximately $1 million) leads to a 95% confidence interval that varies
by about plus or minus 3%, or 6% total. Later in the chapter we will
calculate precise confidence intervals.
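
The arithmetic behind this rule of thumb is short enough to sketch directly. The standard error and the approximate mean of adjusted costs are the figures quoted in the text. The variable names are mine.

```python
std_error = 16_014        # standard error of the y-estimate (B23)
mean_costs = 1_000_000    # approximate mean of adjusted costs, per the text

half_width = 2 * std_error      # approximate 95% band on either side of the estimate
full_width = 2 * half_width     # distance from the lower bound to the upper bound

print(f"half-width = ${half_width:,}, full width = ${full_width:,}")
print(f"about +/-{half_width / mean_costs:.1%} of mean costs, "
      f"{full_width / mean_costs:.1%} in total")
```

Multiplying by two is only an approximation to the exact multiplier, which depends on the t-distribution and the sample size.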

The Mean of a and b
Because a and b are specific numbers that we calculate in a regression
analysis, it is easy to lose sight of the fact that they are not simply
numbers, but rather random variables. Remember that we are trying to
estimate α and β, the true fixed and variable cost, which we will never
know. If we had 20 years of financial history for our Subject Company, we
could take any number of combinations of years for our regression analysis.
Suppose we had data for 1978-1997. We could use only the last five years,
1993-1997, or choose 1992-1995 and 1997, still keeping five years of data
but excluding 1996, although there is no good reason to do so. We could
use 5, 6, 7, or more years of data. There are a large number of different
samples we can draw out of 20 years of data. Each different sample would
lead to a different calculation of a and b in our attempt to estimate α
and β, which is why a and b are random variables. Of course, we will never
be exactly correct in our estimate, and even if we were, there would be
no way to know it!
Equations (2-1) and (2-2) state that a and b are unbiased estimators
of α and β, which means that their expected values equal α and β. The
capital E is the expected value operator.

\[ E(a) = \alpha \quad \text{(the mean of a is alpha)} \tag{2-1} \]
\[ E(b) = \beta \quad \text{(the mean of b is beta)} \tag{2-2} \]
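
One way to see what "unbiased" means is a small simulation. This sketch invents a true cost structure (the values of ALPHA, BETA, and SIGMA are purely hypothetical, since in practice they are never observable), generates many possible histories around that true line, and runs an OLS regression on each. The individual estimates a and b differ from history to history, but their averages land close to the true values.

```python
import random

def ols(xs, ys):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical "true" fixed cost, variable cost, and noise level.
ALPHA, BETA, SIGMA = 50_000.0, 0.80, 16_000.0
sales = [250_000 + 100_000 * i for i in range(10)]   # ten years of sales

rng = random.Random(42)      # fixed seed so the run is repeatable
trials = 2_000
a_draws, b_draws = [], []
for _ in range(trials):
    # Each trial is one possible history: the true line plus random noise.
    costs = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in sales]
    a, b = ols(sales, costs)
    a_draws.append(a)
    b_draws.append(b)

mean_a = sum(a_draws) / trials
mean_b = sum(b_draws) / trials
print(f"mean of a = {mean_a:,.0f} (alpha = {ALPHA:,.0f})")
print(f"mean of b = {mean_b:.4f} (beta = {BETA})")
print(f"b ranged from {min(b_draws):.3f} to {max(b_draws):.3f}")
```

The spread between the smallest and largest b is the practical meaning of calling b a random variable: any single regression gives only one draw from that range.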

8. This is true at the sample mean of X, and the confidence interval widens as we move away
from that.

The Variance of a and b
We want to do everything we can to minimize the variances of a and b
in order to improve their reliability as estimators of α and β. If their
variances are high, we cannot place much reliance on our regression
estimate of costs, something we would like to avoid.
Equations (2-3) and (2-4) below for the variance of a and b give us
important insights into deciding how many years of financial data to
gather and analyze. Common practice is that an appraisal should encompass
five years of data. Most appraisers consider anything older than five
years to be stale data, and anything less than five years insufficient. You
will see that the common practice may be wrong.
The mathematical definition for the variance of a is:

\[ \operatorname{Var}(a) = \frac{\sigma^2}{n} \tag{2-3} \]

where σ² is the true and unobservable population variance around the
true regression line and n = number of observations.9 Therefore, the
variance of our estimate of fixed costs decreases with n, the number of
years of data. If n = 10, the variance of our estimate of α is 1/2 of its
variance if we use a sample of five years of data. The standard deviation
of a, which is the square root of its variance, decreases somewhat less
dramatically than the variance, but significantly nonetheless. Having 10
years of data reduces the standard deviation of our estimate of fixed costs
by 29% vis-a-vis five years of data. Thus, having more years of data may
increase the reliability of our statistical estimate of fixed costs if the
data are not "stale," that is, out of date due to changes in the business,
all else being constant.
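
The one-half and 29% figures follow directly from equation (2-3), assuming the same population variance applies to both sample sizes:

\[
\frac{\operatorname{Var}(a \mid n = 10)}{\operatorname{Var}(a \mid n = 5)}
  = \frac{\sigma^2 / 10}{\sigma^2 / 5} = \frac{1}{2},
\qquad
\frac{\operatorname{SD}(a \mid n = 10)}{\operatorname{SD}(a \mid n = 5)}
  = \sqrt{\frac{1}{2}} \approx 0.707
\]

so the standard deviation falls by roughly 1 - 0.707 = 0.293, or about 29%.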
The variance of b is equal to the population variance divided by the
sum of the squared deviations from the mean of the independent variable,

\[ \operatorname{Var}(b) = \frac{\sigma^2}{\sum x_i^2} \tag{2-4} \]

where \( x_i = X_i - \bar{X} \), the deviation of the independent variable
of each observation, \( X_i \), from the mean, \( \bar{X} \), of all its
observations. In this context, it is each year's sales minus the average of
sales in the period of analysis. Since we have no control over the
numerator (indeed, we cannot even know it), the denominator is the only
portion where we can affect the variance of b. Let's take a further look at
the denominator.
Table 2-2 is a simple example to illustrate the meaning of x versus
X. Expenses (Column C) is our Y (dependent) variable, and sales (Column
D) is our X (independent) variable.

TABLE 2-2

OLS Regression: Example of Deviation from Mean

     A            B     C          D          E            F
 5                      Variable
 6                      Y          X          x            x^2
 7                                            Deviation    Squared Dev.
 8   Observation  Year  Expenses   Sales      From Mean    From Mean
 9   1            1995  $ 80,000   $100,000   $(66,667)     4,444,444,444
10   2            1996  $115,000   $150,000   $(16,667)       277,777,778
11   3            1997  $195,000   $250,000   $ 83,333      6,944,444,444
12   Total                         $500,000   $      -     11,666,666,667
13   Average                       $166,667

9. Technically this is true only when the y-axis is placed through the mean of x. The following
arguments are valid, however, in either case.

The three years' sales total $500,000 (cell D12), which averages to
$166,667 (D13) per year, which is \( \bar{X} \). Column E shows x, the
deviation of each X observation from the sample mean, \( \bar{X} \), of
$166,667. In 1995, x1 = $100,000 - $166,667 = -$66,667. In 1996, x2 =
$150,000 - $166,667 = -$16,667. Finally, in 1997, x3 = $250,000 -
$166,667 = $83,333. The sum of all deviations is always zero, or

\[ \sum x_i = 0 \]

Column F shows x², the square of Column E. The sum of the squared
deviations is

\[ \sum x_i^2 = 11{,}666{,}666{,}667 \]

This squared term appears in several OLS formulas and is particularly
important in calculating the variance of b.
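
The deviation columns of Table 2-2 can be reproduced in a few lines. The sales figures are those in the table, with the years as given in the text. The variable names are mine.

```python
sales = {1995: 100_000, 1996: 150_000, 1997: 250_000}   # Column D of Table 2-2

x_bar = sum(sales.values()) / len(sales)                  # sample mean (D13)
deviations = {yr: x - x_bar for yr, x in sales.items()}  # Column E: x = X - mean
squared = {yr: d ** 2 for yr, d in deviations.items()}   # Column F: x squared

for yr in sales:
    print(f"{yr}: X = {sales[yr]:>9,}  x = {deviations[yr]:>10,.0f}  "
          f"x^2 = {squared[yr]:>16,.0f}")

sum_x = sum(deviations.values())
sum_x2 = sum(squared.values())
print(f"sum of x = {sum_x:.6f}, sum of x^2 = {sum_x2:,.0f}")
```

The sum of the deviations comes out at zero up to floating-point rounding. The sum of the squares is the denominator of equation (2-4).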
When we use relatively fewer years of data, there tends to be less
variation in sales. If sales are confined to a fairly narrow range, the
squared deviations in the denominator are relatively small, which makes
the variance of b large. The opposite is true when we use more years of
data. A countervailing consideration is that using more years of data may
lead to a higher sample variance, which is the regression estimate of σ².
Thus, it is difficult to say in advance how many years of data are optimal.
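
To get a feel for the size of this effect, the sketch below compares the denominator of Var(b) for five versus ten years of a hypothetical, steadily growing sales history. It holds σ² fixed at an assumed value, so it deliberately ignores the countervailing consideration just mentioned. All figures and names are illustrative.

```python
def sum_sq_dev(xs):
    """Sum of squared deviations from the mean: the denominator of Var(b)."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

# Hypothetical sales history growing by $100,000 a year, oldest first.
sales_10 = [250_000 + 100_000 * i for i in range(10)]
sales_5 = sales_10[-5:]                  # the most recent five years only

sigma_sq = 16_000.0 ** 2                 # assumed; the true value is unobservable
var_b_5 = sigma_sq / sum_sq_dev(sales_5)
var_b_10 = sigma_sq / sum_sq_dev(sales_10)
ratio = var_b_5 / var_b_10

print(f"Var(b) with  5 years: {var_b_5:.6f}")
print(f"Var(b) with 10 years: {var_b_10:.6f}")
print(f"ratio: {ratio:.2f}")
```

In this example doubling the number of years shrinks Var(b) far more than it shrinks Var(a), because the wider range of sales enlarges the squared deviations disproportionately.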
This means that the common practice in the industry of using only
five years of data so as not to corrupt our analysis with stale data may
be incorrect if there are no significant structural changes in the
competitive environment. The number of years of available data that gives
the best overall statistical output for the regression equation is the most
desirable. Ideally, the analyst should experiment with different numbers of
years of data and let the regression statistics (the adjusted R²,
t-statistics, and standard error of the y-estimate) provide the feedback
for making the optimal choice of how many years of data to use.
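
Such an experiment might be organized as in the following sketch, which refits the regression on the most recent 5, 6, and so on up to 10 years and reports the three statistics side by side. The ten-year history is hypothetical and the helper name fit_stats is my own. A spreadsheet regression tool would report the same statistics.

```python
from math import sqrt

def fit_stats(xs, ys):
    """OLS fit of y = a + b*x; returns (adjusted R^2, std. error of y, t-stat of b)."""
    n, k = len(xs), 1
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r_sq = 1 - sse / syy
    adj_r_sq = (r_sq - k / (n - 1)) * (n - 1) / (n - k - 1)
    se_y = sqrt(sse / (n - k - 1))      # standard error of the y-estimate
    t_b = b / (se_y / sqrt(sxx))        # slope divided by its standard error
    return adj_r_sq, se_y, t_b

# Hypothetical ten-year history, oldest first (not the chapter's data).
sales = [400_000, 450_000, 520_000, 610_000, 680_000,
         760_000, 850_000, 930_000, 1_040_000, 1_150_000]
costs = [372_000, 418_000, 465_000, 545_000, 590_000,
         668_000, 731_000, 800_000, 893_000, 972_000]

results = {}
for years in range(5, len(sales) + 1):
    # Use only the most recent `years` observations.
    results[years] = fit_stats(sales[-years:], costs[-years:])
    adj, se, t = results[years]
    print(f"{years:>2} years: adj R^2 = {adj:.4f}, SE(y) = {se:,.0f}, t(b) = {t:.1f}")
```

No single statistic settles the choice. The analyst would weigh all three together with knowledge of whether the older years are still representative of the business.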
Sometimes prior data can truly be stale. For example, if the number
of competitors in the Company's geographic area doubled, this would
tend to drive down prices relative to costs, resulting in a decreased
contribution margin and an increase in variable costs per dollar of sales.
In this case, using the old data without adjustment would distort the
regression results. Nevertheless, it may be advisable in some circumstances
to use some of the old data (with adjustments) in order to have enough
data points for analysis. In the example of more competition in later years,
it is possible to reduce the sales in the years prior to the competitive
change on a pro forma basis, keeping the costs the same. The regression
on this adjusted data is often likely to be more accurate than "winging
it" with only two or three years of fresh data.
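
A pro forma adjustment of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the history, the year in which competition intensified, and the 10% price reduction. How large the adjustment should be in a real engagement is a matter of judgment that the text does not prescribe.

```python
# Hypothetical history: (year, sales, costs).
history = [
    (1991, 800_000, 600_000),
    (1992, 850_000, 640_000),
    (1993, 900_000, 675_000),
    (1994, 950_000, 710_000),
    (1995, 900_000, 745_000),
    (1996, 945_000, 780_000),
]
CHANGE_YEAR = 1995      # assumed first year under the new competitive pricing
PRICE_FACTOR = 0.90     # assumed: new prices are 90% of the old prices

def pro_forma(rows, change_year, factor):
    """Restate pre-change sales at post-change prices; costs stay as reported."""
    return [
        (year, sales * factor if year < change_year else sales, costs)
        for year, sales, costs in rows
    ]

adjusted = pro_forma(history, CHANGE_YEAR, PRICE_FACTOR)
for year, sales, costs in adjusted:
    print(f"{year}: sales = {sales:>9,.0f}  costs = {costs:>9,}")
```

The adjusted series can then be fed to the regression in place of the raw history, so that all years reflect the current relationship between prices and costs.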
Of course, the company's management has its view of the future. It
is important for the appraiser to understand that view and consider it in
his or her statistical work.

Confidence Intervals
Constructing confidence intervals around the regression estimates a and
b is another important step in using regression analysis. We would like
to be able to make a statement that we are 95% sure that the true variable
(either α or β) is within a specific range of numbers, with our regression
estimate (a or b) at the midpoint. To calculate the range, we must use the
Student's t-distribution, which we define in equation (2-6).
We begin with a standardized normal (Z) distribution. A standardized
normal distribution of b, our estimate of β, is constructed by
subtracting the mean of b, which is β, and dividing by its standard
deviation.

\[ Z = \frac{b - \beta}{\sigma / \sqrt{\sum x_i^2}} \tag{2-5} \]

Since we do not know σ, the population standard deviation, the best
we can do is estimate it with s, the sample standard deviation. The result
is the Student's t-distribution, or simply the t-distribution. Figure 2-1
