

(-$739,032) is equal to B6 ($250,000) minus B16 ($989,032). D7
(-$489,032) equals B7 ($500,000) minus B16 ($989,032). The total of all
deviations from the mean must always equal zero, which it does (D16).
Column E is the squared deviations, i.e., the square of Column D. In
statistics, the independent variable(s) is known as X, while the deviations
from the mean are known as x, which explains the column labels in B5
and D5. The sum of squared deviations over the historical period,

    Σ x_i^2,  i = 1988 to 1997

equals 1.28 × 10^12 (E16).
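As a quick sketch of these two columns, the snippet below computes the deviations from the mean and their sum of squares; the sales figures are hypothetical placeholders, not the actual Table 2-1B data.

```python
# Hypothetical 10-year sales history (illustrative only; the real
# figures are in Table 2-1B, Column B)
sales = [250_000, 500_000, 750_000, 900_000, 1_000_000,
         1_050_000, 1_123_600, 1_191_016, 1_262_477, 1_415_000]

n = len(sales)
mean = sum(sales) / n                 # the mean of sales (as in B16)
x = [X - mean for X in sales]         # deviations from the mean, x (Column D)
sum_sq = sum(d * d for d in x)        # sum of squared deviations (Column E total)

# Deviations from the mean always total zero, as in D16
print(abs(sum(x)) < 1e-6)             # True
```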
The next step is to compute the squared deviation for our sample
forecast year. We assume that forecast sales for 1998 are $1.6 million (B17).
We repeat the coefficients from the regression formula from the first page
of the table in B36 and B37. Applying the regression equation, we would
then forecast expenses at $1,343,928 (C17).
In order to compute a 95% confidence interval around the expense
forecast of $1,343,928, we apply equations (2-8) and (2-9). 1998 forecast
sales are $610,968 (D17 = B17 - B16) above the mean of the historical
period. That is the x0 in (2-8) and (2-9). We square the term to get 3.73
× 10^11 (E17). Then we divide that by the sum of the squared deviations in
the historical period, 1.28 × 10^12 (E16), to get 0.2905650 (F17), which we
repeat below in Row 28.
In Row 25, we insert the t-statistic of 2.306, which one can find in a
table for a 95% confidence level (the 0.025 column in a two-tailed distri-
bution) and eight degrees of freedom (n = 10 observations - 1 indepen-
dent variable - 1 = 8). In Row 26 we show the standard error of the y-
estimate of $16,014, which came from Table 2-1B, B23. Row 27 is 1/n =
1/10 = 0.1, where n is the number of observations.
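For readers without a t-table handy, the 2.306 critical value can be reproduced numerically. The sketch below is only an illustration (Simpson's rule plus bisection), not how a spreadsheet or statistics package obtains the value.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Cumulative probability via Simpson's rule on [0, x], plus 0.5 by symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += t_pdf(i * h, df) * (4 if i % 2 else 2)
    return 0.5 + total * h / 3

def t_critical(p, df):
    """Invert the CDF by bisection (valid for p > 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

# 95% two-tailed level, 10 - 1 - 1 = 8 degrees of freedom
print(round(t_critical(0.975, 8), 3))   # 2.306
```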
Row 28 is a repetition of F17, the ratio of the squared deviation of
the forecast to the sum of the squared deviations of the independent
variables from their mean.
In B29 we add zero, and in C29 we add 1, according to equations
(2-8) and (2-9), respectively. We will explain the difference in the two
formulas shortly.
In Row 30 we add Rows 27 to 29, which are the terms under the square
root sign in the equations. Obviously, C30 = B30 + 1. In Row 31 we take
the square root of Row 30.
Finally, we are able to calculate our 95% confidence intervals as Row
25 × Row 26 × Row 31. The 95% confidence interval for the mean is
2.306 × $16,014 × 0.6249520 = $23,078 (B32), approximately 1.44 times
the size of the standard error of the y-estimate. The 95% confidence
interval for the specific year's cost forecast is $43,547 (C32), approximately
2.72 times the size of the standard error of the y-estimate. The 95% con-
fidence intervals are 1.7% (B33) and 3.2% (C33) of the forecast costs for
the mean and the specific year's forecast, respectively.
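The arithmetic in Rows 25 through 32 can be replicated in a few lines, plugging in the cell values quoted above:

```python
import math

t_stat = 2.306        # Row 25: t-statistic, 95% level, 8 degrees of freedom
std_err = 16_014      # Row 26: standard error of the y-estimate
n = 10                # Row 27 is 1/n = 0.1
ratio = 0.2905650     # Row 28: squared deviation of the forecast / sum of squares

# Equation (2-8): 95% confidence interval for the mean
ci_mean = t_stat * std_err * math.sqrt(1 / n + ratio + 0)
# Equation (2-9): 95% confidence interval for a specific year (adds 1 in the root)
ci_year = t_stat * std_err * math.sqrt(1 / n + ratio + 1)

print(round(ci_mean), round(ci_year))   # 23078 43547
```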
You can see that the 95% confidence intervals for both the mean and
the specific year's forecast cost are roughly two times the standard error
of the y-estimate. Statisticians often loosely approximate the 95% confidence
intervals as two standard errors above and below the regression estimate.
Equations (2-8) and (2-9) are more precise.
Now we will discuss the difference between equations (2-8) and
(2-9). We forecast sales to be $1.6 million in 1998, which means that our
forecast of adjusted costs for that year according to the regression equa-
tion is $1,343,928. Of course, the actual expenses will not equal that num-
ber, even if actual sales by some miracle were to equal forecast sales. The
95% confidence interval for the mean tells us that if we add and subtract
$23,078 to our forecast of $1,343,928, then we are 95% sure that the true
regression line at sales of $1.6 million should have been between
$1,320,850 and $1,367,006. If we were to experience sales of $1.6 million
many times, say 1,000 times, we would be 95% sure that the average
cost would fall in our confidence interval.13 Equation (2-8) is the equation
describing this confidence interval.
That does not mean that we are 95% sure that costs would be be-
tween $1,320,850 and $1,367,006 in any particular year when sales are $1.6
million. We need a wider confidence interval to be 95% sure of costs in
a particular year, given a particular level of sales. Equation (2-9) describes
the confidence interval for a particular year.
Thus, the $23,078 confidence interval, meaning that we add and
subtract that number from forecast costs, appropriately quantifies our
long-run expectation of the confidence interval around forecast costs,
given the level of sales. In business valuation we are not very concerned
that every individual year conform to our forecasts. Rather, we are con-
cerned with the long-run accuracy of the regression equation. Thus, equa-
tion (2-8) is the relevant equation for 95% confidence intervals for valu-
ation analysts. Remember that the confidence interval expands the further
we move away from the mean of the historical period. Therefore, if we
forecast the costs to go with forecast sales of, say, $5 million in the year
2005, the confidence interval around the cost estimate is wider than the
1.7% (B33) around the 1998 forecast.
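To see that widening numerically, the sketch below applies equation (2-8) at two sales levels. Because it uses the rounded sum of squared deviations (1.28 × 10^12), the result at $1.6 million comes out within a few hundred dollars of the $23,078 in B32 rather than exactly equal.

```python
import math

t_stat, std_err, n = 2.306, 16_014, 10
mean_sales = 989_032          # mean of historical sales (B16)
sum_sq = 1.28e12              # sum of squared deviations (E16, rounded)

def ci_mean(sales_forecast):
    """95% confidence interval for the mean, equation (2-8)."""
    x0 = sales_forecast - mean_sales
    return t_stat * std_err * math.sqrt(1 / n + x0 ** 2 / sum_sq)

near = ci_mean(1_600_000)     # close to the $23,078 in B32
far = ci_mean(5_000_000)      # more than five times wider
print(round(near), round(far))
```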

Selecting the Data Set and Regression Equation
Table 2-4 is identical to Table 2-1B, except that instead of all 10
years of data, it contains only the last 5 years. The regression equation
for the 5 years of data is (Table 2-4, B27 and B28):

    Adjusted costs = $71,252 + ($0.79 × Sales)
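As a check on the spreadsheet output, the slope and intercept can be recovered from the five data points in Table 2-4 with the standard least-squares formulas:

```python
# Data from Table 2-4, rows 5-9 (1993-1997)
sales = [1_123_600, 1_191_016, 1_262_477, 1_338_226, 1_415_000]
costs = [965_043, 1_012_745, 1_072_633, 1_122_714, 1_199_000]

n = len(sales)
x_bar = sum(sales) / n
y_bar = sum(costs) / n

# Ordinary least squares: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
s_xx = sum((x - x_bar) ** 2 for x in sales)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sales, costs))
slope = s_xy / s_xx                 # variable cost per dollar of sales, ~0.79 (B28)
intercept = y_bar - slope * x_bar   # fixed costs, ~$71,252 (B27)

print(round(slope, 4), round(intercept))
```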
Examining the regression statistics, we find that the adjusted R² is

13. This ignores the need to recompute the regression equation with new data.

CHAPTER 2 Using Regression Analysis 39
TABLE 2-4

Regression Analysis, 1993–1997

 4   Year    Sales         Adjusted Costs
 5   1993    $1,123,600    $965,043
 6   1994    $1,191,016    $1,012,745
 7   1995    $1,262,477    $1,072,633
 8   1996    $1,338,226    $1,122,714
 9   1997    $1,415,000    $1,199,000

13   Regression Statistics
14   Multiple R           99.79%
15   R square             99.58%
16   Adjusted R square    99.44%
17   Standard error       6,840
18   Observations         5

21                 df    SS         MS         F     Significance F
22   Regression    1     3.35E+10   3.35E+10   716   1.15E-04
23   Residual      3     1.40E+08   4.68E+07
24   Total         4     3.36E+10

26               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
27   Intercept [1]   71,252     37,624           1.89     0.15      (48,485)    190,989
28   Sales [2]       0.79       0.03             26.75    0.00      0.70        0.89

[1] This is the regression estimate of fixed costs.
[2] This is the regression estimate of variable costs.

[Regression Plot: Adjusted Costs vs. Sales, 1993–1997; fitted line y = 0.7924x + 71,252, R² = 0.9958]

99.44% (B16), still indicating an excellent relationship. We do see a
difference in the t-statistics for the two regressions.
The t-statistic for the intercept is now 1.89 (D27), indicating it is no
longer significant at the 95% level, whereas it was 3.82 in Table 2-1B.
Another effect of having fewer data points is that the 95% confidence
interval for the intercept value is -$48,485 (F27) to $190,989 (G27), a
range of $239,475. In addition, the t-statistic for the slope coefficient,
while still significant, has fallen from 56.94 (Table 2-1B, D34) to 26.75
(D28). The 95% confidence interval for the slope now becomes $0.70 (F28)
to $0.89 (G28), a range that is 3½ times greater than that in Table 2-1B
and indicates much more uncertainty in the variable cost than we obtain
using 10 years of data.
The standard error of the Y-estimate, however, decreases from
$16,014 (Table 2-1B, B23) to $6,840. This indicates that decreasing the
number of data points improves the Y-estimate, an opposite result from
all of the preceding. Why?
Earlier, we pointed out that using only a small range for the inde-
pendent variable leads to a small denominator in the variance of b, i.e.,

    Variance of b = s^2 / Σ x_i^2

which leads to larger confidence intervals. However, larger data sets (us-
ing more years of data) tend to lead to a larger standard error of the y-
estimate, s. As we mentioned earlier,

    s = √[ (1 / (n - 2)) × Σ (Y_i - Ŷ_i)^2 ],  summed over i = 1, ..., n

where Ŷ_i are the forecast (regression-fitted) costs, Y_i are the historical
costs, and n is the number of observations.14 Thus, we often have a trade-
off in deciding how many years of data to include in the regression. More
years of data lead to better confidence intervals, but fewer years may
lead to smaller standard errors of the y-estimate.
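Continuing with the Table 2-4 data, this formula reproduces the reported standard error of 6,840 (using the rounded coefficients from the regression plot, so the match is within rounding):

```python
import math

# Data from Table 2-4, rows 5-9 (1993-1997)
sales = [1_123_600, 1_191_016, 1_262_477, 1_338_226, 1_415_000]
costs = [965_043, 1_012_745, 1_072_633, 1_122_714, 1_199_000]

# Fitted values from the regression plot: y = 0.7924x + 71,252
fitted = [0.7924 * x + 71_252 for x in sales]
residuals = [y - f for y, f in zip(costs, fitted)]

# s = sqrt( sum of squared residuals / (n - 2) ), n - 2 degrees of freedom
n = len(sales)
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
print(round(s))   # about 6,840, matching B17 within rounding
```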
Table 2-4 was constructed to demonstrate that you should evaluate
all of the regression statistics carefully to determine whether the relationship
is sufficiently strong to merit using it and which data set is best to use.
Simply looking at the adjusted R² value is insufficient; all the regression
statistics should be evaluated in their entirety, as an improvement in one
may be counterbalanced by a deterioration in another. Therefore, it is best
to test different data sets and compare all of the regression statistics to
select the regression equation that represents the best overall relationship
between the variables.

14. We divide by n - 2 instead of n because it takes two points to determine a line. If we only had
two years of historical data, we could determine a regression line, but we would know
absolutely nothing about the variance around the line. It takes a minimum of three years of
data to be able to say anything at all about how well the regression line fits the data, and
three years is usually insufficient. It is much better to have at least five years of data,
though four years can often suffice.

Although regression analysis is a powerful tool, its blind application can
lead to serious errors. Various problems can be encountered, and one
should be cognizant of the limitations of this technique. Aside from the
obvious problems of poor fit and insufficient data, structural changes in
the company can also invalidate the historical relationship of sales and
costs.

Insufficient Data

Insufficient data leads to increased error in the regression, which in turn
will lead to increased error in the forecast data. As mentioned previously,

