(−$489,032) equals B7 ($500,000) minus B16 ($989,032). The total of all deviations from the mean must always equal zero, which it does (D16).

Column E is the squared deviations, i.e., the square of Column D. In statistics, the independent variable(s) is known as X, while the deviations from the mean are known as x, which explains the column labels in B5 and D5. The sum of squared deviations, Σ xᵢ² (the sum running from i = 1988 to 1997), equals 1.28 × 10¹² (E16).
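The mechanics of Columns D and E can be sketched in a few lines. The sales figures below are hypothetical stand-ins, since Table 2-3's actual 1988–1997 values are not repeated here; only the structure of the calculation is illustrated.

```python
# Sketch of the deviation columns in Table 2-3 (hypothetical sales figures).
# Column D: x = X - mean(X); the deviations always sum to zero.
# Column E: x squared; E16 is the sum of squared deviations.
sales = [500_000, 600_000, 700_000, 800_000, 900_000,
         1_000_000, 1_100_000, 1_200_000, 1_300_000, 1_400_000]  # hypothetical

mean_sales = sum(sales) / len(sales)            # analogous to B16
deviations = [x - mean_sales for x in sales]    # Column D
sum_dev = sum(deviations)                       # D16: must equal zero
sum_sq_dev = sum(d * d for d in deviations)     # E16

print(round(sum_dev), round(sum_sq_dev))
```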

The next step is to compute the squared deviation for our sample forecast year. We assume that forecast sales for 1998 are $1.6 million (B17). We repeat the coefficients from the regression formula from the first page of the table in B36 and B37. Applying the regression equation, we would then forecast expenses at $1,343,928 (C17).

In order to compute a 95% confidence interval around the expense forecast of $1,343,928, we apply equations (2-8) and (2-9). 1998 forecast sales are $610,968 (D17 = B17 − B16) above the mean of the historical period. That is the x0 in (2-8) and (2-9). We square the term to get 3.73 × 10¹¹ (E17). Then we divide that by the sum of the squared deviations in the historical period, 1.28 × 10¹² (E16), to get 0.2905650 (F17), which we repeat below in Row 28.

In Row 25, we insert the t-statistic of 2.306, which one can find in a table for a 95% confidence level (the 0.025 column in a two-tailed distribution) and eight degrees of freedom (n = 10 observations − 1 independent variable − 1). In Row 26 we show the standard error of the y-estimate of $16,014, which came from Table 2-1B, B23. Row 27 is 1/n = 1/10 = 0.1, where n is the number of observations.

Row 28 is a repetition of F17, the ratio of the squared deviation of the forecast to the sum of the squared deviations of the independent variables from their mean.

In B29 we add zero, and in C29 we add 1, according to equations (2-8) and (2-9), respectively. We will explain the difference in the two formulas shortly.

In Row 30 we add Rows 27 to 29, which are the terms under the square root sign in the equations. Obviously, C30 = B30 + 1. In Row 31 we take the square root of Row 30.

Finally, we are able to calculate our 95% confidence intervals as Row 25 × Row 26 × Row 31. The 95% confidence interval for the mean is 2.306 × $16,014 × 0.6249520 = $23,078 (B32), approximately 1.44 times the size of the standard error of the y-estimate. The 95% confidence interval for the specific year's cost forecast is $43,547 (C32), approximately 2.72 times the size of the standard error of the y-estimate. The 95% confidence intervals are 1.7% (B33) and 3.2% (C33) of the forecast costs for the mean and the specific year's forecast, respectively.

PART 1 Forecasting Cash Flows

You can see that the 95% confidence intervals for both the mean and the specific year's forecast cost are roughly two times the standard error of the y-estimate. Statisticians often loosely approximate the 95% confidence intervals as two standard errors above and below the regression estimate. Equations (2-8) and (2-9) are more precise.
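The arithmetic in Rows 25 through 32 can be sketched as follows, using the values given in the text (t = 2.306, s = $16,014, n = 10, and the squared-deviation ratio 0.2905650 from F17):

```python
import math

# 95% confidence intervals per equations (2-8) and (2-9), using the
# values given in the text for Table 2-3.
t_stat = 2.306          # Row 25: t for 95% confidence, 8 degrees of freedom
std_err = 16_014        # Row 26: standard error of the y-estimate (Table 2-1B, B23)
n = 10                  # number of observations; Row 27 = 1/n
ratio = 0.2905650       # Row 28: x0 squared / sum of squared deviations (F17)

# Equation (2-8) adds 0 under the square root (CI for the mean);
# equation (2-9) adds 1 (CI for a specific year's forecast).
ci_mean = t_stat * std_err * math.sqrt(0 + 1/n + ratio)  # B32
ci_year = t_stat * std_err * math.sqrt(1 + 1/n + ratio)  # C32

print(round(ci_mean), round(ci_year))  # approximately 23,078 and 43,547
```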

Now we will discuss the difference between equations (2-8) and (2-9). We forecast sales to be $1.6 million in 1998, which means that our forecast of adjusted costs for that year according to the regression equation is $1,343,928. Of course, the actual expenses will not equal that number, even if actual sales by some miracle equal forecast sales. The 95% confidence interval for the mean tells us that if we add and subtract $23,078 to our forecast of $1,343,928, then we are 95% sure that the true regression line at sales of $1.6 million lies between $1,320,850 and $1,367,006. If we were to experience sales of $1.6 million many times, say 1,000 times, we would be 95% sure that the average cost would fall in our confidence interval.13 Equation (2-8) is the equation describing this confidence interval.

That does not mean that we are 95% sure that costs would be between $1,320,850 and $1,367,006 in any particular year when sales are $1.6 million. We need a wider confidence interval to be 95% sure of costs in a particular year, given a particular level of sales. Equation (2-9) describes the confidence interval for a particular year.

Thus, the $23,078 confidence interval, meaning that we add and subtract that number from forecast costs, appropriately quantifies our long-run expectation of the confidence interval around forecast costs, given the level of sales. In business valuation we are not very concerned that every individual year conform to our forecasts. Rather, we are concerned with the long-run accuracy of the regression equation. Thus, equation (2-8) is the relevant equation for 95% confidence intervals for valuation analysts. Remember that the confidence interval expands the further we move away from the mean of the historical period. Therefore, if we forecast the costs to go with forecast sales of, say, $5 million in the year 2005, the confidence interval around the cost estimate is wider than the 1.7% (B33) around the 1998 forecast.
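This widening can be sketched numerically. The sum of squared deviations below is the rounded 1.28 × 10¹² from E16, so the 1998 figure differs slightly from the $23,078 computed in the text; the point is the comparison between a forecast near the historical mean and one far from it, not the exact values.

```python
import math

# Width of the 95% CI for the mean, equation (2-8), as a function of
# forecast sales. Values from the text; sum_sq_dev is the rounded E16.
t_stat = 2.306
std_err = 16_014
n = 10
mean_sales = 989_032          # B16: mean sales of the historical period
sum_sq_dev = 1.28e12          # E16 (rounded)

def ci_mean_width(forecast_sales):
    x0 = forecast_sales - mean_sales
    return t_stat * std_err * math.sqrt(1/n + x0**2 / sum_sq_dev)

near = ci_mean_width(1_600_000)   # close to the historical mean
far = ci_mean_width(5_000_000)    # far above it

print(round(near), round(far))    # the far forecast's interval is several times wider
```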

Selecting the Data Set and Regression Equation

Table 2-4 is identical to Table 2-1B, except that instead of all 10 years of data, it contains only the last 5 years. The regression equation for the 5 years of data is (Table 2-4, B27 and B28)

Adjusted costs = $71,252 + ($0.79 × Sales)
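As a check, the coefficients in Table 2-4 can be reproduced with an ordinary least-squares fit of the five data points. This is a plain-Python sketch; any spreadsheet or statistics package gives the same result.

```python
# OLS fit of adjusted costs on sales for the five years in Table 2-4.
sales = [1_123_600, 1_191_016, 1_262_477, 1_338_226, 1_415_000]
costs = [965_043, 1_012_745, 1_072_633, 1_122_714, 1_199_000]

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(costs) / n

# Slope b = sum of cross-deviations / sum of squared x-deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, costs))
sxx = sum((x - mean_x) ** 2 for x in sales)
slope = sxy / sxx                       # B28: variable cost per dollar of sales
intercept = mean_y - slope * mean_x     # B27: fixed costs

# slope rounds to 0.7924 and intercept to roughly $71,252, as in Table 2-4
print(round(slope, 4), round(intercept))
```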

Examining the regression statistics, we find that the adjusted R² is

13. This ignores the need to recompute the regression equation with new data.

CHAPTER 2 Using Regression Analysis 39

T A B L E 2-4

Regression Analysis 1993–1997

        A     B             C
 4      Year  Sales         Adjusted Costs
 5      1993  $1,123,600      $965,043
 6      1994  $1,191,016    $1,012,745
 7      1995  $1,262,477    $1,072,633
 8      1996  $1,338,226    $1,122,714
 9      1997  $1,415,000    $1,199,000

11  SUMMARY OUTPUT

13  Regression Statistics
14  Multiple R          99.79%
15  R square            99.58%
16  Adjusted R square   99.44%
17  Standard error      6,840
18  Observations        5

20  ANOVA
21              df   SS         MS         F     Significance F
22  Regression   1   3.35E+10   3.35E+10   716   1.15E-04
23  Residual     3   1.40E+08   4.68E+07
24  Total        4   3.36E+10

26                 Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
27  Intercept [1]      71,252         37,624        1.89      0.15     (48,485)    190,989
28  Sales [2]            0.79           0.03       26.75      0.00        0.70        0.89

[1] This is the regression estimate of fixed costs.
[2] This is the regression estimate of variable costs.

[Regression Plot: Adjusted Costs vs. Sales for 1993–1997, with fitted line y = 0.7924x + 71252 and R² = 0.9958]


99.44% (B16), still indicating an excellent relationship. We do see a difference in the t-statistics for the two regressions.

The t-statistic for the intercept is now 1.89 (D27), indicating it is no longer significant at the 95% level, whereas it was 3.82 in Table 2-1B. Another effect of having fewer data is that the 95% confidence interval for the intercept value is −$48,485 (F27) to $190,989 (G27), a range of $239,475. In addition, the t-statistic for the slope coefficient, while still significant, has fallen from 56.94 (Table 2-1B, D34) to 26.75 (D28). The 95% confidence interval for the slope now becomes $0.70 (F28) to $0.89 (G28), a range that is 3½ times greater than that in Table 2-1B and indicates much more uncertainty in the variable cost than we obtain using 10 years of data.

The standard error of the Y-estimate, however, decreases from $16,014 (Table 2-1B, B23) to $6,840. This indicates that decreasing the number of data points improves the Y-estimate, an opposite result from all of the preceding. Why?

Earlier, we pointed out that using only a small range for the independent variable leads to a small denominator in the variance of b, i.e.,

    σ² / Σ xᵢ²   (the sum running from i = 1 to n)

which leads to larger confidence intervals. However, larger data sets (using more years of data) tend to lead to a larger standard error of the y-estimate, s. As we mentioned earlier,

    s = √[ (1 / (n − 2)) Σ (Yᵢ − Ŷᵢ)² ]   (the sum running from i = 1 to n)

where Ŷᵢ are the forecast (regression-fitted) costs, Yᵢ are the historical costs, and n is the number of observations.14 Thus, we often have a trade-off in deciding how many years of data to include in the regression. More years of data lead to better confidence intervals, but fewer years may lead to smaller standard errors of the y-estimate.
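Both quantities in this trade-off can be sketched with the 5-year data from Table 2-4. The coefficients below are those reported in the table, so small rounding differences against the printed $6,840 and 0.03 are expected.

```python
import math

# Standard error of the y-estimate, s, and the standard error of the
# slope, s_b = s / sqrt(sum of squared x-deviations), for the 5-year
# data in Table 2-4.
sales = [1_123_600, 1_191_016, 1_262_477, 1_338_226, 1_415_000]
costs = [965_043, 1_012_745, 1_072_633, 1_122_714, 1_199_000]
intercept, slope = 71_252, 0.7924   # as reported in Table 2-4, B27 and B28

n = len(sales)
fitted = [intercept + slope * x for x in sales]
rss = sum((y - f) ** 2 for y, f in zip(costs, fitted))
s = math.sqrt(rss / (n - 2))        # divide by n - 2, not n (see footnote 14)

mean_x = sum(sales) / n
sxx = sum((x - mean_x) ** 2 for x in sales)
s_b = s / math.sqrt(sxx)            # smaller sum of squared deviations -> larger s_b
t_slope = slope / s_b               # approximately 26.75, matching D28

print(round(s), round(s_b, 3))      # s close to 6,840; s_b close to 0.03
```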

Table 2-4 was constructed to demonstrate that you should evaluate all of the regression statistics carefully to determine if the relationship is sufficiently strong to merit using it and which data set is best to use. Simply looking at the adjusted R² value is insufficient; all the regression statistics should be evaluated in their entirety, as an improvement in one may be counterbalanced by a deterioration in another. Therefore, it is best to test different data sets and compare all of the regression statistics to select the regression equation that represents the best overall relationship between the variables.

14. We divide by n − 2 instead of n because it takes two points to determine a line. If we only had two years of historical data, we could determine a regression line, but we would know absolutely nothing about the variance around the line. It takes a minimum of three years of data to be able to say anything at all about how well the regression line fits the data, and three years is usually insufficient. It is much better to have at least five years of data, though four years can often suffice.


PROBLEMS WITH USING REGRESSION ANALYSIS FOR FORECASTING COSTS

Although regression analysis is a powerful tool, its blind application can lead to serious errors. Various problems can be encountered, and one should be cognizant of the limitations of this technique. Aside from the obvious problems of poor fit and insufficient data, structural changes in the company can also invalidate the historical relationship of sales and costs.

Insufficient Data

Insufficient data leads to increased error in the regression, which in turn will lead to increased error in the forecast data. As mentioned previously,