Friday, October 16, 2009

2.1. Linear Regression


For most of this book, we'll be working on applications that do linear regression, a simple but informative statistic. Suppose that you have a series of data pairs, such as the quarterly sales figures for a particular department, shown in Table 2.1.


Table 2.1. Quarterly Sales Figures for a Hypothetical Company (millions of dollars)

Quarter    Sales
1          107.5
2          110.3
3          114.5
4          116.0
5          119.3
6          122.4



A regression line is the straight line that passes closest, overall, to all the data points (see Figure 2.1). The formula for such a line is y = mx + b: the sales (y) for a given quarter (x) rise at a quarterly rate (m) from a base at "quarter zero" (b). We have the x and y values; we'd like to determine m and b.
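For example (with numbers invented purely for illustration): if the base were b = 100 and the quarterly rate were m = 3, the estimate for quarter 4 would be y = 3 · 4 + 100 = 112.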



Figure 2.1. The sales figures from Table 2.1, plotted in a graph. The line drawn through the data points is the closest straight-line fit for the data.

The formulas for linear regression are as follows, where each sum runs over the n data pairs:

    m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

    b = (Σy − mΣx) / n

    r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]
The value r is the correlation coefficient, a measure of how well the regression line models the data. A value of 0 means that the x and y values have no detectable linear relation to each other; a value of ±1 indicates that the regression line fits the data perfectly.
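Worked through for the data in Table 2.1 (a check you can reproduce by hand): n = 6, Σx = 21, Σy = 690.0, Σx² = 91, Σy² = 79,502.84, and Σxy = 2,466.5, which give m = 309/105 ≈ 2.94, b ≈ 104.7, and r ≈ 0.996. In other words, sales grow by about $2.94 million per quarter from a base of roughly $104.7 million, and the straight-line fit is nearly perfect.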


Linear regression is used frequently in business and in the physical and social sciences. When x represents time, lines derived from regressions are trends from which past and future values can be estimated. When x is volume of sales and y is costs, you can read b as the fixed cost and m as the marginal cost. Correlation coefficients, good and bad, form the quantitative heart of serious arguments about marketing preferences and social injustice.


The demands on a program for turning a series of x and y values into a slope, intercept, and correlation coefficient are not great: Keep a running total of x, x², y, y², and xy; keep note of the count (n); and run the formulas when all the data has been seen.
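A minimal sketch of such a program, written here in Python (the language and every name in it are illustrative choices, not from the original):

import math

class LinearRegression:
    """Accumulate running totals; compute m, b, and r on demand."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        # Update the count and the five running totals for one data pair.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def slope(self):
        # m = (n*Sxy - Sx*Sy) / (n*Sxx - (Sx)^2)
        return ((self.n * self.sxy - self.sx * self.sy) /
                (self.n * self.sxx - self.sx ** 2))

    def intercept(self):
        # b = (Sy - m*Sx) / n
        return (self.sy - self.slope() * self.sx) / self.n

    def correlation(self):
        # r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - (Sx)^2) * (n*Syy - (Sy)^2))
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt((self.n * self.sxx - self.sx ** 2) *
                        (self.n * self.syy - self.sy ** 2))
        return num / den

# Fed the figures from Table 2.1:
reg = LinearRegression()
for quarter, sales in enumerate([107.5, 110.3, 114.5, 116.0, 119.3, 122.4], 1):
    reg.add(quarter, sales)
print(f"m = {reg.slope():.2f}, b = {reg.intercept():.2f}, r = {reg.correlation():.3f}")
# Prints m = 2.94, b = 104.70, r = 0.996 for the Table 2.1 data.

Because only the count and the totals are stored, the data can be fed in one pair at a time, in any order, and memory use stays constant no matter how long the series runs.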
