This is the most common objective method to fit a special type of curve (a straight line) to a set of data. The ideas behind this method can be readily extended to more complex cases.
This method is so popular because the computations behind it can be done by hand (for small problems) and are easily programmed into a computer.
However, 'just because you have a hammer doesn't mean that everything is a nail'. Every problem should be investigated carefully to see if the technique is appropriate.
This method involves fitting a straight line to the data points to obtain the best fit. We assume that you know the equation of a line, but will review some important properties.
Equation for a line
In previous courses at high school or in linear algebra, the equation of a straight line is often written as $y = mx + b$, where $m$ is the slope and $b$ is the intercept.
Just to be difficult (just kidding), statisticians usually write a linear relationship between $X$ and $Y$ as $Y = \beta_0 + \beta_1 X$, where $\beta_0$ is the intercept and $\beta_1$ is the slope. There is a good reason for this notation: in more advanced classes, you would see that our notation extends easily to more complex cases (for example, a second predictor simply adds a term $\beta_2 X_2$), whereas the former does not.
Use JMP here to fit a line to the cereal data and interpret the slopes and intercepts.
How is the line fit? How is the best-fitting line found when the points are scattered? We typically use the principle of least squares. The least-squares line is the line that makes the sum of the squares of the deviations of the data points from the line in the vertical direction as small as possible.
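Mathematically, the least-squares line is the line whose intercept $\beta_0$ and slope $\beta_1$ minimize $\sum_{i=1}^{n} \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2$, where $(X_1, Y_1), \dots, (X_n, Y_n)$ are the $n$ data points. This formal definition is not that important; the concept in the previous paragraph is what matters.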
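To make the least-squares principle concrete, the sketch below tries many candidate lines on a small made-up data set (not the cereal data) and keeps the one with the smallest sum of squared vertical deviations. The helper name `sum_sq_dev` and the search grid (intercepts and slopes in steps of 0.1) are illustrative assumptions; in practice the minimum is computed directly rather than found by searching.

```python
# Brute-force illustration of the least-squares principle.
# The data below are made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_sq_dev(b0, b1, xs, ys):
    """Sum of squared vertical deviations of the points from the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Try every intercept from -2.0 to 6.0 and every slope from -2.0 to 2.0
# in steps of 0.1, keeping the line with the smallest sum of squares.
best = None
for i in range(-20, 61):
    for j in range(-20, 21):
        b0, b1 = i / 10, j / 10
        ss = sum_sq_dev(b0, b1, xs, ys)
        if best is None or ss < best[2]:
            best = (b0, b1, ss)

best_b0, best_b1, best_ss = best
print(f"intercept={best_b0:.1f}, slope={best_b1:.1f}, sum of squares={best_ss:.2f}")
# prints: intercept=2.2, slope=0.6, sum of squares=2.40
```

Every other line on the grid does worse; for example, the line $Y = 1 + X$ has a sum of squared deviations of 4.0 for these points.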
It is possible to write out a formula for the estimated intercept and slope, but who cares? Let the computer do the dirty work.
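For the curious, the standard closed-form least-squares estimates are $\widehat{\beta}_1 = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) \big/ \sum_{i=1}^{n}(X_i - \bar{X})^2$ and $\widehat{\beta}_0 = \bar{Y} - \widehat{\beta}_1 \bar{X}$, where $\bar{X}$ and $\bar{Y}$ are the sample means. The sketch below codes these formulas directly; the function name `least_squares_fit` and the made-up data are illustrative assumptions, and in this course JMP does this work for you.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of X and Y, and sum of squares of X, about their means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data for illustration (not the cereal data).
b0, b1 = least_squares_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"intercept={b0:.2f}, slope={b1:.2f}")  # prints: intercept=2.20, slope=0.60
```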
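The equation of the fitted line is $\widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 X$, where $\widehat{\beta}_0$ is the estimated intercept and $\widehat{\beta}_1$ is the estimated slope. The hat symbol (as in $\widehat{Y}$) indicates that we are referring to the estimated line and not to a line in the entire population.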
Show how to fit the straight line in JMP and how to extract information from the summary table shown by JMP.
Predictions
Once the best-fitting line is found, it can be used to make predictions for new values of $X$. All that is done is to substitute the new value of $X$ into the equation and compute the predicted value $\widehat{Y}$.
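A minimal sketch of prediction, assuming for illustration that the fitted line is $\widehat{Y} = 2.2 + 0.6X$ (the least-squares line for the made-up points $(1,2), (2,4), (3,5), (4,4), (5,5)$, not for the cereal data). The function name `predict` is hypothetical.

```python
def predict(intercept, slope, x_new):
    """Predicted value Y-hat = intercept + slope * x_new for a fitted straight line."""
    return intercept + slope * x_new


# Hypothetical fitted line: Y-hat = 2.2 + 0.6 X (from made-up data).
b0, b1 = 2.2, 0.6
for x_new in [2.5, 4.0]:
    print(f"X = {x_new}: predicted Y = {predict(b0, b1, x_new):.2f}")
```

The new $X$ values here lie inside the range of the data (1 to 5); predicting far outside that range assumes the straight line continues to hold there.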
Residuals
After any curve is fit, it is important to examine whether the fitted curve is reasonable. This is done using residuals. The residual for a point is the difference between the observed value and the predicted value; that is, the residual from fitting a straight line is found as $\text{residual}_i = Y_i - \widehat{Y}_i$.
Residual plots can be constructed and will be explained later in the course. They are useful for checking whether the fitted line is a reasonable summary of the data.