Executive Summary
Regression is a technique for estimating the value of one or more quantities from the values of other quantities. Linear regression is a type of regression that assumes this relationship is linear. In its simple form, linear regression models the relationship between a known, nonrandom, one-dimensional predictor \(X\) and a random, one-dimensional response \(Y\) as $$Y = \beta_1X + \beta_0 + \epsilon,$$ where \(\beta_1\) and \(\beta_0\) are unknown constants and \(\epsilon\) is a random variable that may represent measurement error or some other source of randomness. Simple linear regression generalizes easily to multiple predictors or to a multi-dimensional \(Y\).
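As a minimal sketch of the model just described, the snippet below simulates responses from \(Y = \beta_1X + \beta_0 + \epsilon\) at fixed, known \(X\) values and then recovers the coefficients with ordinary least squares, the most common of the fitting approaches mentioned later. The function names `simulate` and `fit_ols`, the chosen coefficients, and the assumption that \(\epsilon\) is independent Normal noise are all illustrative choices, not part of the original text.

```python
import random

def simulate(xs, beta0, beta1, sigma, seed=None):
    """Draw one response Y = beta1 * X + beta0 + epsilon per fixed x in xs.

    Assumption (for illustration only): epsilon ~ Normal(0, sigma^2),
    independent across observations.
    """
    rng = random.Random(seed)
    return [beta1 * x + beta0 + rng.gauss(0.0, sigma) for x in xs]

def fit_ols(xs, ys):
    """Ordinary least-squares estimates (beta0_hat, beta1_hat).

    Closed form: beta1_hat = Sxy / Sxx and beta0_hat = ybar - beta1_hat * xbar.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is not identifiable")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta1_hat = sxy / sxx
    return ybar - beta1_hat * xbar, beta1_hat

xs = [float(i) for i in range(50)]  # known, nonrandom predictor values
ys = simulate(xs, beta0=1.0, beta1=2.0, sigma=0.5, seed=0)
beta0_hat, beta1_hat = fit_ols(xs, ys)  # close to (1.0, 2.0), but not exact
```

Because \(\epsilon\) is random, the estimates differ from the true \(\beta_0\) and \(\beta_1\); rerunning with another seed gives slightly different values.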
Linear regression matters for both didactic and practical reasons. Didactically, many important concepts in statistics and machine learning appear as facets of linear regression analysis, so it is frequently used as a simple illustration of those concepts. Practically, linear regression is widely used because its models are highly interpretable, they can be fit with relatively little data, and many real relationships are approximately linear. Given this foundational importance and practical utility, the subject is worthy of its own book (or books): despite its apparent simplicity, linear regression has so many applications and associated pitfalls that it requires careful study.
Incomplete List of Canonical Problems
This is a sample of the problems that arise and are dealt with in the subject of linear regression.
Fitting Coefficients
One must have a way to determine \(\beta_0\) and \(\beta_1\) (and their generalizations) in order to make use of linear regression. The best parameters can be found in a variety of ways that balance computational complexity against the underlying model assumptions and the desired properties of the model.
Inference on Coefficients
The "best" parameters for a given data set are themselves random, because they depend on the random errors in the data. It is often of interest to determine how confident one can be that those parameters reflect some true, underlying relationship. This is (roughly speaking) known as model inference.
Regression Diagnostics
Linear regression rests on many assumptions, and one will usually wish to verify that they hold. Diagnostics for this purpose are a major topic of applied linear regression analysis.
Coercion
Even if one's data does not meet the assumptions required by linear regression, there are many techniques for making it do so. Transformations can be applied to make relationships linear, robust models can be used to handle pathological randomness, and regularization can be used when fitting the coefficients is difficult. These are only a few of the ways in which data can be coerced.
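For the inference problem listed above, a common starting point is the standard error of the slope estimate and the t-statistic for the hypothesis \(\beta_1 = 0\). The sketch below assumes independent errors with constant variance; the statistic follows a t distribution with \(n - 2\) degrees of freedom only if the errors are also normal. The function name `slope_inference` and the data are hypothetical, and turning the statistic into a p-value or confidence interval would additionally need a t-distribution quantile function, which is not shown.

```python
import math

def slope_inference(xs, ys):
    """Return (beta1_hat, se_beta1, t_stat) for simple linear regression.

    Assumes independent errors with constant variance. A perfect fit
    (zero residual sum of squares) makes se_beta1 zero, and the final
    division raises ZeroDivisionError.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired points to estimate the error variance")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta1_hat = sxy / sxx
    beta0_hat = ybar - beta1_hat * xbar
    # Residual sum of squares and the unbiased estimate of sigma^2 (n - 2 df).
    sse = sum((y - beta0_hat - beta1_hat * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    se_beta1 = math.sqrt(s2 / sxx)
    t_stat = beta1_hat / se_beta1  # tests H0: beta1 = 0
    return beta1_hat, se_beta1, t_stat

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
beta1_hat, se_beta1, t_stat = slope_inference(xs, ys)  # 0.8, about 0.346, about 2.31
```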
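Two of the most basic regression diagnostics are the residuals and the leverage of each point. In simple linear regression the leverage is \(h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}\), which depends only on the predictor values and sums to 2 (the number of fitted coefficients). The sketch below computes both; the function name and data are hypothetical, and the \(2p/n\) cutoff is one commonly quoted rule of thumb rather than a fixed standard.

```python
def residuals_and_leverage(xs, ys):
    """Return (residuals, leverages) for a simple least-squares fit.

    Leverage h_ii = 1/n + (x_i - xbar)^2 / Sxx measures how far x_i lies
    from the other predictor values.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta1_hat = sxy / sxx
    beta0_hat = ybar - beta1_hat * xbar
    residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(xs, ys)]
    leverages = [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]
    return residuals, leverages

# Hypothetical data: the last point sits far from the others in x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 20.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 21.0]
res, lev = residuals_and_leverage(xs, ys)
p = 2  # number of coefficients (intercept and slope)
high_leverage = [i for i, h in enumerate(lev) if h > 2 * p / len(xs)]  # [5]
```

Residuals are typically plotted against fitted values or predictors to look for curvature or changing spread, while high-leverage points are worth checking individually because they can dominate the fit.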
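As an example of coercion by transformation, an exponential relationship \(y = a e^{bx}\) becomes linear after taking logarithms: \(\log y = \log a + bx\). The sketch below fits that linearized model by least squares. It assumes every \(y\) is positive and that any noise is multiplicative, so that it becomes additive on the log scale; note that this minimizes squared error in \(\log y\), not in \(y\). The function name and data are hypothetical.

```python
import math

def fit_exponential_via_log(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on (x, log y).

    Returns (a_hat, b_hat). Requires strictly positive y values.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("log transform requires strictly positive y values")
    zs = [math.log(y) for y in ys]  # transformed response
    n = len(xs)
    xbar = sum(xs) / n
    zbar = sum(zs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxz = sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs))
    b_hat = sxz / sxx
    log_a_hat = zbar - b_hat * xbar
    return math.exp(log_a_hat), b_hat

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * math.exp(0.5 * x) for x in xs]  # noiseless, for illustration
a_hat, b_hat = fit_exponential_via_log(xs, ys)  # approximately (3.0, 0.5)
```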
Figure: The Effect of an Outlier on a Regression Fit
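The effect named in the caption above can be reproduced numerically. The sketch below uses made-up data (not the data behind the original figure): five points lying exactly on \(y = 2x\), plus one hypothetical outlier that is far from the others in \(x\) and far below the line. That single point is enough to flip the sign of the least-squares slope.

```python
def fit_ols(xs, ys):
    """Least-squares (intercept, slope) for simple linear regression."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
clean_intercept, clean_slope = fit_ols(xs, ys)  # (0.0, 2.0)

# Add one hypothetical outlier at high leverage, far below the line.
xs_out = xs + [10.0]
ys_out = ys + [0.0]
out_intercept, out_slope = fit_ols(xs_out, ys_out)  # slope = -18/61, about -0.295
```

Least squares penalizes squared residuals, so a single distant point can exert this much pull; the robust regression methods mentioned above are designed to limit it.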
Recommended Books
Introduction to Linear Regression Analysis
Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining
Key Features
- In-text exercises
- Solution manual available
- R and SAS code examples
Key Topics
- Box-Cox Transformation
- Confidence Intervals
- Diagnostics
- Generalized Linear Models
- Hypothesis Testing
- Least-Squares Estimation
- Leverage and Influence
- Logistic Regression
- Maximum-Likelihood Estimation
- Model Adequacy Checking
- Model Validation
- Multicollinearity
- Multiple Linear Regression
- Nonlinear Regression
- Nonparametric Regression
- Outliers
- PRESS Statistic
- Poisson Regression
- Polynomial Regression
- Prediction
- Random Regressors
- Residual Analysis
- Robust Regression
- Simple Linear Regression
- Time Series
- Transformations
- Variable Selection
- Variance-Stabilizing Transformations
- Weighted Least-Squares
Description
This book gives a fairly standard introduction to simple and multiple linear regression and then devotes most of the text to their practical problems. Detecting and dealing with multicollinearity and outliers, along with many diagnostics and other practical topics, occupies the majority of the book. Generalized linear models are introduced, but they really deserve their own treatment (we recommend some here).
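As one example of the multicollinearity diagnostics such a book covers, the variance inflation factor (VIF) for a predictor is \(1 / (1 - R^2)\), where \(R^2\) comes from regressing that predictor on the others. The sketch below handles only the two-predictor case, in which \(R^2\) equals the squared sample correlation \(r^2\) and both predictors share the same VIF. The function name and data are hypothetical; a commonly quoted rule of thumb treats VIFs above 5 or 10 as a warning sign.

```python
import math

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors.

    Regressing x1 on x2 (with an intercept) gives R^2 = r^2, their squared
    sample correlation, so VIF = 1 / (1 - r^2).
    """
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    if s11 == 0 or s22 == 0:
        raise ValueError("a constant predictor has no defined correlation")
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 ** 2 / (s11 * s22)
    # Perfect collinearity makes the VIF unbounded.
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

vif_uncorrelated = vif_two_predictors([1, -1, 1, -1], [1, 1, -1, -1])  # 1.0
vif_collinear = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])  # about 29.2
```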