Find relationships for prediction.
Regression is a generic term for all methods attempting to fit a model to observed data in order to quantify the relationship between two groups of variables. The fitted model may then be used either to merely describe the relationship between the two groups of variables, or to predict new values.
Multiple Linear Regression (MLR)
This procedure performs linear regression on the selected dataset. This fits a linear model of the form
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$$
where Y is the dependent variable (response), $X_1, X_2, \dots, X_k$ are the independent variables (predictors), and e is the random error. The quantities $b_0, b_1, b_2, \dots, b_k$ are known as the regression coefficients, which have to be estimated from the data. The multiple linear regression algorithm in XLMiner chooses the regression coefficients so as to minimize the sum of squared differences between the predicted values and the actual values.
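As a rough illustration of how such coefficients can be estimated, the sketch below fits the model with NumPy, assuming the fitting criterion is ordinary least squares (the usual choice for multiple linear regression). The function names `fit_mlr` and `predict_mlr` and the toy data are hypothetical and are not part of XLMiner.

```python
import numpy as np

def fit_mlr(X, y):
    """Estimate b_0..b_k by ordinary least squares (assumed criterion)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b_0.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_mlr(coeffs, X):
    """Apply Y = b_0 + b_1*X_1 + ... + b_k*X_k to new rows of predictors."""
    X = np.asarray(X, dtype=float)
    return coeffs[0] + X @ coeffs[1:]

# Noise-free toy data generated from Y = 1 + 2*X_1 - 3*X_2 (illustrative only).
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
b_demo = fit_mlr(X_demo, y_demo)
new_prediction = predict_mlr(b_demo, [[3.0, 2.0]])
```

Because the toy data contain no random error, the fit recovers the generating coefficients; with real data the estimates would only approximate the underlying relationship.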
Linear regression is performed either to predict the response variable from the predictor variables, or to study the relationship between the response variable and the predictor variables. For example, using linear regression, the crime rate of a state can be explained as a function of demographic factors such as population, education, and male-to-female ratio.
Linear Regression Model
Linear regression is a statistical procedure for predicting the value of a dependent variable from an independent variable when the relationship between the variables can be described with a linear model.
A linear regression equation can be written as $Y_p = mX + b$, where $Y_p$ is the predicted value of the dependent variable, m is the slope of the regression line, and b is the Y-intercept of the regression line.
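For a single predictor, the slope m and intercept b of the regression line can be computed directly from the data. The sketch below assumes the line is fit by least squares; the function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares slope m and intercept b for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - m * mean_x
    return m, b

# Illustrative points lying exactly on Y = 2X + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
y_pred = m * 5.0 + b  # predicted value at X = 5
```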
In statistics, linear regression is a method of estimating the conditional expected value of one variable y given the values of some other variable or variables x. The variable of interest, y, is conventionally called the “dependent variable”. The terms “endogenous variable” and “output variable” are also used. The other variables x are called the “independent variables”. The terms “exogenous variables” and “input variables” are also used. The dependent and independent variables may be scalars or vectors. If the independent variable is a vector, one speaks of multiple linear regression.
Statement of the linear regression model
A linear regression model is typically stated in the form y = α + βx + ε
The right-hand side may take other forms, but generally comprises a linear combination of the parameters, here denoted α and β. The term ε represents the unpredicted or unexplained variation in the dependent variable; it is conventionally called the “error” whether or not it is really a measurement error. The error term is conventionally assumed to have expected value zero, since a nonzero expected value could be absorbed into α. It is also assumed that ε is independent of x.
A useful alternative to least-squares linear regression is robust regression. One common form, least absolute deviations regression, minimizes the mean absolute error instead of the mean squared error, which makes the fit less sensitive to outliers. It is computationally more intensive than ordinary linear regression and somewhat more difficult to implement.
The term “robust regression” is also used for ordinary linear regression reported with robust (Huber-White) standard errors, which relax the assumption of homoskedasticity without changing the estimated coefficients.
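The Huber-White standard errors mentioned above can be sketched with NumPy as follows. This is a minimal illustration, assuming the basic "HC0" form of the sandwich estimator with no small-sample correction; the function name `ols_with_robust_se` and the simulated data are hypothetical.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """Fit OLS and return coefficients, classical SEs, and HC0 robust SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    coeffs = xtx_inv @ design.T @ y
    resid = y - design @ coeffs

    # Classical standard errors assume one common error variance.
    sigma2 = resid @ resid / (n - p)
    classical_se = np.sqrt(np.diag(sigma2 * xtx_inv))

    # HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1.
    meat = design.T @ (design * resid[:, None] ** 2)
    robust_se = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return coeffs, classical_se, robust_se

# Simulated heteroskedastic data: the noise spread grows with x (illustrative).
rng = np.random.default_rng(0)
x_sim = np.linspace(1.0, 10.0, 200)
y_sim = 2.0 + 0.5 * x_sim + rng.normal(scale=0.2 * x_sim)
coeffs, classical_se, robust_se = ols_with_robust_se(x_sim, y_sim)
```

The coefficients are identical under both approaches; only the reported uncertainty differs.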
An equivalent formulation that explicitly shows linear regression as a model of conditional expectation is E(y | x) = α + βx, with the conditional distribution of y given x essentially the same as the distribution of the error term. A linear regression model need not be affine, let alone linear, in the independent variables x; it need only be linear in the parameters. For example, y = α + βx + γx² + ε is a linear regression model even though it is quadratic in x.
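To illustrate that last point, the sketch below fits a model containing a squared term using ordinary linear least squares. The quadratic form, the coefficient name γ (the third entry of `params`), and the toy data are assumptions chosen for illustration.

```python
import numpy as np

# Toy data generated without noise from y = 1 + 2x + 0.5x^2 (illustrative only).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 0.5 * x**2

# The model is nonlinear in x but linear in the parameters (alpha, beta, gamma),
# so x and x^2 can simply be treated as two separate predictor columns.
design = np.column_stack([np.ones_like(x), x, x**2])
params, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha, beta, gamma = params
```

The same linear solver is used as for a straight-line fit; only the columns of the design matrix change.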