15 Basic model comparisons
You are now being asked to assess two (or more) models and decide “which is best”. Here are some metrics we could use to compare models.
Is the model appropriate? LINE & influential outliers
There is no point comparing two models if one (or both) of them is not valid! So before carrying on, it’s important to assess whether each of your models is valid in its own right. See the LINE tutorial for more.
- Linearity - COULD A CURVE FIT BETTER THAN A LINE?
  - Assessed by visual inspection of the scatterplot and the residual plot (see the code sketch after this list).
  - If broken, apply a transformation to the PREDICTOR and/or the RESPONSE and see if your new model does better.
- Independence - to the best of your knowledge, is your sample representative of the overall population? Are all the points independent of each other? Or do you have a “basketball team in your sample” situation (if you are trying to assess student height)?
  - Assessed by visually looking for non-randomness (clusters/patterns) in your data and in the residual plot.
- Normality of residuals - are your residuals normally distributed around the line of best fit? Or are they skewed in some way?
  - Assessed by the qq-plot and normality tests.
  - If broken, apply a transformation to the RESPONSE and see if your new model does better.
- Equal variance of residuals - no matter what the value of your predictor/response, are your points around the same distance from the line? Or do you see “fanning out” / “bow-tie” shapes?
  - Assessed by visual inspection of the residual plot and a heteroskedasticity test (e.g. the F-test or Breusch-Pagan test).
  - If broken, apply a transformation to the RESPONSE and see if your new model does better.
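As a rough sketch of how you might produce these diagnostics in R (this assumes the iris model used later in this section, and uses olsrr’s plotting/testing helpers; base R’s plot(model) gives similar plots):

data("iris")
model <- lm(Sepal.Length~Sepal.Width,data=iris)
plot(Sepal.Length~Sepal.Width,data=iris)  # scatterplot: could a curve fit better than a line?
olsrr::ols_plot_resid_fit(model)          # residuals vs fitted: linearity, equal variance, patterns
olsrr::ols_plot_resid_qq(model)           # qq-plot: normality of residuals
olsrr::ols_test_normality(model)          # normality tests (Shapiro-Wilk etc.)
olsrr::ols_test_breusch_pagan(model)      # heteroskedasticity test
olsrr::ols_plot_cooksd_bar(model)         # influential outliers via Cook's distance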
Coefficient of Determination \(R^2\)
We could look first at the coefficient of determination for each model in the model summary, i.e. which model explains more of the variation in your response variable.
Standard \(R^2\)
The proportion of variability in the response explained by your model, e.g. 90% of the variability in building height can be explained by the number of floors it has.
Adjusted \(R^2\)
This is more appropriate for multiple regression as it takes into account the number of predictors to prevent overfitting. Read this! https://online.stat.psu.edu/stat501/lesson/10/10.3 and see the lecture on transformations/AIC.
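For reference, the usual adjustment is

\[ R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \]

where \(n\) is the sample size and \(p\) is the number of predictors, so adding predictors only helps if they improve \(R^2\) by more than the penalty.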
Where to find them
The easiest way is in the olsrr summary. Look at the statistics at the top.
data("iris")
model <- lm(Sepal.Length~Sepal.Width,data=iris)
olsrr::ols_regress(model)
## Model Summary
## ----------------------------------------------------------------
## R                   0.118      RMSE            0.820
## R-Squared           0.014      MSE             0.681
## Adj. R-Squared      0.007      Coef. Var      14.120
## Pred R-Squared     -0.011      AIC           371.992
## MAE                 0.675      SBC           381.024
## ----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
## AIC: Akaike Information Criteria
## SBC: Schwarz Bayesian Criteria
##
## ANOVA
## -------------------------------------------------------------------
##                  Sum of
##                 Squares     DF    Mean Square      F        Sig.
## -------------------------------------------------------------------
## Regression        1.412      1          1.412    2.074     0.1519
## Residual        100.756    148          0.681
## Total           102.168    149
## -------------------------------------------------------------------
##
## Parameter Estimates
## ---------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta         t      Sig     lower     upper
## ---------------------------------------------------------------------------------------
## (Intercept)     6.526         0.479                  13.628    0.000     5.580     7.473
## Sepal.Width    -0.223         0.155       -0.118     -1.440    0.152    -0.530     0.083
## ---------------------------------------------------------------------------------------
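If you would rather not use olsrr, the same headline numbers can be pulled from base R; a minimal sketch using the model fitted above:

model.summary <- summary(model)
model.summary$r.squared       # R-squared
model.summary$adj.r.squared   # Adjusted R-squared
AIC(model)                    # Akaike Information Criterion
BIC(model)                    # Schwarz Bayesian Criterion (the SBC in the olsrr output)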
MSE, RMSE, (root) mean squared error
The RMSE is the raw variability of the data around the regression line, in the units of Y; the MSE is its square. You can see the MSE as the residual mean square in the ANOVA table, and both values in the summary statistics at the top.
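A minimal sketch of where these numbers come from, using the same model (conventions differ on whether you divide by \(n\) or by the residual degrees of freedom, which appears to be why the RMSE above, 0.820, is not exactly the square root of the MSE, 0.681):

anova(model)                     # the Residuals 'Mean Sq' entry is the MSE
mean(residuals(model)^2)         # a 'raw' MSE, dividing by n
sqrt(mean(residuals(model)^2))   # the corresponding RMSE, in the units of Y
sigma(model)                     # residual standard error, dividing by n - 2 for this model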
Information criteria: AIC
(read more here: https://online.stat.psu.edu/stat501/lesson/10/10.5).
To compare regression models, some statistical software may also give values of statistics referred to as information criterion statistics. For regression models, these statistics combine information about the SSE, the number of parameters in the model, and the sample size. A low value, compared to values for other possible models, is good. Some data analysts feel that these statistics give a more realistic comparison of models than the \(C_p\) statistic because \(C_p\) tends to make models seem more different than they actually are.
AIC is not a hypothesis test but an information criterion: it takes into account the number of predictors and the amount of data, so it is often a more robust way to compare models than \(R^2\) (which relies on the LINE assumptions holding).
Three information criteria that we present are called Akaike’s Information Criterion (AIC), the Bayesian Information Criterion (BIC) (which is sometimes called Schwarz’s Bayesian Criterion (SBC)), and Amemiya’s Prediction Criterion (APC). The respective formulas are as follows:
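(Restated from the linked Penn State notes; \(SSE_p\) denotes the error sum of squares of the model with \(p\) regression coefficients.)

\[ AIC_p = n\ln(SSE_p) - n\ln(n) + 2p \]
\[ BIC_p = n\ln(SSE_p) - n\ln(n) + p\ln(n) \]
\[ APC_p = \frac{n+p}{n(n-p)}\,SSE_p \]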
In the formulas, n = sample size and p = number of regression coefficients in the model being evaluated (including the intercept). Notice that the only difference between AIC and BIC is the multiplier of p, the number of parameters.
Each of the information criteria is used in a similar way — in comparing two models, the model with the lower value is preferred. The ‘raw’ values have little physical meaning. For now, know that the lower the AIC, the “better” the model.
Let’s compare two models now: the original model and one with a square-root transformation applied to the predictor:
model <- lm(Sepal.Length~Sepal.Width,data=iris)
model.transformation <- lm(Sepal.Length~sqrt(Sepal.Width),data=iris)
model1summary <- summary(model)
model2summary <- summary(model.transformation)
# Adjusted R2
paste("Model 1:",round(model1summary$adj.r.squared,2) )
## [1] "Model 1: 0.01"
paste("Model 2:",round(model2summary$adj.r.squared,2) )
## [1] "Model 2: 0.01"
# AIC
AIC(model,model.transformation)
## df AIC
## model 3 371.9917
## model.transformation 3 372.2743
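Here the untransformed model has the (slightly) lower AIC, so by this criterion it is marginally preferred, although the adjusted \(R^2\) values tell us that neither model explains much of the variability in Sepal.Length. If you also want the BIC/SBC, base R computes it in the same way (a sketch):

BIC(model,model.transformation)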