17 Finding the “optimal model”

17.1 Best subsets

There are many models/combinations of predictors that we could use to predict our response variable. We want to find the best model possible, but we also don't want to overfit.

So far, we have compared two models by hand. There is also a way to compare every combination of predictors at once: the ols_step_best_subset() command from the olsrr package.

Describe what the "best subset" method is doing. Hint: we will go over this in lectures, but also see https://online.stat.psu.edu/stat501/lesson/10/10.3

First, choose every predictor you think might be useful and fit a single "full" model that contains all of them.

FOR YOUR PROJECT, CHOOSE 8 PREDICTORS MAX (or R takes too long)
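To see why the number of predictors matters, here is a quick sketch of how many models have to be fitted. It assumes that best subsets fits one model for every non-empty combination of k predictors, which gives 2^k - 1 models; the variable names are just for illustration.

```r
# Number of non-empty predictor combinations for k candidate predictors
k <- c(4, 8, 12, 16)
n_models <- 2^k - 1

# One row per choice of k, showing how quickly the work grows
data.frame(predictors = k, models_to_fit = n_models)
```

Each extra predictor roughly doubles the number of models, which is why a small cap keeps the run time manageable.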

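library(olsrr)  # provides ols_step_best_subset()

# Full model: every candidate predictor
FullModel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)

# Compare all combinations of predictors and print the summary
BestSubsets <- ols_step_best_subset(FullModel)
BestSubsets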
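If you want to see the idea behind the search, here is a minimal base-R sketch that fits every combination of the same four predictors and ranks the models by AIC. It is an illustration only, not how olsrr is implemented, and the object names (predictors, subsets, results) are made up for this example.

```r
# Candidate predictors (the same ones used in FullModel)
predictors <- c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# Every non-empty combination of predictors, as a list of character vectors
subsets <- unlist(
  lapply(seq_along(predictors), function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each model and record its size, adjusted R-squared and AIC
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "Sepal.Length"), data = iris)
  data.frame(
    n_predictors = length(vars),
    predictors   = paste(vars, collapse = " + "),
    adj_r2       = summary(fit)$adj.r.squared,
    aic          = AIC(fit)
  )
}))

# Sort so the lowest-AIC model comes first
results <- results[order(results$aic), ]
head(results)
```

Doing it by hand like this is only practical for a handful of predictors; for your project, use ols_step_best_subset().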

So now we can pick out the model that has, for example, the lowest AIC or the highest adjusted R2. BUT YOU STILL HAVE TO CHECK FOR LINE (Linearity, Independence, Normality, Equal variance)!
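As a reminder of what that check can look like, here is a short sketch using base-R diagnostic plots. FinalModel is a hypothetical choice made up for this example, not a recommendation; swap in whichever model you selected.

```r
# Hypothetical chosen model (replace with the one you picked from the best subsets output)
FinalModel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

# Two diagnostic plots side by side
par(mfrow = c(1, 2))
plot(FinalModel, which = 1)  # residuals vs fitted: linearity and equal variance
plot(FinalModel, which = 2)  # normal Q-Q plot: normality of the residuals
par(mfrow = c(1, 1))         # reset the plotting layout
```

Independence cannot be read off these plots; think about how the data were collected.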