library(mlr)
library(tidyverse)
Linear regression can handle both continuous and categorical predictor variables.
The formula: \[y = \text{intercept} + \text{slope} \times x\]
If the predictor variables are centered to have a mean of zero, then the intercept can be interpreted as the value of y when the predictors are at their means (which is often more useful information).
Centering your variables like this doesn’t affect the slopes because the relationships between variables remain the same. Therefore, predictions made by linear regression models are unaffected by centering and scaling your data.
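To see this in action, here is a minimal sketch (using base R's built-in mtcars data, my own illustration rather than the book's example) showing that centering a predictor changes the intercept but leaves the slope and the predictions untouched:

# Centering a predictor shifts the intercept but not the slope or the predictions
data(mtcars)
raw      <- lm(mpg ~ wt, data = mtcars)
centered <- lm(mpg ~ I(wt - mean(wt)), data = mtcars)

coef(raw)                                  # intercept = predicted mpg at wt == 0 (an extrapolation)
coef(centered)                             # intercept = predicted mpg at the mean wt; slope identical
all.equal(fitted(raw), fitted(centered))   # TRUE: predictions are unchanged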
\[y = \beta_0 + \beta_1 x_1 + \epsilon\]
We use the squared residuals so that we disproportionately penalize cases that are far away from their predicted value.
The slopes in linear regression tell us how the outcome variable changes for a one-unit increase in each predictor while holding all other predictors constant. In other words, the slopes tell us how the outcome changes when we change the predictor variables, one at a time.
\[y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \epsilon\]
Recall that in logistic regression, we predict the log odds of a case belonging to a particular class. In linear regression, we simply predict the case’s value of the outcome variable.
Other algorithms may learn models that perform better on a particular task but aren't as interpretable; random forests, XGBoost, and SVMs are examples of such black-box models.
Homoscedastic simply means that the variance of the outcome around its predicted value stays roughly constant; it doesn't increase as the predicted value of the outcome increases.
The opposite of homoscedastic is heteroscedastic.
If the residuals are heteroscedastic, it sometimes helps to build a model that predicts some transformation of the outcome variable instead. For example, predicting the log10 of the response variable is a common choice.
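As a quick illustration of the log-transform approach (a sketch on simulated data, not the Ozone example), you can fit the model to log10 of the outcome and back-transform the predictions afterwards:

# Sketch: predicting log10(y) when the spread of y grows with its mean
set.seed(1)
x <- runif(100, 1, 10)
y <- exp(0.5 * x + rnorm(100, sd = 0.4))      # error grows with the mean of y

logModel <- lm(log10(y) ~ x)                  # model the transformed outcome
predsOriginalScale <- 10^predict(logModel)    # back-transform to the original scale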
See Generalized Linear Models With Examples in R by Peter K. Dunn and Gordon K. Smyth (Springer, 2018).
Dummy variables are new representations of categorical variables that encode each category as its own variable taking the values 0 and 1. The category that gets coded as all zeros, against which the others are compared, is called the reference level.
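A quick way to see dummy coding in base R (my own sketch, not code from the book) is model.matrix(), which expands a factor into 0/1 columns and leaves out the reference level:

# Dummy-coding a factor; the first level (alphabetically, "spring") becomes the reference level
season <- factor(c("winter", "spring", "summer", "spring"))
model.matrix(~ season)   # columns: (Intercept), seasonsummer, seasonwinter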
data(Ozone, package = "mlbench")
ozoneTib <- as_tibble(Ozone)
names(ozoneTib) <- c("Month", "Date", "Day", "Ozone", "Press_height",
"Wind", "Humid", "Temp_Sand", "Temp_Monte",
"Inv_height", "Press_grad", "Inv_temp", "Visib")
ozoneTib
## # A tibble: 366 x 13
## Month Date Day Ozone Press_height Wind Humid Temp_Sand Temp_Monte
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 4 3 5480 8 20 NA NA
## 2 1 2 5 3 5660 6 NA 38 NA
## 3 1 3 6 3 5710 4 28 40 NA
## 4 1 4 7 5 5700 3 37 45 NA
## 5 1 5 1 5 5760 3 51 54 45.3
## 6 1 6 2 6 5720 4 69 35 49.6
## 7 1 7 3 4 5790 6 19 45 46.4
## 8 1 8 4 4 5790 3 25 55 52.7
## 9 1 9 5 6 5700 3 73 41 48.0
## 10 1 10 6 7 5700 3 59 44 NA
## # … with 356 more rows, and 4 more variables: Inv_height <dbl>,
## # Press_grad <dbl>, Inv_temp <dbl>, Visib <dbl>
# Convert all columns to numeric and drop cases with a missing Ozone value
ozoneClean <- mutate_all(ozoneTib, as.numeric) %>%
  filter(!is.na(Ozone))
ozoneClean
## # A tibble: 361 x 13
## Month Date Day Ozone Press_height Wind Humid Temp_Sand Temp_Monte
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 4 3 5480 8 20 NA NA
## 2 1 2 5 3 5660 6 NA 38 NA
## 3 1 3 6 3 5710 4 28 40 NA
## 4 1 4 7 5 5700 3 37 45 NA
## 5 1 5 1 5 5760 3 51 54 45.3
## 6 1 6 2 6 5720 4 69 35 49.6
## 7 1 7 3 4 5790 6 19 45 46.4
## 8 1 8 4 4 5790 3 25 55 52.7
## 9 1 9 5 6 5700 3 73 41 48.0
## 10 1 10 6 7 5700 3 59 44 NA
## # … with 351 more rows, and 4 more variables: Inv_height <dbl>,
## # Press_grad <dbl>, Inv_temp <dbl>, Visib <dbl>
We have to use -Ozone to prevent the Ozone variable from being gathered with the others.
ozoneUntidy <- gather(ozoneClean, key = "Variable",
                      value = "Value", -Ozone)

ggplot(ozoneUntidy, aes(Value, Ozone)) +
  facet_wrap(~ Variable, scales = "free_x") +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +
  geom_smooth(method = "lm", formula = y ~ x, col = "red") +
  theme_bw()
## Warning: Removed 196 rows containing non-finite values (stat_smooth).
## Warning: Removed 196 rows containing non-finite values (stat_smooth).
## Warning: Removed 196 rows containing missing values (geom_point).
When doing any form of missing-value imputation, it’s extremely important to ensure that the data is either missing at random (MAR) or missing completely at random (MCAR), and not missing not at random (MNAR). If data is MCAR, it means the likelihood of a missing value is not related to any variable in the dataset. If data is MAR, it means the likelihood of a missing value is related only to the value of the other variables in the dataset.
There is a more powerful imputation technique called multiple imputation. The premise of multiple imputation is that you create many new datasets, replacing the missing data with sensible values in each one.
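One common implementation of this idea is the mice package (this sketch is my own illustration; mice isn't used elsewhere in this chapter): it creates several imputed datasets, fits the model to each one, and pools the results.

# Sketch of multiple imputation with the mice package
library(mice)
imputations <- mice(ozoneClean, m = 5, printFlag = FALSE)   # 5 imputed datasets
fits <- with(imputations, lm(Ozone ~ Temp_Sand + Humid))    # fit a model to each dataset
summary(pool(fits))                                         # pool the estimates across datasets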
imputeMethod <- imputeLearner("regr.rpart")
ozoneImp <- impute(as.data.frame(ozoneClean),
classes = list(numeric = imputeMethod))
ozoneTask <- makeRegrTask(data = ozoneImp$data, target = "Ozone")
lin <- makeLearner("regr.lm")
There are two approaches for automating feature selection: wrapper methods and filter methods.
Wrapper methods tend to be computationally expensive. Filter methods, on the other hand, may or may not select the best-performing set of predictors but are much less computationally expensive.
The default filter method used by mlr (because it doesn't depend on whether the variables are categorical or continuous) is to build a random forest to predict the outcome and rank the predictors by how important they are to the forest.
filterVals <- generateFilterValuesData(ozoneTask,
method = "linear.correlation")
filterVals$data
## name type filter value
## 1: Month numeric linear.correlation 0.05371420
## 2: Date numeric linear.correlation 0.08205094
## 3: Day numeric linear.correlation 0.04151444
## 4: Press_height numeric linear.correlation 0.58752354
## 5: Wind numeric linear.correlation 0.00468138
## 6: Humid numeric linear.correlation 0.45148094
## 7: Temp_Sand numeric linear.correlation 0.76977674
## 8: Temp_Monte numeric linear.correlation 0.74159034
## 9: Inv_height numeric linear.correlation 0.57563391
## 10: Press_grad numeric linear.correlation 0.23331823
## 11: Inv_temp numeric linear.correlation 0.72712691
## 12: Visib numeric linear.correlation 0.41471463
Generate and plot filter values for ozoneTask, but using the default method randomForestSRC_importance (don’t overwrite the filterVals object). Are the variables ranked in the same order of importance between the two methods?
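A possible solution sketch for this exercise (assuming the randomForestSRC package is installed, since the default filter method relies on it):

# Filter values with the default method, stored in a new object
filterValsRF <- generateFilterValuesData(ozoneTask)
plotFilterValues(filterValsRF) + theme_bw()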
# Three alternative ways to filter features using the filter values:
# ozoneFiltTask <- filterFeatures(ozoneTask,
#                                 fval = filterVals, abs = 6)         # keep the 6 top-ranked features
# ozoneFiltTask <- filterFeatures(ozoneTask,
#                                 fval = filterVals, per = 0.25)      # keep the top 25% of features
# ozoneFiltTask <- filterFeatures(ozoneTask,
#                                 fval = filterVals, threshold = 0.2) # keep features scoring above 0.2
# Wrap the linear model learner with the correlation filter
filterWrapper <- makeFilterWrapper(learner = lin,
                                   fw.method = "linear.correlation")

# Tune fw.abs: the number of top-ranked features to retain (1 to 12)
lmParamSpace <- makeParamSet(
  makeIntegerParam("fw.abs", lower = 1, upper = 12)
)

gridSearch <- makeTuneControlGrid()

kFold <- makeResampleDesc("CV", iters = 10)

tunedFeats <- tuneParams(filterWrapper, task = ozoneTask, resampling = kFold,
                         par.set = lmParamSpace, control = gridSearch)
## [Tune] Started tuning learner regr.lm.filtered for parameter set:
## Type len Def Constr Req Tunable Trafo
## fw.abs integer - - 1 to 12 - TRUE -
## With control class: TuneControlGrid
## Imputation value: Inf
## [Tune-x] 1: fw.abs=1
## [Tune-y] 1: mse.test.mean=25.7766240; time: 0.0 min
## [Tune-x] 2: fw.abs=2
## [Tune-y] 2: mse.test.mean=25.2783515; time: 0.0 min
## [Tune-x] 3: fw.abs=3
## [Tune-y] 3: mse.test.mean=25.0705001; time: 0.0 min
## [Tune-x] 4: fw.abs=5
## [Tune-y] 4: mse.test.mean=22.6975378; time: 0.0 min
## [Tune-x] 5: fw.abs=6
## [Tune-y] 5: mse.test.mean=21.0344477; time: 0.0 min
## [Tune-x] 6: fw.abs=7
## [Tune-y] 6: mse.test.mean=21.1072068; time: 0.0 min
## [Tune-x] 7: fw.abs=8
## [Tune-y] 7: mse.test.mean=21.3092236; time: 0.0 min
## [Tune-x] 8: fw.abs=10
## [Tune-y] 8: mse.test.mean=20.7856666; time: 0.0 min
## [Tune-x] 9: fw.abs=11
## [Tune-y] 9: mse.test.mean=20.9164209; time: 0.0 min
## [Tune-x] 10: fw.abs=12
## [Tune-y] 10: mse.test.mean=21.0425258; time: 0.0 min
## [Tune] Result: fw.abs=10 : mse.test.mean=20.7856666
tunedFeats
## Tune result:
## Op. pars: fw.abs=10
## mse.test.mean=20.7856666
For regression problems, there are three commonly used performance metrics: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
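As a quick illustration (a sketch with made-up numbers, not from the book), all three metrics can be computed directly from observed values and predictions:

# MAE, MSE, and RMSE computed by hand from toy values
actual    <- c(3, 5, 4, 7)
predicted <- c(2.5, 5.5, 4.2, 6.0)

mae  <- mean(abs(actual - predicted))   # mean absolute error
mse  <- mean((actual - predicted)^2)    # mean squared error (the measure mlr reports above)
rmse <- sqrt(mse)                       # root mean squared error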
Repeat the feature-filtering process in listings 9.8 and 9.9, but use the default fw.method argument (randomForestSRC_importance, or just don’t supply it). Does this select the same number of predictors as when we used linear correlation? Which method was faster?
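One possible way to approach this exercise (a sketch reusing the objects defined above; the default fw.method needs the randomForestSRC package installed):

# The same filter-tuning pipeline, but leaving fw.method at its default
filterWrapperRF <- makeFilterWrapper(learner = lin)    # default: randomForestSRC_importance
tunedFeatsRF <- tuneParams(filterWrapperRF, task = ozoneTask, resampling = kFold,
                           par.set = lmParamSpace, control = gridSearch)
tunedFeatsRF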
# Filter the task down to the tuned number of features and train the final model
filteredTask <- filterFeatures(ozoneTask, fval = filterVals,
                               abs = unlist(tunedFeats$x))

filteredModel <- train(lin, filteredTask)

# Wrapper method: sequential floating backward search (sfbs)
featSelControl <- makeFeatSelControlSequential(method = "sfbs")

selFeats <- selectFeatures(learner = lin, task = ozoneTask,
                           resampling = kFold, control = featSelControl)
## [FeatSel] Started selecting features for learner 'regr.lm'
## With control class: FeatSelControlSequential
## Imputation value: Inf
## [FeatSel-x] 1: 111111111111 (12 bits)
## [FeatSel-y] 1: mse.test.mean=20.5296654; time: 0.0 min
## [FeatSel-x] 2: 011111111111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=21.1348189; time: 0.0 min
## [FeatSel-x] 2: 101111111111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.5070474; time: 0.0 min
## [FeatSel-x] 2: 110111111111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.4789559; time: 0.0 min
## [FeatSel-x] 2: 111011111111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.7191719; time: 0.0 min
## [FeatSel-x] 2: 111101111111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.4503279; time: 0.0 min
## [FeatSel-x] 2: 111110111111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=21.5244861; time: 0.0 min
## [FeatSel-x] 2: 111111011111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=21.5342347; time: 0.0 min
## [FeatSel-x] 2: 111111101111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=21.1358867; time: 0.0 min
## [FeatSel-x] 2: 111111110111 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.5869697; time: 0.0 min
## [FeatSel-x] 2: 111111111011 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.4238257; time: 0.0 min
## [FeatSel-x] 2: 111111111101 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.4394423; time: 0.0 min
## [FeatSel-x] 2: 111111111110 (11 bits)
## [FeatSel-y] 2: mse.test.mean=20.4874680; time: 0.0 min
## [FeatSel-x] 3: 111111111111 (12 bits)
## [FeatSel-y] 3: mse.test.mean=20.5296654; time: 0.0 min
## [FeatSel-x] 4: 011111111011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=21.0873295; time: 0.0 min
## [FeatSel-x] 4: 101111111011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.3951520; time: 0.0 min
## [FeatSel-x] 4: 110111111011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.3784420; time: 0.0 min
## [FeatSel-x] 4: 111011111011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.6320829; time: 0.0 min
## [FeatSel-x] 4: 111101111011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.3517750; time: 0.0 min
## [FeatSel-x] 4: 111110111011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=21.9178686; time: 0.0 min
## [FeatSel-x] 4: 111111011011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=21.7552399; time: 0.0 min
## [FeatSel-x] 4: 111111101011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=21.0354037; time: 0.0 min
## [FeatSel-x] 4: 111111110011 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.4669298; time: 0.0 min
## [FeatSel-x] 4: 111111111001 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.2846921; time: 0.0 min
## [FeatSel-x] 4: 111111111010 (10 bits)
## [FeatSel-y] 4: mse.test.mean=20.3889436; time: 0.0 min
## [FeatSel-x] 5: 111111111101 (11 bits)
## [FeatSel-y] 5: mse.test.mean=20.4394423; time: 0.0 min
## [FeatSel-x] 5: 111111111011 (11 bits)
## [FeatSel-y] 5: mse.test.mean=20.4238257; time: 0.0 min
## [FeatSel-x] 6: 011111111001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.9541819; time: 0.0 min
## [FeatSel-x] 6: 101111111001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.2605275; time: 0.0 min
## [FeatSel-x] 6: 110111111001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.2431030; time: 0.0 min
## [FeatSel-x] 6: 111011111001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.5056083; time: 0.0 min
## [FeatSel-x] 6: 111101111001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.2151333; time: 0.0 min
## [FeatSel-x] 6: 111110111001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=21.7520501; time: 0.0 min
## [FeatSel-x] 6: 111111011001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=21.7147693; time: 0.0 min
## [FeatSel-x] 6: 111111101001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=21.2631281; time: 0.0 min
## [FeatSel-x] 6: 111111110001 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.6554549; time: 0.0 min
## [FeatSel-x] 6: 111111111000 (9 bits)
## [FeatSel-y] 6: mse.test.mean=20.2569687; time: 0.0 min
## [FeatSel-x] 7: 111111111001 (10 bits)
## [FeatSel-y] 7: mse.test.mean=20.2846921; time: 0.0 min
## [FeatSel-x] 7: 111101111101 (10 bits)
## [FeatSel-y] 7: mse.test.mean=20.3638843; time: 0.0 min
## [FeatSel-x] 7: 111101111011 (10 bits)
## [FeatSel-y] 7: mse.test.mean=20.3517750; time: 0.0 min
## [FeatSel-x] 8: 011101111001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=20.8467065; time: 0.0 min
## [FeatSel-x] 8: 101101111001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=20.1791622; time: 0.0 min
## [FeatSel-x] 8: 110101111001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=20.1744230; time: 0.0 min
## [FeatSel-x] 8: 111001111001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=20.3946784; time: 0.0 min
## [FeatSel-x] 8: 111100111001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=21.6446211; time: 0.0 min
## [FeatSel-x] 8: 111101011001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=21.5958684; time: 0.0 min
## [FeatSel-x] 8: 111101101001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=21.1622986; time: 0.0 min
## [FeatSel-x] 8: 111101110001 (8 bits)
## [FeatSel-y] 8: mse.test.mean=20.6966062; time: 0.0 min
## [FeatSel-x] 8: 111101111000 (8 bits)
## [FeatSel-y] 8: mse.test.mean=20.2188527; time: 0.0 min
## [FeatSel-x] 9: 111101111001 (9 bits)
## [FeatSel-y] 9: mse.test.mean=20.2151333; time: 0.0 min
## [FeatSel-x] 9: 110111111001 (9 bits)
## [FeatSel-y] 9: mse.test.mean=20.2431030; time: 0.0 min
## [FeatSel-x] 9: 110101111101 (9 bits)
## [FeatSel-y] 9: mse.test.mean=20.3197904; time: 0.0 min
## [FeatSel-x] 9: 110101111011 (9 bits)
## [FeatSel-y] 9: mse.test.mean=20.3080828; time: 0.0 min
## [FeatSel-x] 10: 010101111001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=20.8117349; time: 0.0 min
## [FeatSel-x] 10: 100101111001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=20.1386733; time: 0.0 min
## [FeatSel-x] 10: 110001111001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=20.3594012; time: 0.0 min
## [FeatSel-x] 10: 110100111001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=21.5603552; time: 0.0 min
## [FeatSel-x] 10: 110101011001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=21.5617806; time: 0.0 min
## [FeatSel-x] 10: 110101101001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=21.1291058; time: 0.0 min
## [FeatSel-x] 10: 110101110001 (7 bits)
## [FeatSel-y] 10: mse.test.mean=20.6516162; time: 0.0 min
## [FeatSel-x] 10: 110101111000 (7 bits)
## [FeatSel-y] 10: mse.test.mean=20.1776587; time: 0.0 min
## [FeatSel-x] 11: 110101111001 (8 bits)
## [FeatSel-y] 11: mse.test.mean=20.1744230; time: 0.0 min
## [FeatSel-x] 11: 101101111001 (8 bits)
## [FeatSel-y] 11: mse.test.mean=20.1791622; time: 0.0 min
## [FeatSel-x] 11: 100111111001 (8 bits)
## [FeatSel-y] 11: mse.test.mean=20.2192502; time: 0.0 min
## [FeatSel-x] 11: 100101111101 (8 bits)
## [FeatSel-y] 11: mse.test.mean=20.2902511; time: 0.0 min
## [FeatSel-x] 11: 100101111011 (8 bits)
## [FeatSel-y] 11: mse.test.mean=20.2676762; time: 0.0 min
## [FeatSel-x] 12: 000101111001 (6 bits)
## [FeatSel-y] 12: mse.test.mean=20.7756227; time: 0.0 min
## [FeatSel-x] 12: 100001111001 (6 bits)
## [FeatSel-y] 12: mse.test.mean=20.3149545; time: 0.0 min
## [FeatSel-x] 12: 100100111001 (6 bits)
## [FeatSel-y] 12: mse.test.mean=21.5122089; time: 0.0 min
## [FeatSel-x] 12: 100101011001 (6 bits)
## [FeatSel-y] 12: mse.test.mean=21.5417695; time: 0.0 min
## [FeatSel-x] 12: 100101101001 (6 bits)
## [FeatSel-y] 12: mse.test.mean=21.0892112; time: 0.0 min
## [FeatSel-x] 12: 100101110001 (6 bits)
## [FeatSel-y] 12: mse.test.mean=20.6120161; time: 0.0 min
## [FeatSel-x] 12: 100101111000 (6 bits)
## [FeatSel-y] 12: mse.test.mean=20.1376675; time: 0.0 min
## [FeatSel-x] 13: 110101111000 (7 bits)
## [FeatSel-y] 13: mse.test.mean=20.1776587; time: 0.0 min
## [FeatSel-x] 13: 101101111000 (7 bits)
## [FeatSel-y] 13: mse.test.mean=20.1779629; time: 0.0 min
## [FeatSel-x] 13: 100111111000 (7 bits)
## [FeatSel-y] 13: mse.test.mean=20.1902970; time: 0.0 min
## [FeatSel-x] 13: 100101111100 (7 bits)
## [FeatSel-y] 13: mse.test.mean=20.2790974; time: 0.0 min
## [FeatSel-x] 13: 100101111010 (7 bits)
## [FeatSel-y] 13: mse.test.mean=20.2558664; time: 0.0 min
## [FeatSel-x] 13: 100101111001 (7 bits)
## [FeatSel-y] 13: mse.test.mean=20.1386733; time: 0.0 min
## [FeatSel-x] 14: 000101111000 (5 bits)
## [FeatSel-y] 14: mse.test.mean=20.7358638; time: 0.0 min
## [FeatSel-x] 14: 100001111000 (5 bits)
## [FeatSel-y] 14: mse.test.mean=20.2874228; time: 0.0 min
## [FeatSel-x] 14: 100100111000 (5 bits)
## [FeatSel-y] 14: mse.test.mean=22.0801073; time: 0.0 min
## [FeatSel-x] 14: 100101011000 (5 bits)
## [FeatSel-y] 14: mse.test.mean=21.4436465; time: 0.0 min
## [FeatSel-x] 14: 100101101000 (5 bits)
## [FeatSel-y] 14: mse.test.mean=21.1474516; time: 0.0 min
## [FeatSel-x] 14: 100101110000 (5 bits)
## [FeatSel-y] 14: mse.test.mean=20.7367145; time: 0.0 min
## [FeatSel-x] 15: 110101111000 (7 bits)
## [FeatSel-y] 15: mse.test.mean=20.1776587; time: 0.0 min
## [FeatSel-x] 15: 101101111000 (7 bits)
## [FeatSel-y] 15: mse.test.mean=20.1779629; time: 0.0 min
## [FeatSel-x] 15: 100111111000 (7 bits)
## [FeatSel-y] 15: mse.test.mean=20.1902970; time: 0.0 min
## [FeatSel-x] 15: 100101111100 (7 bits)
## [FeatSel-y] 15: mse.test.mean=20.2790974; time: 0.0 min
## [FeatSel-x] 15: 100101111010 (7 bits)
## [FeatSel-y] 15: mse.test.mean=20.2558664; time: 0.0 min
## [FeatSel-x] 15: 100101111001 (7 bits)
## [FeatSel-y] 15: mse.test.mean=20.1386733; time: 0.0 min
## [FeatSel] Result: Month,Humid,Temp_Sand,Temp_... (5 bits)
selFeats
## FeatSel result:
## Features (5): Month, Humid, Temp_Sand, Temp_Monte, Inv_height
## mse.test.mean=20.2874228
ozoneSelFeat <- ozoneImp$data[, c("Ozone", selFeats$x)]
ozoneSelFeatTask <- makeRegrTask(data = ozoneSelFeat, target = "Ozone")
wrapperModel <- train(lin, ozoneSelFeatTask)
# Wrap imputation and feature selection around the learner so that both steps
# are included inside cross-validation
imputeMethod <- imputeLearner("regr.rpart")

imputeWrapper <- makeImputeWrapper(lin,
                                   classes = list(numeric = imputeMethod))

featSelWrapper <- makeFeatSelWrapper(learner = imputeWrapper,
                                     resampling = kFold,
                                     control = featSelControl)
library(parallel)
library(parallelMap)
ozoneTaskWithNAs <- makeRegrTask(data = ozoneClean, target = "Ozone")
## Warning in makeTask(type = type, data = data, weights = weights, blocking =
## blocking, : Provided data is not a pure data.frame but from class tbl_df, hence
## it will be converted.
kFold3 <- makeResampleDesc("CV", iters = 3)
parallelStartSocket(cpus = detectCores())
## Starting parallelization in mode=socket with cpus=8.
lmCV <- resample(featSelWrapper, ozoneTaskWithNAs, resampling = kFold3)
## Exporting objects to slaves for mode socket: .mlr.slave.options
## Resampling: cross-validation
## Measures: mse
## Mapping in parallel: mode = socket; level = mlr.resample; cpus = 8; elements = 3.
##
## Aggregated Result: mse.test.mean=20.8957566
##
parallelStop()
## Stopped parallelization. All cleaned up.
lmCV
## Resample Result
## Task: ozoneClean
## Learner: regr.lm.imputed.featsel
## Aggr perf: mse.test.mean=20.8957566
## Runtime: 58.0066
wrapperModelData <- getLearnerModel(wrapperModel)
summary(wrapperModelData)
##
## Call:
## stats::lm(formula = f, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.7217 -2.8128 -0.2797 2.7788 13.8328
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.516e+01 1.950e+00 -7.775 8.24e-14 ***
## Month -2.895e-01 7.354e-02 -3.937 9.94e-05 ***
## Humid 8.891e-02 1.310e-02 6.787 4.83e-11 ***
## Temp_Sand 2.044e-01 4.252e-02 4.807 2.26e-06 ***
## Temp_Monte 2.140e-01 5.555e-02 3.853 0.000139 ***
## Inv_height -5.521e-04 1.762e-04 -3.134 0.001871 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.482 on 355 degrees of freedom
## Multiple R-squared: 0.6838, Adjusted R-squared: 0.6794
## F-statistic: 153.6 on 5 and 355 DF, p-value: < 2.2e-16
# Arrange the four diagnostic plots in a 2 x 2 grid, then reset the layout
par(mfrow = c(2, 2))
plot(wrapperModelData)
par(mfrow = c(1, 1))
The Residuals vs. Fitted plot shows the predicted ozone level on the x-axis and the residual on the y-axis for each case. We hope that there are no patterns in this plot; in other words, the amount of error shouldn't depend on the predicted value. In this situation, we have a curved relationship, which indicates that we have nonlinear relationships between the predictors and ozone, and/or heteroscedasticity.
The Normal Q-Q plot shows the quantiles of the model residuals plotted against the quantiles they would have if they were drawn from a theoretical normal distribution. If the data deviates considerably from the 1:1 diagonal line, this suggests the residuals are not normally distributed. This doesn't seem to be a problem for this model: the residuals line up nicely on the diagonal.
The Scale-Location plot helps us identify heteroscedasticity of the residuals. There should be no patterns here, but it looks like the residuals are increasingly varied with larger predicted values, suggesting heteroscedasticity.
The Residuals vs. Leverage plot helps us identify cases that have excessive influence on the model parameters (potential outliers). Cases that fall inside a dotted region of the plot called Cook's distance may be outliers whose inclusion or exclusion makes a large difference to the model. Because we can't even see Cook's distance here (it is beyond the axis limits), we have no worries about outliers.
These diagnostic plots (particularly the Residuals vs. Fitted plot) indicate the presence of nonlinear relationships between the predictor variables and the outcome variable.
In summary:
- Residuals vs. Fitted: there should be no patterns; the amount of error shouldn't depend on the predicted value.
- Normal Q-Q: the quantiles of the model residuals plotted against the quantiles of a theoretical normal distribution; considerable deviation from the 1:1 diagonal suggests the residuals are not normally distributed.
- Scale-Location: identifies heteroscedasticity of the residuals; there should be no patterns.
- Residuals vs. Leverage: identifies cases that have excessive influence on the model parameters (potential outliers).
Instead of using a wrapper method, cross-validate the process of building our model using a filter method. Are the estimated MSE values similar? Which method is faster? Tips (a solution sketch follows the list):
1. First, create a filter wrapper using our imputeWrapper as the learner.
2. Define a hyperparameter space to tune "fw.abs" using makeParamSet().
3. Define a tuning wrapper that takes the filter wrapper as a learner and performs a grid search.
4. Use resample() to perform cross-validation, using the tuning wrapper as the learner.
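Following those tips, one possible sketch (reusing the objects defined earlier in this section, and making the same parallelization assumptions as before):

# Cross-validating a filter-based pipeline that also includes imputation
filterWrapperImp <- makeFilterWrapper(learner = imputeWrapper,
                                      fw.method = "linear.correlation")

filterParamSpace <- makeParamSet(
  makeIntegerParam("fw.abs", lower = 1, upper = 12)
)

tuneWrapper <- makeTuneWrapper(learner = filterWrapperImp,
                               resampling = kFold,
                               par.set = filterParamSpace,
                               control = gridSearch)

parallelStartSocket(cpus = detectCores())
filterCV <- resample(tuneWrapper, ozoneTaskWithNAs, resampling = kFold3)
parallelStop()

filterCV   # compare mse.test.mean with the wrapper-method result above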