In Sections 5.3.2 and 5.3.3, we saw that the cv.glm() function can be used in order to compute the LOOCV test error estimate. Alternatively, one could compute those quantities using just the glm() and predict.glm() functions, and a for loop. You will now take this approach in order to compute the LOOCV error for a simple logistic regression model on the Weekly data set. Recall that in the context of classification problems, the LOOCV error is given in (5.4).
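For reference, (5.4) defines the LOOCV error in the classification setting as the average of the n held-out misclassification indicators: $\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Err}_i$, where $\mathrm{Err}_i = I(y_i \neq \hat{y}_i)$.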
(a) Fit a logistic regression model that predicts Direction using Lag1 and Lag2.
(b) Fit a logistic regression model that predicts Direction using Lag1 and Lag2 using all but the first observation.
(c) Use the model from (b) to predict the direction of the first observation. You can do this by predicting that the first observation will go up if P(Direction = "Up" | Lag1, Lag2) > 0.5. Was this observation correctly classified?
(d) Write a for loop from i = 1 to i = n, where n is the number of observations in the data set, that performs each of the following steps:
   i. Fit a logistic regression model using all but the ith observation to predict Direction using Lag1 and Lag2.
   ii. Compute the posterior probability that the ith observation will go up.
   iii. Predict Up if that posterior probability is greater than 0.5, and Down otherwise.
   iv. Record whether or not an error was made in predicting the direction of the ith observation.
library(ISLR)
set.seed(1)
# (a) Logistic regression of Direction on Lag1 and Lag2, fit to the full Weekly data
lr.fit = glm(Direction ~ Lag1 + Lag2, data=Weekly, family=binomial)
summary(lr.fit)
##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2, family = binomial, data = Weekly)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.623 -1.261 1.001 1.083 1.506
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.22122 0.06147 3.599 0.000319 ***
## Lag1 -0.03872 0.02622 -1.477 0.139672
## Lag2 0.06025 0.02655 2.270 0.023232 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1496.2 on 1088 degrees of freedom
## Residual deviance: 1488.2 on 1086 degrees of freedom
## AIC: 1494.2
##
## Number of Fisher Scoring iterations: 4
# (b) Hold out the first observation and refit on the remaining rows
train = Weekly[-1,]
test = Weekly[1,]
lr.fit.b = glm(Direction ~ Lag1 + Lag2, data=train, family=binomial)
summary(lr.fit.b)
##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2, family = binomial, data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6258 -1.2617 0.9999 1.0819 1.5071
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.22324 0.06150 3.630 0.000283 ***
## Lag1 -0.03843 0.02622 -1.466 0.142683
## Lag2 0.06085 0.02656 2.291 0.021971 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1494.6 on 1087 degrees of freedom
## Residual deviance: 1486.5 on 1085 degrees of freedom
## AIC: 1492.5
##
## Number of Fisher Scoring iterations: 4
# (c) Posterior probability that the first observation goes up
lr.prob = predict(lr.fit.b, test, type = "response")
lr.prob
## 1
## 0.5713923
lr.prob > 0.5 # a probability > 0.5 means the model predicts Up
## 1
## TRUE
Weekly[1,]$Direction # actual direction was Down
## [1] Down
## Levels: Down Up
Answer: The model predicted Up, but the actual direction was Down, so the first observation was classified incorrectly.
# (d) LOOCV by hand: leave out observation i, refit, predict it, and count errors
num_incorrect = 0
for (i in 1:nrow(Weekly)) {
  train = Weekly[-i,]   # all observations except the ith
  test = Weekly[i,]     # the held-out ith observation
  lr.fit.b = glm(Direction ~ Lag1 + Lag2, data=train, family=binomial)
  lr.prob = predict(lr.fit.b, test, type = "response")
  if (lr.prob > 0.5) {
    predicted_direction = "Up"
  } else {
    predicted_direction = "Down"
  }
  if (Weekly[i,"Direction"] != predicted_direction) {
    num_incorrect = num_incorrect + 1   # misclassification: count as an error
  }
}
num_incorrect
## [1] 490
num_incorrect/nrow(Weekly) # LOOCV error estimate: about 45% of the predictions are wrong
## [1] 0.4499541
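As a cross-check, the same estimate can be obtained with cv.glm() as mentioned at the start of the exercise. The sketch below is one way to do it, using the 0/1 misclassification cost function from the boot package documentation (cv.glm() passes the 0/1-coded response and the fitted probabilities to cost); since K defaults to n, this performs LOOCV and refits the model n times, so it runs noticeably slower than the loop above.

library(boot)
# 0/1 misclassification cost: r is the observed 0/1 response, pi the fitted probability
cost = function(r, pi = 0) mean(abs(r - pi) > 0.5)
lr.fit = glm(Direction ~ Lag1 + Lag2, data=Weekly, family=binomial)
cv.err = cv.glm(Weekly, lr.fit, cost=cost)$delta[1]  # K defaults to n, i.e. LOOCV
cv.err  # should agree with num_incorrect/nrow(Weekly) above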