p299
In this exercise, you will further analyze the Wage data set considered throughout this chapter.
Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a plot of the resulting polynomial fit to the data.
Fit a step function to predict wage using age, and perform cross-validation to choose the optimal number of cuts. Make a plot of the fit obtained.
library(ISLR)
library(boot)
# Placeholder for the CV error of each polynomial degree
cv.error = rep(0, 5)
# Loop over polynomial degrees 1 through 5
for (i in 1:5) {
  # Fit a degree-i polynomial in age
  glm.fit = glm(wage ~ poly(age, i), data=Wage)
  # Save the leave-one-out CV estimate (cv.glm uses K = n when K is not given)
  cv.error[i] = cv.glm(Wage, glm.fit)$delta[1]
}
# Plot CV error against polynomial degree
plot(c(1:5), cv.error, xlab="Polynomial degree", ylab="CV error")
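The exercise also asks for an ANOVA comparison and a plot of the polynomial fit, which the code above does not cover. One possible sketch follows. It assumes the same ISLR `Wage` data; the names `fit.1` through `fit.5`, `anova.tab`, `d.plot`, `age.grid` and `pred` are introduced here for illustration only. The value `d.plot = 4` is a placeholder (the degree used in the chapter lab), not a result reported in this solution; substitute the degree that cross-validation selects.

```r
library(ISLR)  # provides the Wage data set

# Nested polynomial fits of degree 1 through 5
fit.1 = lm(wage ~ age, data=Wage)
fit.2 = lm(wage ~ poly(age, 2), data=Wage)
fit.3 = lm(wage ~ poly(age, 3), data=Wage)
fit.4 = lm(wage ~ poly(age, 4), data=Wage)
fit.5 = lm(wage ~ poly(age, 5), data=Wage)

# Sequential F-tests: each row tests a model against the one before it
anova.tab = anova(fit.1, fit.2, fit.3, fit.4, fit.5)
print(anova.tab)

# Placeholder degree (assumption); replace with the CV-selected degree
d.plot = 4
fit = lm(wage ~ poly(age, d.plot), data=Wage)

# Predict over a grid of integer ages spanning the observed range
age.grid = seq(min(Wage$age), max(Wage$age))
pred = predict(fit, newdata=data.frame(age=age.grid))

# Scatter plot of the data with the fitted polynomial overlaid
plot(wage ~ age, data=Wage, col="darkgrey", cex=0.5)
lines(age.grid, pred, col="blue", lwd=2)
```

A small p-value in a given row of the ANOVA table suggests that the extra polynomial term improves on the previous model, which can then be compared with the degree that minimizes the CV error.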
Comments:
# Reset the placeholder; entry i holds the CV error for i cuts (entry 1 is unused)
cv.error = rep(0, 5)
# Loop over the number of cuts, 2 through 5
for (i in 2:5) {
  print(i)
  # Bin age into i equal-width intervals and fit a step function
  Wage$age.cut = cut(Wage$age, i)
  glm.fit = glm(wage ~ age.cut, data=Wage)
  # Save the raw CV estimate, delta[1], as in the polynomial loop
  cv.error[i] = cv.glm(Wage, glm.fit)$delta[1]
}
## [1] 2
## [1] 3
## [1] 4
## [1] 5
plot(c(2:5), cv.error[2:5], pch=20, cex=0.5, lwd=2)
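The exercise also asks for a plot of the step-function fit itself. A minimal sketch is given below, assuming the same ISLR `Wage` data. The names `n.cuts`, `step.fit` and `ord` are hypothetical, and `n.cuts = 4` is only a placeholder, not a result from this solution; with the `cv.error` vector from the loop above, the selected value would be `which.min(cv.error[2:5]) + 1`.

```r
library(ISLR)  # provides the Wage data set

# Placeholder number of cuts (assumption); replace with the CV-selected value
n.cuts = 4

# Step function: one constant wage level per equal-width age interval
step.fit = glm(wage ~ cut(age, n.cuts), data=Wage)

# Order observations by age so the fitted values can be drawn as one line
ord = order(Wage$age)

# Scatter plot of the data with the piecewise-constant fit overlaid
plot(wage ~ age, data=Wage, col="darkgrey", cex=0.5)
lines(Wage$age[ord], fitted(step.fit)[ord], col="red", lwd=2)
```

Using the in-sample fitted values avoids re-binning a new age grid, so the plotted steps use exactly the break points that `cut` chose for the data.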