p371
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
(d) Make some plots to back up your assertions in (b) and (c).
Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing
> plot(svmfit , dat)
where svmfit contains your fitted model and dat is a data frame containing your data, you can type
> plot(svmfit , dat , x1~x4)
in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
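library(ISLR)      # Auto data set (assumed source of Auto)
library(e1071)     # svm() and tune()
library(magrittr)  # %>% pipe
mpg.median = Auto$mpg %>% median()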
mpg.median
## [1] 22.75
indicator = ifelse(Auto$mpg > mpg.median, 1, 0)
#indicator
Auto$mpg_above_median = indicator
Hyper … Parameters: Cost
tune.out = tune(svm, as.factor(mpg_above_median) ~ weight + displacement,
                data = Auto,
                kernel = "linear",
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.09442308
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.10455128 0.04580077
## 2 1e-01 0.09442308 0.04995254
## 3 1e+00 0.09955128 0.04275859
## 4 5e+00 0.09698718 0.04657340
## 5 1e+01 0.09698718 0.04657340
## 6 1e+02 0.09698718 0.04657340
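Warning: if you forget as.factor(), svm() treats the numeric 0/1 response as a regression target instead of a class label, and in that case the tune() call ran so long that it appeared to hang forever.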
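One way to sidestep this pitfall is to store the response as a factor once, so that no formula needs the as.factor() wrapper. The sketch below assumes the Auto data come from the ISLR package; the column name mpg_high and its labels "low" and "high" are hypothetical, chosen only for illustration.

```r
library(ISLR)   # assumption: Auto is loaded from the ISLR package
library(e1071)

auto <- Auto
# Hypothetical column: a two-level factor instead of a numeric 0/1 indicator
auto$mpg_high <- factor(auto$mpg > median(auto$mpg),
                        levels = c(FALSE, TRUE),
                        labels = c("low", "high"))

# With a factor response, svm() fits a classifier rather than a regression
fit <- svm(mpg_high ~ weight + displacement, data = auto,
           kernel = "linear", cost = 1)

# Predictions come back as a factor with the same two labels
pred <- predict(fit, auto)
table(predicted = pred, actual = auto$mpg_high)
```

The same factor column could then be used on the left-hand side of every tune() and svm() formula in place of as.factor(mpg_above_median).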
Parameters: Cost, Gamma
kernel <- "radial"
tune.out.radial = tune(svm, as.factor(mpg_above_median) ~ weight + displacement,
                       data = Auto,
                       kernel = kernel,
                       ranges = list(
                         cost = c(0.01, 0.1, 1, 5, 10, 100),
                         gamma = c(0.1, 5, 10)
                       ))
summary(tune.out.radial)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 100 10
##
## - best performance: 0.09185897
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-02 0.1 0.14012821 0.07105027
## 2 1e-01 0.1 0.10217949 0.06062294
## 3 1e+00 0.1 0.09698718 0.05649603
## 4 5e+00 0.1 0.09698718 0.05649603
## 5 1e+01 0.1 0.09442308 0.05679586
## 6 1e+02 0.1 0.09442308 0.05679586
## 7 1e-02 5.0 0.26301282 0.05764009
## 8 1e-01 5.0 0.09955128 0.05474595
## 9 1e+00 5.0 0.09942308 0.06200297
## 10 5e+00 5.0 0.09679487 0.05861092
## 11 1e+01 5.0 0.09935897 0.06066269
## 12 1e+02 5.0 0.10961538 0.05246238
## 13 1e-02 10.0 0.54846154 0.02934158
## 14 1e-01 10.0 0.09692308 0.06477553
## 15 1e+00 10.0 0.09429487 0.05780177
## 16 5e+00 10.0 0.09942308 0.06081337
## 17 1e+01 10.0 0.09935897 0.05814244
## 18 1e+02 10.0 0.09185897 0.07039231
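# Fit with cost = 5, gamma = 0.1 (CV error 0.0970); tuning selected cost = 100, gamma = 10 (CV error 0.0919)
svm.fit.radial = svm(as.factor(mpg_above_median) ~ weight + displacement,
                     data = Auto, kernel = "radial", cost = 5, gamma = 0.1)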
#summary(svm.fit.radial)
# Fit with cost = 1 (CV error 0.0996); tuning selected cost = 0.1 (CV error 0.0944)
svm.fit = svm(as.factor(mpg_above_median) ~ weight + displacement,
              data = Auto, kernel = "linear", cost = 1)
#summary(svm.fit)
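The object returned by tune() also stores the selected parameter combination and a model refit on the full data, so the parameters do not have to be retyped by hand. This is a minimal sketch, assuming the ISLR and e1071 packages; the seed value and the reduced grid are arbitrary choices for illustration, so the errors will not match the output above exactly.

```r
library(ISLR)   # assumption: Auto is loaded from the ISLR package
library(e1071)

auto <- Auto
auto$mpg_above_median <- as.factor(ifelse(auto$mpg > median(auto$mpg), 1, 0))

set.seed(1)  # arbitrary seed so the random CV folds are repeatable
tuned <- tune(svm, mpg_above_median ~ weight + displacement,
              data = auto, kernel = "radial",
              ranges = list(cost = c(0.1, 1, 10), gamma = c(0.1, 5)))

tuned$best.parameters         # data frame holding the selected cost and gamma
tuned$best.performance        # lowest cross-validation error on the grid
best.fit <- tuned$best.model  # svm refit on all rows with those parameters
```

Here best.fit can be passed to plot() or predict() like any other svm object.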
Parameters: Cost, Degree
kernel <- "polynomial"
tune.out.poly = tune(svm, as.factor(mpg_above_median) ~ weight + displacement,
                     data = Auto,
                     kernel = kernel,
                     ranges = list(
                       cost = c(0.01, 0.1, 1, 5, 10, 100),
                       degree = c(2, 3, 4, 5)
                     ))
summary(tune.out.poly)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 0.1 3
##
## - best performance: 0.1735897
##
## - Detailed performance results:
## cost degree error dispersion
## 1 1e-02 2 0.4158333 0.09968083
## 2 1e-01 2 0.3700000 0.09335323
## 3 1e+00 2 0.3597436 0.08763800
## 4 5e+00 2 0.3367949 0.08001114
## 5 1e+01 2 0.3264103 0.09014045
## 6 1e+02 2 0.3214103 0.08359219
## 7 1e-02 3 0.2628846 0.10218075
## 8 1e-01 3 0.1735897 0.09046000
## 9 1e+00 3 0.1810256 0.07419436
## 10 5e+00 3 0.1810256 0.07419436
## 11 1e+01 3 0.1810256 0.07014551
## 12 1e+02 3 0.1962821 0.06654383
## 13 1e-02 4 0.3572436 0.10764992
## 14 1e-01 4 0.3623718 0.09225428
## 15 1e+00 4 0.3546154 0.09702163
## 16 5e+00 4 0.3164103 0.10363543
## 17 1e+01 4 0.3087821 0.10526971
## 18 1e+02 4 0.3112179 0.10714795
## 19 1e-02 5 0.2935256 0.11385118
## 20 1e-01 5 0.2118590 0.11908871
## 21 1e+00 5 0.2398718 0.10396207
## 22 5e+00 5 0.2346795 0.09157196
## 23 1e+01 5 0.2397436 0.09185748
## 24 1e+02 5 0.2474359 0.09586424
svm.fit.poly = svm(as.factor(mpg_above_median) ~ weight + displacement,
data = Auto,
kernel = "polynomial",
cost = .1, degree = 3)
#summary(svm.fit.poly)
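To compare the kernels side by side, the best cross-validation error of each can be tabulated. The numbers in this sketch are copied by hand from the three tuning summaries above; the data frame and its column names are only illustrative.

```r
# Best CV error and its dispersion per kernel, copied from the tuning output
best <- data.frame(
  kernel     = c("linear", "radial", "polynomial"),
  error      = c(0.09442308, 0.09185897, 0.1735897),
  dispersion = c(0.04995254, 0.07039231, 0.09046000)
)

# Sort from lowest to highest error
best <- best[order(best$error), ]

# Gap between each kernel and the lowest error
best$gap <- best$error - min(best$error)
best
```

Comparing the gap column with the dispersion column shows how much of the difference between kernels could be fold-to-fold noise.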
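The radial kernel has the lowest cross-validation error (0.0919 at cost = 100, gamma = 10), but it is only marginally below the linear kernel (0.0944 at cost = 0.1). That gap is much smaller than the dispersion of roughly 0.05 to 0.07, so the two are effectively tied. The polynomial kernel is clearly worse, with a best error of 0.1736 at degree 3.
#?plot.svm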
plot(svm.fit, Auto, weight~displacement)
plot(svm.fit.radial, Auto, weight~displacement)
plot(svm.fit.poly, Auto, weight~displacement)
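For part (d), the cross-validation errors themselves can be plotted as well as the decision regions. The sketch below assumes the ISLR and e1071 packages and an arbitrary seed; it reruns the linear-kernel tuning and plots the error column of the performances table against cost on a log scale.

```r
library(ISLR)   # assumption: Auto is loaded from the ISLR package
library(e1071)

auto <- Auto
auto$mpg_above_median <- as.factor(ifelse(auto$mpg > median(auto$mpg), 1, 0))

set.seed(1)  # arbitrary seed so the random CV folds are repeatable
tuned.linear <- tune(svm, mpg_above_median ~ weight + displacement,
                     data = auto, kernel = "linear",
                     ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))

# One row per grid point, with columns cost, error and dispersion
perf <- tuned.linear$performances

# CV error against cost, with cost on a log axis
plot(perf$cost, perf$error, log = "x", type = "b",
     xlab = "cost (log scale)", ylab = "10-fold CV error",
     main = "Linear kernel")
```

The same pattern applies to the radial and polynomial tuning objects, where the performances table carries an extra gamma or degree column that can be used to draw one line per value.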