This problem involves the OJ data set, which is part of the ISLR package (ISLR, p. 371). The same data set was used in Question 9 of Chapter 8.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
(c) What are the training and test error rates?
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
(e) Compute the training and test error rates using this new value for cost.
(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
(h) Overall, which approach seems to give the best results on this data?
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations. (See Q8.9.)
# From Q8.9
library(ISLR)    # provides the OJ data set
library(e1071)   # provides svm() and tune()
dim(OJ)
## [1] 1070 18
set.seed(1)
train = sample(1:nrow(OJ), 800)
# Training/test splits and the corresponding response vectors used throughout
oj.train = OJ[train,]
oj.test = OJ[-train,]
oj.train.y = OJ[train,"Purchase"]
oj.test.y = OJ[-train,"Purchase"]
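A quick sanity check on the split sizes (1070 rows in total, so the test set should have 270):
dim(oj.train)   # expected: 800  18
dim(oj.test)    # expected: 270  18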
(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
Parameters: Cost = 0.01
svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "linear", cost = c(0.01))
summary(svm.fit)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "linear", cost = c(0.01))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
With cost = 0.01, the classifier uses 435 of the 800 training observations as support vectors, split almost evenly between the two classes (219 on the CH side, 216 on the MM side). The small cost produces a wide margin, so a large share of the training points lie on or inside it.
(c) What are the training and test error rates?
svm.predict = predict(svm.fit, newdata = oj.train)
result = table(true=oj.train.y, pred=svm.predict)
result
## pred
## true CH MM
## CH 420 65
## MM 75 240
(result[1] + result[4]) / sum(result)
## [1] 0.825
svm.predict = predict(svm.fit, newdata = oj.test)
result = table(true=oj.test.y, pred=svm.predict)
result
## pred
## true CH MM
## CH 153 15
## MM 33 69
(result[1] + result[4]) / sum(result)
## [1] 0.8222222
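The two values printed above are accuracies (the fraction of correctly classified observations), so the error rates the question asks for are one minus these: roughly 0.175 on the training set and 0.178 on the test set. A small helper makes this explicit (err.rate is just an illustrative name; result currently holds the test-set table):
err.rate <- function(tab) 1 - sum(diag(tab)) / sum(tab)
err.rate(result)   # test error rate, 1 - 0.8222222, about 0.178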
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
tune.out = tune(svm, as.factor(Purchase) ~ .,
data = oj.train,
kernel = "linear",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.17125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.17375 0.03884174
## 2 0.10 0.17875 0.03064696
## 3 1.00 0.17500 0.03061862
## 4 5.00 0.17250 0.03322900
## 5 10.00 0.17125 0.03488573
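To see how the cross-validated error varies over the cost grid, e1071 provides a plot method for tune objects; this is optional and its output is not shown here:
plot(tune.out)   # CV error versus cost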
best.model <- tune.out$best.model
best.model
##
## Call:
## best.tune(method = svm, train.x = as.factor(Purchase) ~ ., data = oj.train,
## ranges = list(cost = c(0.01, 0.1, 1, 5, 10)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 10
##
## Number of Support Vectors: 326
best.model$cost
## [1] 10
(e) Compute the training and test error rates using this new value for cost.
Parameters: Cost = 10 (the value selected by tune())
# Equivalent to using best.model directly: refit with the tuned cost
#svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "linear", cost = best.model$cost)
#summary(svm.fit)
best.model.predict = predict(best.model, newdata=oj.train)
result = table(true=oj.train.y, pred=best.model.predict)
result
## pred
## true CH MM
## CH 423 62
## MM 69 246
(result[1] + result[4]) / sum(result)
## [1] 0.83625
best.model.predict = predict(best.model, newdata=oj.test)
result = table(true=oj.test.y, pred=best.model.predict)
result
## pred
## true CH MM
## CH 156 12
## MM 28 74
(result[1] + result[4]) / sum(result)
## [1] 0.8518519
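As before, these are accuracies; the error rates with the tuned cost follow as one minus them:
1 - 0.83625     # training error rate, about 0.164
1 - 0.8518519   # test error rate, about 0.148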
(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
Parameters: Cost = 0.01
svm.fit.radial = svm(Purchase ~ ., data = oj.train, kernel = "radial", cost = c(0.01))
#summary(svm.fit.radial)
svm.predict = predict(svm.fit.radial, newdata = oj.train)
table(true=oj.train.y, pred=svm.predict)
## pred
## true CH MM
## CH 485 0
## MM 315 0
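At cost = 0.01 the radial fit predicts CH for every training observation, so the training error rate is just the proportion of MM observations in the table above:
(0 + 315) / (485 + 0 + 315 + 0)   # training error rate, 315/800 = 0.39375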
tune.out = tune(svm, as.factor(Purchase) ~ .,
data = oj.train,
kernel = "radial",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.17625
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.39375 0.06568284
## 2 0.10 0.18250 0.05470883
## 3 1.00 0.17625 0.03793727
## 4 5.00 0.18125 0.04299952
## 5 10.00 0.18125 0.04340139
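Cross-validation selects cost = 1 for the radial kernel. The tuned model could be evaluated directly in the same way as the linear one; a sketch (best.radial is just an illustrative name, output not shown). The fit reported below instead uses cost = 0.1.
best.radial <- tune.out$best.model
table(true = oj.test.y, pred = predict(best.radial, newdata = oj.test))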
Parameters: Cost = 0.1
svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "radial", cost = c(0.1))
summary(svm.fit)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "radial", cost = c(0.1))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.1
##
## Number of Support Vectors: 541
##
## ( 272 269 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
svm.predict = predict(svm.fit, newdata = oj.test)
table(true=oj.test.y, pred=svm.predict)
## pred
## true CH MM
## CH 150 18
## MM 37 65
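Reading off this table, the radial fit at cost = 0.1 gives:
(150 + 65) / 270   # test accuracy, about 0.796
(18 + 37) / 270    # test error rate, about 0.204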
(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
Parameters: Cost = 0.01
degree <- 2
svm.fit.poly = svm(Purchase ~ ., data = oj.train, kernel = "polynomial", cost = c(0.01), degree = degree)
#summary(svm.fit.poly)
svm.predict = predict(svm.fit.poly, newdata = oj.train)
table(true=oj.train.y, pred=svm.predict)
## pred
## true CH MM
## CH 484 1
## MM 297 18
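The training error rate at cost = 0.01 again follows directly from the table:
(1 + 297) / 800    # training error rate, 298/800 = 0.3725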
tune.out = tune(svm, as.factor(Purchase) ~ .,
data = oj.train,
kernel = "polynomial",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10),
degree = degree))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 5 2
##
## - best performance: 0.18375
##
## - Detailed performance results:
## cost degree error dispersion
## 1 0.01 2 0.39000 0.08287373
## 2 0.10 2 0.32375 0.06730166
## 3 1.00 2 0.20000 0.05137012
## 4 5.00 2 0.18375 0.05104804
## 5 10.00 2 0.18625 0.05185785
Parameters: Cost = 0.1
svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "polynomial", cost = c(0.1), degree = degree)
summary(svm.fit)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "polynomial",
## cost = c(0.1), degree = degree)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.1
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 589
##
## ( 298 291 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
svm.predict = predict(svm.fit, newdata = oj.test)
table(true=oj.test.y, pred=svm.predict)
## pred
## true CH MM
## CH 161 7
## MM 73 29
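From this table, the degree-2 polynomial fit at cost = 0.1 gives:
(161 + 29) / 270   # test accuracy, about 0.704
(7 + 73) / 270     # test error rate, about 0.296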
(h) Overall, which approach seems to give the best results on this data?
Among the fits reported above, the linear support vector classifier with the tuned cost (cost = 10) does best on the test set, with an accuracy of about 0.852 (error rate of about 0.148). The radial fit at cost = 0.1 has a test error rate of about 0.204 and the degree-2 polynomial fit at cost = 0.1 about 0.296, so the tuned linear kernel appears to give the best results on this data (keeping in mind that the radial and polynomial models shown were not evaluated at their cross-validated best costs).