We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as in the first code chunk below.
Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
Fit a logistic regression model to the data, using X1 and X2 as predictors.
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
Comment on your results.
set.seed(421)
x1=runif(500)-0.5
x2=runif(500)-0.5
y=1*(x1^2 - x2^2 > 0) # Multiply by 1 so we don't get a boolean
#dat = data.frame(x1=x1, x2= x2, y=as.factor(y))
dat = data.frame(x1=x1, x2= x2, y=y)
plot(dat$x1, dat$x2, col=(3-y), pch=4)
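Since y = 1 exactly when x1^2 - x2^2 > 0, the true decision boundary is the pair of lines x2 = x1 and x2 = -x1. As a small sketch of ours (not part of the original answer), those lines can be overlaid on the scatterplot:
{plot(dat$x1, dat$x2, col = (3 - y), pch = 4, xlab = "X1", ylab = "X2")
abline(0, 1, lty = 2)  # x2 = x1
abline(0, -1, lty = 2) # x2 = -x1
}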
Fit a logistic regression model to the data, using X1 and X2 as predictors.
#set.seed(1)
train = sample(nrow(dat), nrow(dat)*.98) # training index; the fit below uses all of dat (see the note after the plots)
lr.fit = glm(y~., data = dat, family = binomial)
#lr.fit = glm(y ~ x1 + x2, data = dat[train,], family = binomial)
summary(lr.fit)
##
## Call:
## glm(formula = y ~ ., family = binomial, data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.278 -1.227 1.089 1.135 1.175
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.11999 0.08971 1.338 0.181
## x1 -0.16881 0.30854 -0.547 0.584
## x2 -0.08198 0.31476 -0.260 0.795
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 691.35 on 499 degrees of freedom
## Residual deviance: 690.99 on 497 degrees of freedom
## AIC: 696.99
##
## Number of Fisher Scoring iterations: 3
lr.predict.train.probs = predict(lr.fit, newdata = dat, type = "response")
#preds = rep(0, 250)
#preds[lr.predict.train.probs >= 0.5] = 1
preds = ifelse(lr.predict.train.probs > 0.52, 1, 0)
table(preds, dat[,"y"])
##
## preds 0 1
## 0 39 96
## 1 196 169
(28+109)/250 # 54.8% -- leftover from an earlier 250-observation run; does not match the table above
## [1] 0.548
(39+169)/500 # 41.6% accuracy from the table above
## [1] 0.416
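Equivalently (our addition), the overall training accuracy can be computed directly:
mean(preds == dat$y) # equals (39+169)/500 = 0.416 from the table above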
Note: 3 - y (and likewise 3 - preds) converts the 0/1 labels to plotting colors 3 and 2.
In the prediction plots below, positive predictions are drawn as a blue ‘+’.
Negative predictions are drawn as a red ‘x’.
dat.neg = dat[preds==0,]
dat.pos = dat[preds==1,]
{plot(dat.pos$x1, dat.pos$x2, col=("blue"), pch="+", xlab = "X1", ylab = "X2")
points(dat.neg$x1, dat.neg$x2, col=("red"), pch=4)
}
Prince's solution (reproduced below) used all the data. Breaking out a separate training subset gave us very different results; a sketch using the train index follows his code.
lm.fit = glm(y ~ x1 + x2, family = binomial)
data = data.frame(x1 = x1, x2 = x2, y = y)
lm.prob = predict(lm.fit, data, type = "response")
lm.pred = ifelse(lm.prob > 0.52, 1, 0)
data.pos = data[lm.pred == 1, ]
data.neg = data[lm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "red", pch = 4)
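A hedged sketch of the train/holdout variant mentioned above, reusing the train index defined earlier. With the 98% split only 10 observations are held out, and the 0.5 threshold is our choice:
lr.fit.tr = glm(y ~ x1 + x2, data = dat[train, ], family = binomial) # fit on the training subset only
holdout.probs = predict(lr.fit.tr, newdata = dat[-train, ], type = "response")
holdout.preds = ifelse(holdout.probs > 0.5, 1, 0)
table(holdout.preds, dat[-train, "y"]) # confusion matrix on the held-out observations
mean(holdout.preds == dat[-train, "y"]) # holdout accuracy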
Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).
#plot(lr.fit)
#library(glmnet)
#lr.fit.poly = glm(y~poly(x1,2) + poly(x2,2), data = dat[train,], family = binomial)
#lm.fit = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), data = data, family = binomial)
lr.fit.poly = glm(y~poly(x1,2) + poly(x2,2), data = dat, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#summary(lr.fit.poly)
pred.poly = predict(lr.fit.poly, dat, type = "response") # predicted probabilities; predict.glm has no family argument
lm.poly.pred = ifelse(pred.poly > 0.5, 1, 0)
dat.neg = dat[lm.poly.pred==0,]
dat.pos = dat[lm.poly.pred==1,]
{plot(dat.pos$x1, dat.pos$x2, col=("blue"), pch="+", xlab = "X1", ylab = "X2")
points(dat.neg$x1, dat.neg$x2, col=("red"), pch=4)
}
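As an alternative sketch of ours, the explicit transformations hinted at in the question can be used instead of poly(); log(X2) is skipped here because X2 takes negative values:
lr.fit.int = glm(y ~ I(x1^2) + I(x2^2) + I(x1 * x2), data = dat, family = binomial) # squares plus interaction
int.probs = predict(lr.fit.int, newdata = dat, type = "response")
int.preds = ifelse(int.probs > 0.5, 1, 0)
table(int.preds, dat$y) # training confusion matrix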
library(e1071)
dat.svm = data.frame(x1=x1, x2= x2, y=as.factor(y))
svm.fit <- svm(y ~ x1 + x2, data = dat.svm, kernel = "linear")
svm.pred = predict(svm.fit, dat.svm)
dat.neg = dat[svm.pred==0,]
dat.pos = dat[svm.pred==1,]
{plot(dat.pos$x1, dat.pos$x2, col=("blue"), pch="+", xlab = "X1", ylab = "X2")
points(dat.neg$x1, dat.neg$x2, col=("red"), pch=4)
}
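The support vector classifier above uses svm()'s default cost. A sketch of ours, with an arbitrary cost grid, of tuning it via e1071's tune():
set.seed(1)
tune.out = tune(svm, y ~ x1 + x2, data = dat.svm, kernel = "linear",
                ranges = list(cost = c(0.01, 0.1, 1, 10, 100))) # 10-fold CV over the cost grid
summary(tune.out) # cross-validation error for each cost
svm.best = tune.out$best.model # model refit at the best cost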
svm.fit <- svm(y ~ x1 + x2, data = dat.svm, kernel = "radial")
svm.pred = predict(svm.fit, dat.svm)
dat.neg = dat[svm.pred==0,]
dat.pos = dat[svm.pred==1,]
{plot(dat.pos$x1, dat.pos$x2, col=("blue"), pch="+", xlab = "X1", ylab = "X2")
points(dat.neg$x1, dat.neg$x2, col=("red"), pch=4)
}
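To support the "comment on your results" part, a sketch of ours that tabulates training confusion matrices for both kernels; note that svm.fit above now holds the radial fit, so the linear one is refit here:
svm.lin = svm(y ~ x1 + x2, data = dat.svm, kernel = "linear")
svm.rad = svm(y ~ x1 + x2, data = dat.svm, kernel = "radial")
table(predict(svm.lin, dat.svm), dat.svm$y) # linear kernel: training confusion matrix
table(predict(svm.rad, dat.svm), dat.svm$y) # radial kernel: training confusion matrix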