Question
10a \(Sales = \beta_0 + \beta_1 Price + \beta_2 Urban + \beta_3 US\)
10b Interpret Coefficients
- Model Summary
10c Model with Qualitative Variables
10d Reject Predictors/Features
10e Smaller Model
- Model Summary
- Diagnostic Plots
10f Compare Models
10g 95% Confidence Interval
10h Outliers and High Leverage
- Diagnostic Plots

Question

ISLR, p. 123

This question should be answered using the Carseats data set.

(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
(b) Provide an interpretation of each coefficient in the model. Be careful - some of the variables in the model are qualitative!
(c) Write out the model in equation form, being careful to handle the qualitative variables properly.
(d) For which of the predictors can you reject the null hypothesis \(H_0 :\beta_j =0\)?
(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
(f) How well do the models in (a) and (e) fit the data?
(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
(h) Is there evidence of outliers or high leverage observations in the model from (e)?

library(ISLR)

10a \(Sales = \beta_0 + \beta_1 Price + \beta_2 Urban + \beta_3 US\)

names(Carseats)

##  [1] "Sales"       "CompPrice"   "Income"      "Advertising" "Population" 
##  [6] "Price"       "ShelveLoc"   "Age"         "Education"   "Urban"      
## [11] "US"

carseat.fit = lm(Sales ~ Price + Urban + US, data=Carseats)

10b Interpret Coefficients

Model Summary

summary(carseat.fit)

## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

contrasts(Carseats$US)

##     Yes
## No    0
## Yes   1

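USYes: Holding Price and Urban fixed, a store located in the US (USYes = 1) sells about 1.2 thousand more units on average than a store outside the US (USYes = 0). Sales is recorded in thousands of units, so this is roughly 1,200 car seats.

Sales difference = 1.2 * USYes (1.2 is the USYes coefficient from the model; the other predictors are held fixed)

Price: Price is highly significant (p-value < 2e-16). Its coefficient is negative: holding the other predictors fixed, each one-unit increase in Price is associated with a decrease of about 0.054 thousand units (roughly 54 car seats) in Sales. As prices go up, sales go down.

UrbanYes: The coefficient is small (-0.022) and its p-value (0.936) is not significant, so there is no evidence that Sales differs between urban and non-urban stores once Price and US are accounted for. This is not proof that Urban has no effect, only that the data do not show one. Consider removing it from the model.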
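As a worked illustration of reading these coefficients, the arithmetic below uses the estimates from the summary output. The 10-unit price increase is an arbitrary value chosen only for the example, and the other predictors are assumed to be held fixed.

```latex
\[
\Delta Sales_{Price + 10} = \beta_1 \times 10 = -0.054459 \times 10 \approx -0.545
\]
\[
\Delta Sales_{US:\ No \to Yes} = \beta_3 \times (1 - 0) = 1.200573 \approx 1.2
\]
```

Because Sales is measured in thousands of units, these correspond to roughly 545 fewer and 1,200 more car seats, respectively.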

10c Model with Qualitative Variables

Sales = 13.04 - 0.054 * Price - 0.022 * Urban + 1.20 * US

where Urban = 1 if Urban = Yes and 0 if Urban = No, and US = 1 if US = Yes and 0 if US = No.
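To make the dummy coding concrete, here is a minimal sketch that evaluates the fitted equation by hand, including the Urban term from the summary. The coefficients are copied from the summary(carseat.fit) output above; the helper name predict_sales and the example price of 100 are hypothetical and used only for illustration.

```r
# Coefficients copied from the summary(carseat.fit) output above
coefs <- c(intercept = 13.043469, price = -0.054459,
           urban_yes = -0.021916, us_yes = 1.200573)

# Hypothetical helper: predicted Sales (in thousands of units)
predict_sales <- function(price, urban, us) {
  urban_dummy <- as.numeric(urban == "Yes")  # 1 if Urban = Yes, else 0
  us_dummy <- as.numeric(us == "Yes")        # 1 if US = Yes, else 0
  unname(coefs["intercept"] + coefs["price"] * price +
           coefs["urban_yes"] * urban_dummy + coefs["us_yes"] * us_dummy)
}

# Two stores that are identical except for the US indicator
gap <- predict_sales(100, "No", "Yes") - predict_sales(100, "No", "No")
print(gap)
```

The gap between the two predictions is the USYes coefficient, whatever price is chosen, which is what "holding the other predictors fixed" means in practice.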

10d Reject Predictors/Features

We can reject the null hypothesis for Price and USYes because their p-values are far below 0.05. We cannot reject it for UrbanYes (p-value = 0.936).
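The decision above can be sketched directly from the printed t statistics. This assumes a two-sided test at the conventional 0.05 level; the t values and the 396 residual degrees of freedom are copied from the summary output, so the recomputed p-values are only as precise as those rounded inputs.

```r
# t statistics and residual degrees of freedom from summary(carseat.fit)
t_values <- c(Price = -10.389, UrbanYes = -0.081, USYes = 4.635)
df_resid <- 396

# Two-sided p-value: probability of a |t| at least this large under H0
p_values <- 2 * pt(-abs(t_values), df = df_resid)

# TRUE where H0: beta_j = 0 is rejected at the 0.05 level (assumed cutoff)
reject <- p_values < 0.05
print(reject)
```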

10e Smaller Model

Model Summary

carseat.fit2 = lm(Sales ~ Price + US, data=Carseats)
summary(carseat.fit2)

## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

Diagnostic Plots

par(mfrow = c(2,2))
plot(carseat.fit2)

10f Compare Models

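Neither model fits the data well. The \(R^2\) statistic is 0.2393 for both, so each explains only about 24% of the variance in Sales. The smaller model in (e) is marginally better: its adjusted \(R^2\) is slightly higher (0.2354 vs. 0.2335) and its residual standard error is slightly lower (2.469 vs. 2.472), with one fewer predictor.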
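The adjusted \(R^2\) values can be approximately reproduced from the printed \(R^2\), which shows why dropping Urban helps even though \(R^2\) itself barely moves. This is a sketch using the standard adjusted \(R^2\) formula; n = 400 is inferred from the 396 residual degrees of freedom plus 4 coefficients, and the helper name adj_r2 is hypothetical. Because the printed \(R^2\) is rounded, the results match the summaries only to about three decimals.

```r
# Adjusted R^2 from R^2, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 400  # inferred: 396 residual df + 4 estimated coefficients

adj_full <- adj_r2(0.2393, n, p = 3)   # model (a): Price, Urban, US
adj_small <- adj_r2(0.2393, n, p = 2)  # model (e): Price, US

print(c(full = adj_full, small = adj_small))
```

With the same \(R^2\), the model with fewer predictors is penalized less, so its adjusted \(R^2\) is higher.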

10g 95% Confidence Interval

Getting the 95% confidence intervals for each coefficient:

confint(carseat.fit2)
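For a linear model, these intervals are the estimate plus or minus a t quantile times the standard error, so they can be sketched by hand. The estimates and standard errors below are copied from the summary(carseat.fit2) output, so the result is only as precise as those rounded values.

```r
# Estimates and standard errors from summary(carseat.fit2)
est <- c("(Intercept)" = 13.03079, Price = -0.05448, USYes = 1.19964)
se <- c(0.63098, 0.00523, 0.25846)

# 97.5th percentile of the t distribution with 397 residual df
t_crit <- qt(0.975, df = 397)

# 95% interval: estimate -/+ t_crit * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))
```

Neither the Price nor the USYes interval contains zero, which agrees with the significance tests in 10d.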

10h Outliers and High Leverage

Diagnostic Plots

par(mfrow = c(2,2))
plot(carseat.fit2)

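Based on the Residuals vs. Leverage graph (bottom right):

There is one observation at the far right of the graph, which means its leverage is much higher than that of the other observations. A few more observations also have relatively high leverage. As for outliers, the residuals range from about -6.9 to 7.1 against a residual standard error of 2.469, so none lies much beyond about three standard errors, which suggests there are no extreme outliers.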
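The visual check above can be backed by numbers. The sketch below defines a hypothetical helper, flag_unusual, that compares each leverage value with the average leverage (p + 1)/n and each studentized residual with a cutoff. The multiplier of 3 and the cutoff of 3 are rule-of-thumb assumptions, not values from the source. To keep the example runnable without the ISLR package, it is demonstrated on R's built-in cars data; for this exercise the call would be flag_unusual(carseat.fit2), where the average leverage is 3/400 = 0.0075.

```r
# Hypothetical helper: flag high-leverage points and potential outliers
flag_unusual <- function(fit, leverage_mult = 3, resid_cut = 3) {
  h <- hatvalues(fit)                      # leverage of each observation
  r <- rstudent(fit)                       # studentized residuals
  avg_h <- length(coef(fit)) / length(h)   # average leverage, (p + 1) / n
  list(
    avg_leverage = avg_h,
    high_leverage = which(h > leverage_mult * avg_h),
    outliers = which(abs(r) > resid_cut)
  )
}

# Demonstration on a built-in data set (not the Carseats model)
demo_fit <- lm(dist ~ speed, data = cars)
str(flag_unusual(demo_fit))
```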

ISLR Q3.10 - Multiple Regression/Carseats
