p123
This question should be answered using the Carseats data set.
(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
(b) Provide an interpretation of each coefficient in the model. Be careful - some of the variables in the model are qualitative!
(c) Write out the model in equation form, being careful to handle the qualitative variables properly.
(d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?
(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
(f) How well do the models in (a) and (e) fit the data?
(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
(h) Is there evidence of outliers or high leverage observations in the model from (e)?
library(ISLR)
names(Carseats)
## [1] "Sales" "CompPrice" "Income" "Advertising" "Population"
## [6] "Price" "ShelveLoc" "Age" "Education" "Urban"
## [11] "US"
carseat.fit = lm(Sales ~ Price + Urban + US, data=Carseats)
summary(carseat.fit)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
contrasts(Carseats$US)
## Yes
## No 0
## Yes 1
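USYes coefficient: Holding Price and Urban fixed, a store in the US (USYes = 1) sells on average about 1.2 more than a store outside the US (USYes = 0). Sales is recorded in thousands of units, so this is roughly 1,200 car seats.
Sales = (other terms) + 1.2 * USYes, where 1.2 is the USYes estimate from the model and the other predictors are left out for simplicity.
Price: Price is highly significant (p-value < 2e-16) and its coefficient is negative. Each one-unit increase in Price is associated with a decrease of about 0.054 in Sales (roughly 54 units), holding Urban and US fixed. As prices go up, sales go down.
UrbanYes: The p-value (0.936) is not significant, so there is no evidence that sales differ between urban and non-urban stores once Price and US are accounted for. Consider removing it from the model.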
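The USYes and UrbanYes terms are 0/1 dummy variables that R builds from the two-level factors. A minimal sketch of that coding, using a made-up three-store data frame (the `toy` data is an assumption for illustration, not part of Carseats):

```r
# Toy factor with the same levels as Carseats$US (values are hypothetical)
toy <- data.frame(US = factor(c("No", "Yes", "Yes"), levels = c("No", "Yes")))

# model.matrix() builds the design matrix that lm() uses internally
X <- model.matrix(~ US, data = toy)
print(X)

# The USYes column is 1 for "Yes" and 0 for the baseline level "No"
us_dummy <- unname(X[, "USYes"])
```

This matches the contrasts(Carseats$US) output: "No" is the baseline, so the USYes coefficient measures the shift from "No" to "Yes".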
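Sales = 13.04 - 0.054 * Price - 0.022 * Urban + 1.20 * US, where Urban = 1 if the store is in an urban location and 0 otherwise, and US = 1 if the store is in the US and 0 otherwise.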
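As a quick sanity check on the equation form, the estimates from summary(carseat.fit) can be plugged in by hand. The store below (Price = 100, urban, in the US) is a hypothetical example, not a row taken from the data:

```r
# Estimates copied from the summary(carseat.fit) output
b <- c(intercept = 13.043469, price = -0.054459, urban = -0.021916, us = 1.200573)

# Hypothetical store: Price = 100, urban location, inside the US
price <- 100
urban <- 1   # 1 = Yes, 0 = No
us    <- 1   # 1 = Yes, 0 = No

# Plug the values into the fitted equation (Sales is in thousands of units)
pred <- b[["intercept"]] + b[["price"]] * price + b[["urban"]] * urban + b[["us"]] * us
print(round(pred, 3))
```

With the fitted object in the session, predict(carseat.fit, newdata = ...) on a one-row data frame with the same values should agree up to rounding of the copied estimates.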
We can reject the null hypothesis for Price and USYes because their p-values are far below 0.05 (< 2e-16 and 4.86e-06). We cannot reject it for UrbanYes (p-value = 0.936).
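The t values and p-values in the summary table come from dividing each estimate by its standard error and comparing against a t distribution with the residual degrees of freedom. A small sketch that reproduces them from the printed numbers (the variable names are my own):

```r
# Estimates and standard errors copied from summary(carseat.fit)
est <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)
se  <- c(Price =  0.005242, UrbanYes =  0.271650, USYes = 0.259042)
df_resid <- 396  # residual degrees of freedom reported by summary()

t_stat <- est / se                              # t statistic for H0: beta_j = 0
p_val  <- 2 * pt(-abs(t_stat), df = df_resid)   # two-sided p-value

print(round(t_stat, 3))
print(signif(p_val, 3))

# TRUE where H0 is rejected at the 5% level
reject <- p_val < 0.05
print(reject)
```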
carseat.fit2 = lm(Sales ~ Price + US, data=Carseats)
summary(carseat.fit2)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
par(mfrow = c(2,2))
plot(carseat.fit2)
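The models for both (a) and (e) do NOT fit the data well. The \(R^2\) statistic is 0.2393 for both, so each model explains ONLY about 24% of the variance in Sales. Dropping Urban costs essentially nothing: the residual standard error goes from 2.472 to 2.469 and the adjusted \(R^2\) rises slightly from 0.2335 to 0.2354.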
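The two models have the same \(R^2\) to four digits, so the difference in adjusted \(R^2\) comes only from the penalty for the extra predictor. A sketch of that arithmetic using the printed values (n = 400 follows from 396 residual degrees of freedom plus 4 coefficients; `adj_r2` is a helper defined here, not a built-in):

```r
n  <- 400      # observations: 396 residual df + 4 coefficients in model (a)
r2 <- 0.2393   # Multiple R-squared, the same to 4 digits in both models

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), p = number of predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_a <- adj_r2(r2, n, p = 3)  # model (a): Price, Urban, US
adj_e <- adj_r2(r2, n, p = 2)  # model (e): Price, US
print(round(c(a = adj_a, e = adj_e), 4))
```

Because the input \(R^2\) is already rounded, the results can differ from the summary() output in the last digit.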
Getting the 95% confidence intervals for each coefficient:
confint(carseat.fit2)
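For a linear model, these intervals are estimate ± t critical value × standard error. A sketch that rebuilds them by hand from the summary(carseat.fit2) numbers, so small rounding differences from the real confint() output are expected:

```r
# Estimates and standard errors copied from summary(carseat.fit2)
est <- c("(Intercept)" = 13.03079, Price = -0.05448, USYes = 1.19964)
se  <- c("(Intercept)" =  0.63098, Price =  0.00523, USYes = 0.25846)

# 97.5th percentile of the t distribution with 397 residual df
t_crit <- qt(0.975, df = 397)

# 95% interval: estimate +/- t_crit * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))
```

Neither the Price nor the USYes interval contains zero, which is consistent with rejecting the null hypothesis for both in (d).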
par(mfrow = c(2,2))
plot(carseat.fit2)
Based on the Residuals vs. Leverage graph (bottom right):
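The visual check can be backed up numerically. A minimal sketch, assuming two common rules of thumb: studentized residuals larger than 3 in absolute value suggest outliers, and leverage well above the average (p + 1)/n suggests high leverage. The helper `flag_influential()` and the multiplier of 2 are my own choices, not part of base R or ISLR:

```r
# Hypothetical helper: flag possible outliers and high leverage points in an lm fit
flag_influential <- function(fit, resid_cut = 3, lev_mult = 2) {
  h <- unname(hatvalues(fit))   # leverage statistic for each observation
  r <- unname(rstudent(fit))    # studentized residuals
  avg_lev <- length(coef(fit)) / length(h)  # average leverage, (p + 1) / n
  data.frame(
    leverage      = h,
    rstudent      = r,
    outlier       = abs(r) > resid_cut,       # assumed cutoff of 3
    high_leverage = h > lev_mult * avg_lev    # assumed cutoff of 2x the average
  )
}

# Apply to the model from (e) when it exists in the session
if (exists("carseat.fit2")) {
  flags <- flag_influential(carseat.fit2)
  print(colSums(flags[, c("outlier", "high_leverage")]))
}
```

The counts depend on the cutoffs, so treat them as a prompt to look at specific rows rather than as a verdict.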