p123
In this problem we will investigate the t-statistic for the null hypothesis \(H_0 : \beta = 0\) in simple linear regression without an intercept. To begin, we generate a predictor x and a response y as follows:
set.seed(0)
x = rnorm(100)
y = 2 * x + rnorm(100)
Perform a simple linear regression of y onto x, without an intercept. Report the coefficient estimate \(\hat{\beta}\), the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis \(H_0 : \beta = 0\). Comment on these results. (You can perform regression without an intercept using the command lm(y ~ x + 0).)
Now perform a simple linear regression of x onto y without an intercept, and report the coefficient estimate, its standard error, and the corresponding t-statistic and p-value associated with the null hypothesis \(H_0 : \beta = 0\). Comment on these results.
What is the relationship between the results obtained in (a) and (b)?
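For the regression of Y onto X without an intercept, the t-statistic for \(H_0 : \beta = 0\) takes the form \(\hat{\beta}/\mathrm{SE}(\hat{\beta})\), where \(\hat{\beta}\) is given by (3.38), and where
\[\mathrm{SE}(\hat{\beta}) = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i \hat{\beta})^2}{(n-1) \sum_{i'=1}^{n} x_{i'}^2}}.\]
(These formulas are slightly different from those given in Sections 3.1.1 and 3.1.2, since here we are performing regression without an intercept.) Show algebraically, and confirm numerically in R, that the t-statistic can be written as
\[\frac{\sqrt{n-1}\,\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i'=1}^{n} y_{i'}^2\right) - \left(\sum_{i'=1}^{n} x_{i'} y_{i'}\right)^2}}.\]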
Using the results from (d), argue that the t-statistic for the regression of y onto x is the same as the t-statistic for the regression of x onto y.
In R, show that when regression is performed with an intercept, the t-statistic for \(H_0 :\beta_1 = 0\) is the same for the regression of y onto x as it is for the regression of x onto y.
library(ISLR)
set.seed(0)
x = rnorm(100)
y = 2 * x + rnorm(100)
lm.fit.no.intercept = lm(y ~ 0 + x)
summary(lm.fit.no.intercept)
##
## Call:
## lm(formula = y ~ 0 + x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6391 -0.8650 -0.2032 0.5898 2.7879
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 2.1374 0.1092 19.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9589 on 99 degrees of freedom
## Multiple R-squared: 0.7948, Adjusted R-squared: 0.7927
## F-statistic: 383.4 on 1 and 99 DF, p-value: < 2.2e-16
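The coefficient estimate is \(\hat{\beta} = 2.1374\) with a standard error of 0.1092, giving a t-statistic of 19.58 and a p-value below 2e-16. We therefore reject \(H_0 : \beta = 0\): x is a highly significant predictor of y. The estimate is close to the true slope of 2 used to generate the data, so a one-unit increase in x is associated with an increase of about 2.14 units in y.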
lm.fit.x.y = lm(x ~ 0 + y)
summary(lm.fit.x.y)
##
## Call:
## lm(formula = x ~ 0 + y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.22971 -0.24830 0.04216 0.34170 0.71230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.37185 0.01899 19.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4 on 99 degrees of freedom
## Multiple R-squared: 0.7948, Adjusted R-squared: 0.7927
## F-statistic: 383.4 on 1 and 99 DF, p-value: < 2.2e-16
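The coefficient estimate is 0.37185 with a standard error of 0.01899, giving a t-statistic of 19.58 and a p-value below 2e-16, so we again reject \(H_0 : \beta = 0\). A one-unit increase in y is associated with an increase of about 0.37 units in x.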
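One way to look at the relationship asked about in (c) is sketched below. The names `fit_yx`, `fit_xy`, `b_yx` and `b_xy` are introduced here for illustration only; the data are regenerated with the same seed so the block runs on its own.

```r
set.seed(0)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Summaries of the two no-intercept fits from (a) and (b)
fit_yx <- summary(lm(y ~ 0 + x))
fit_xy <- summary(lm(x ~ 0 + y))

b_yx <- fit_yx$coefficients["x", "Estimate"]
b_xy <- fit_xy$coefficients["y", "Estimate"]

# The slope of x onto y is not simply the reciprocal of the slope of y onto x
c(b_xy = b_xy, reciprocal_of_b_yx = 1 / b_yx)

# The product of the two slopes equals the R-squared reported by both fits
c(product = b_yx * b_xy, r2_yx = fit_yx$r.squared, r2_xy = fit_xy$r.squared)

# The t-statistics (and hence the p-values) are identical
c(t_yx = fit_yx$coefficients["x", "t value"],
  t_xy = fit_xy$coefficients["y", "t value"])
```

The two fits describe the same linear association, which is why the t-statistic, p-value and Multiple R-squared (0.7948 in both summaries) agree even though the slopes and standard errors differ.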
TODO
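A possible sketch of the algebra for (d), assuming the no-intercept formulas from the textbook, \(\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2\) (equation 3.38) and \(\mathrm{SE}(\hat{\beta}) = \sqrt{\sum_i (y_i - x_i\hat{\beta})^2 / \big((n-1)\sum_i x_i^2\big)}\). First expand the residual sum of squares and substitute \(\hat{\beta}\):
\[\sum_i (y_i - x_i\hat{\beta})^2 = \sum_i y_i^2 - 2\hat{\beta}\sum_i x_i y_i + \hat{\beta}^2 \sum_i x_i^2 = \sum_i y_i^2 - \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i x_i^2}.\]
Then divide the estimate by its standard error:
\[t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} \cdot \frac{\sqrt{(n-1)\sum_i x_i^2}}{\sqrt{\sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2 / \sum_i x_i^2}} = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\left(\sum_i x_i^2\right)\left(\sum_i y_i^2\right) - \left(\sum_i x_i y_i\right)^2}}.\]
The last step multiplies the \(\sqrt{\sum_i x_i^2}\) left in the denominator into the square root beside it.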
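The derived expression for the t-statistic is symmetric in x and y: they enter only through \(\sum x_i y_i\) and the product \(\left(\sum x_i^2\right)\left(\sum y_i^2\right)\). Since multiplication is commutative, swapping x and y does not affect the result, so the t-statistic for the regression of y onto x equals the t-statistic for the regression of x onto y.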
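The closed-form expression can also be checked numerically. The sketch below regenerates the data with the same seed and uses a hypothetical helper, `t_no_intercept`, that is not part of the original solution.

```r
set.seed(0)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Hypothetical helper: closed-form t-statistic for a no-intercept regression
t_no_intercept <- function(u, v) {
  n <- length(u)
  sqrt(n - 1) * sum(u * v) / sqrt(sum(u^2) * sum(v^2) - sum(u * v)^2)
}

t_formula_yx <- t_no_intercept(x, y)  # roles as in y onto x
t_formula_xy <- t_no_intercept(y, x)  # roles swapped, x onto y

# t value reported by lm() for the regression of y onto x without an intercept
t_lm_yx <- summary(lm(y ~ 0 + x))$coefficients["x", "t value"]

all.equal(t_formula_yx, t_formula_xy)  # the formula is symmetric in x and y
all.equal(t_formula_yx, t_lm_yx)       # and it agrees with lm()
```

Both comparisons should return TRUE, up to floating-point tolerance.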
Comparing the summaries of the two no-intercept fits confirms this: the t value is 19.58 in both.
summary(lm.fit.no.intercept)
##
## Call:
## lm(formula = y ~ 0 + x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6391 -0.8650 -0.2032 0.5898 2.7879
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 2.1374 0.1092 19.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9589 on 99 degrees of freedom
## Multiple R-squared: 0.7948, Adjusted R-squared: 0.7927
## F-statistic: 383.4 on 1 and 99 DF, p-value: < 2.2e-16
summary(lm.fit.x.y)
##
## Call:
## lm(formula = x ~ 0 + y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.22971 -0.24830 0.04216 0.34170 0.71230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.37185 0.01899 19.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4 on 99 degrees of freedom
## Multiple R-squared: 0.7948, Adjusted R-squared: 0.7927
## F-statistic: 383.4 on 1 and 99 DF, p-value: < 2.2e-16
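The summaries above are for the no-intercept fits. For part (f), where the regression includes an intercept, a minimal sketch of the comparison follows. The names `fit_yx_int`, `fit_xy_int`, `t_yx_int`, `t_xy_int` and `t_from_cor` are introduced here for illustration.

```r
set.seed(0)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Fit both regressions with an intercept (the default in lm)
fit_yx_int <- summary(lm(y ~ x))
fit_xy_int <- summary(lm(x ~ y))

# t-statistic for the slope in each direction
t_yx_int <- fit_yx_int$coefficients["x", "t value"]
t_xy_int <- fit_xy_int$coefficients["y", "t value"]

c(t_yx_int = t_yx_int, t_xy_int = t_xy_int)
all.equal(t_yx_int, t_xy_int)

# With an intercept the slope t-statistic depends on the data only through the
# sample correlation r: t = r * sqrt(n - 2) / sqrt(1 - r^2)
n <- length(x)
r <- cor(x, y)
t_from_cor <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(t_yx_int, t_from_cor)
```

Because `cor(x, y)` equals `cor(y, x)`, the slope t-statistic is the same whichever variable is treated as the response.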