p201
We will now consider the Boston housing data set, from the MASS library.
Based on this data set, provide an estimate for the population mean of medv. Call this estimate \(\hat{\mu}\).
Provide an estimate of the standard error of \(\hat{\mu}\). Interpret this result.
Hint: We can compute the standard error of the sample mean by dividing the sample standard deviation by the square root of the number of observations.
Now estimate the standard error of \(\hat{\mu}\) using the bootstrap. How does this compare to your answer from (b)?
Based on your bootstrap estimate from (c), provide a 95% confidence interval for the mean of medv. Compare it to the results obtained using t.test(Boston$medv).
Hint: You can approximate a 95% confidence interval using the formula [\(\hat{\mu} − 2SE(\hat{\mu}), \hat{\mu} + 2SE(\hat{\mu})\)].
Based on this data set, provide an estimate, μˆmed, for the median value of medv in the population.
We now would like to estimate the standard error of μˆmed. Unfortunately, there is no simple formula for computing the standard error of the median. Instead, estimate the standard error of the median using the bootstrap. Comment on your findings.
Based on this data set, provide an estimate for the tenth percentile of medv in Boston suburbs. Call this quantity \(\hat{\mu}_{0.1}\). (You can use the quantile() function.)
Use the bootstrap to estimate the standard error of \(\hat{\mu}_{0.1}\). Comment on your findings.
library(MASS)
library(boot)
names(Boston)
## [1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
## [8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
dim(Boston)
## [1] 506 14
Calculating the mean medv for sample
mean.hat = mean(Boston$medv)
print(mean.hat) # 22.5328
## [1] 22.53281
Calculating the standard error of medv
sd(Boston$medv)/sqrt(nrow(Boston)) # 0.4089
## [1] 0.4088611
boot.fn = function(data, index) {
return(mean(data[index]))
}
# Using bootstrap to figure out an estimate for population mean
bstrap = boot(Boston$medv, boot.fn, 1000)
print(bstrap) # 0.3994119
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Boston$medv, statistic = boot.fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 22.53281 0.007684387 0.4000303
Comments: The SE of bootstrap is different from (b) by about 0.001.
conf.int = c(bstrap$t0 - (2 * 0.3994119), bstrap$t0 + (2 * 0.3994119))
t.test(Boston$medv)
##
## One Sample t-test
##
## data: Boston$medv
## t = 55.111, df = 505, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 21.72953 23.33608
## sample estimates:
## mean of x
## 22.53281
median_hat = median(Boston$medv)
print(median_hat) # 21.2
## [1] 21.2
boot.median.fn = function(data, index) {
return(median(data[index]))
}
# Running bootstrap for median
bstrap.median = boot(Boston$medv, boot.median.fn, 10000)
print(bstrap.median) # 0.3754
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Boston$medv, statistic = boot.median.fn, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 21.2 -0.008145 0.3815992
Comments: Median is the same, whether applying boostrap or just median(). The SE, 0.38, is small compared to the median value of 21.2.
quant.ten = quantile(Boston$medv, probs=c(.1)) # 12.75
# Calculating 10% quantile of data
boot.quant.ten.fn = function(data, index) {
return(quantile(data[index], probs=c(.1)))
}
# Running bootstrap for median
boot.quant.ten = boot(Boston$medv, boot.quant.ten.fn, 10000)
print(boot.quant.ten) # SE: 0.5027
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Boston$medv, statistic = boot.quant.ten.fn, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 12.75 0.01486 0.4987355
Comments: There was no difference with the 10% quartile, There was a SE of .50.