ISLR Home

Question

p201

We will now consider the Boston housing data set, from the MASS library.

  1. Based on this data set, provide an estimate for the population mean of medv. Call this estimate \(\hat{\mu}\).

  2. Provide an estimate of the standard error of \(\hat{\mu}\). Interpret this result.

Hint: We can compute the standard error of the sample mean by dividing the sample standard deviation by the square root of the number of observations.

  1. Now estimate the standard error of \(\hat{\mu}\) using the bootstrap. How does this compare to your answer from (b)?

  2. Based on your bootstrap estimate from (c), provide a 95% confidence interval for the mean of medv. Compare it to the results obtained using t.test(Boston$medv).

Hint: You can approximate a 95% confidence interval using the formula [\(\hat{\mu} − 2SE(\hat{\mu}), \hat{\mu} + 2SE(\hat{\mu})\)].

  1. Based on this data set, provide an estimate, μˆmed, for the median value of medv in the population.

  2. We now would like to estimate the standard error of μˆmed. Unfortunately, there is no simple formula for computing the standard error of the median. Instead, estimate the standard error of the median using the bootstrap. Comment on your findings.

  3. Based on this data set, provide an estimate for the tenth percentile of medv in Boston suburbs. Call this quantity \(\hat{\mu}_{0.1}\). (You can use the quantile() function.)

  4. Use the bootstrap to estimate the standard error of \(\hat{\mu}_{0.1}\). Comment on your findings.


library(MASS)
library(boot)
names(Boston)
##  [1] "crim"    "zn"      "indus"   "chas"    "nox"     "rm"      "age"    
##  [8] "dis"     "rad"     "tax"     "ptratio" "black"   "lstat"   "medv"
summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00
dim(Boston)
## [1] 506  14

9a

Calculating the mean medv for sample

mean.hat = mean(Boston$medv)
print(mean.hat) # 22.5328
## [1] 22.53281

9b

Calculating the standard error of medv

sd(Boston$medv)/sqrt(nrow(Boston)) # 0.4089
## [1] 0.4088611

9c Bootstrap

boot.fn = function(data, index) {
  return(mean(data[index]))
}

# Using bootstrap to figure out an estimate for population mean
bstrap = boot(Boston$medv, boot.fn, 1000)
print(bstrap) # 0.3994119
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Boston$medv, statistic = boot.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original      bias    std. error
## t1* 22.53281 0.007684387   0.4000303

Comments: The SE of bootstrap is different from (b) by about 0.001.

9d Creating a 95% confidence interval

conf.int = c(bstrap$t0 - (2 * 0.3994119), bstrap$t0 + (2  * 0.3994119))

t.test(Boston$medv)
## 
##  One Sample t-test
## 
## data:  Boston$medv
## t = 55.111, df = 505, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  21.72953 23.33608
## sample estimates:
## mean of x 
##  22.53281

9e

median_hat = median(Boston$medv)
print(median_hat) # 21.2
## [1] 21.2

9f

Calculating median of data

boot.median.fn = function(data, index) {
  return(median(data[index]))
}

# Running bootstrap for median
bstrap.median = boot(Boston$medv, boot.median.fn, 10000)
print(bstrap.median) # 0.3754
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Boston$medv, statistic = boot.median.fn, R = 10000)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1*     21.2 -0.008145   0.3815992

Comments: Median is the same, whether applying boostrap or just median(). The SE, 0.38, is small compared to the median value of 21.2.

9g

quant.ten = quantile(Boston$medv, probs=c(.1)) # 12.75 

9h

# Calculating 10%  quantile  of data
boot.quant.ten.fn = function(data, index) {
  return(quantile(data[index], probs=c(.1)))
}

# Running bootstrap for median
boot.quant.ten = boot(Boston$medv, boot.quant.ten.fn, 10000)
print(boot.quant.ten) # SE: 0.5027
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Boston$medv, statistic = boot.quant.ten.fn, R = 10000)
## 
## 
## Bootstrap Statistics :
##     original  bias    std. error
## t1*    12.75 0.01486   0.4987355

Comments: There was no difference with the 10% quartile, There was a SE of .50.