We will now derive the probability that a given observation is part of a bootstrap sample. Suppose that we obtain a bootstrap sample from a set of n observations.
What is the probability that the first bootstrap observation is not the jth observation from the original sample? Justify your answer.
What is the probability that the second bootstrap observation is not the jth observation from the original sample?
Argue that the probability that the jth observation is not in the bootstrap sample is (1 − 1/n)^n.
When n = 5, what is the probability that the jth observation is in the bootstrap sample?
When n = 100, what is the probability that the jth observation is in the bootstrap sample?
When n = 10,000, what is the probability that the jth observation is in the bootstrap sample?
Create a plot that displays, for each integer value of n from 1 to 100,000, the probability that the jth observation is in the bootstrap sample. Comment on what you observe.
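One way to approach the numerical parts above is sketched below. It assumes the result from the earlier part, namely that the probability of the jth observation being in the bootstrap sample is 1 − (1 − 1/n)^n. The helper name prob_in_bootstrap and the log-scaled x-axis are illustrative choices, not part of the exercise.

```r
# Hypothetical helper: probability that the jth observation appears
# in a bootstrap sample drawn from n observations.
prob_in_bootstrap <- function(n) 1 - (1 - 1/n)^n

# Evaluate at the three sample sizes asked about.
p_selected <- prob_in_bootstrap(c(5, 100, 10000))
print(p_selected)

# Evaluate for every integer n from 1 to 100,000 and plot.
n <- 1:100000
p <- prob_in_bootstrap(n)
plot(n, p, type = "l", log = "x",
     xlab = "n (log scale)",
     ylab = "P(jth observation in bootstrap sample)")

# Dashed reference line at the limiting value 1 - 1/e.
abline(h = 1 - exp(-1), lty = 2)
```

The log scale is used only so that the rapid drop at small n stays visible; on a linear axis the curve appears flat almost immediately, settling near 1 − 1/e ≈ 0.632.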
We will now investigate numerically the probability that a bootstrap sample of size n = 100 contains the jth observation. Here j = 4. We repeatedly create bootstrap samples, and each time we record whether or not the fourth observation is contained in the bootstrap sample.
> store=rep(NA, 10000)
> for(i in 1:10000){
    store[i]=sum(sample(1:100, rep=TRUE)==4)>0
  }
> mean(store)
Comment on the results obtained.
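As a rough cross-check of the simulation above, the sketch below repeats it in a more compact form and places the estimate next to the value given by the formula 1 − (1 − 1/n)^n. The seed and the variable names (contains_j, estimate, theory) are assumptions made for illustration; any seed will do, and it is fixed only so that the run is reproducible.

```r
set.seed(1)  # assumed seed, used only for reproducibility

n <- 100     # size of the original sample and of each bootstrap sample
j <- 4       # index of the observation being tracked
reps <- 10000

# For each replicate, draw a bootstrap sample of indices and record
# whether observation j appears at least once.
contains_j <- replicate(reps, any(sample(1:n, replace = TRUE) == j))

estimate <- mean(contains_j)      # simulated proportion
theory   <- 1 - (1 - 1/n)^n       # value from the formula

print(c(estimate = estimate, theory = theory))
```

With 10,000 replicates the simulated proportion should typically land within about 0.01 of the theoretical value, since its standard error is roughly 0.005.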
The validation set approach?
LOOCV?
Probably 1 - 1/n