Importance Sampling when sampling from the empirical distribution

Assume we work with the statistic mean, and we are interested in the upper tail probability P(mean(x) > 0.4). The bootstrap population is the empirical distribution, i.e. every observation equally likely (the nonparametric bootstrap).

xvec <- rnorm(30)            ### the sample
pvec <- exp(1.3 * xvec)
pvec <- pvec / sum(pvec)     ### the tilted prob vector; take lambda = 1.3
sample(x = xvec, size = 30, replace = TRUE, prob = pvec)
                             ### no longer equally likely; larger x is more likely
mean(sample(x = xvec, size = 30, replace = TRUE, prob = pvec))
                             ### should be larger than 0

I need to keep track of which x-value got resampled, and according to which probability:

indexvec <- sample(x = 1:30, size = 30, replace = TRUE, prob = pvec)
xvec[indexvec]
pvec[indexvec]

OK. Now in a loop to compute the average. Each tilted resample is reweighted by the likelihood ratio of the equal probabilities (1/30 each) to the tilted probabilities, i.e. divided by prod(30 * pvecboot):

result <- numeric(1000)      ### result must be initialized before the loop
for (i in 1:1000) {
  indexvec  <- sample(x = 1:30, size = 30, replace = TRUE, prob = pvec)
  bootvec   <- xvec[indexvec]
  pvecboot  <- pvec[indexvec]
  result[i] <- as.numeric(mean(bootvec) > 0.4) / prod(30 * pvecboot)
  ### not very efficient: when mean(bootvec) <= 0.4 the prod( ) is not needed
}
mean(result[1:1000])         ### (1)

Compare this with the plain Monte Carlo:

result <- numeric(1000)
for (i in 1:1000) {
  bootvec   <- sample(x = xvec, size = 30, replace = TRUE)
  result[i] <- as.numeric(mean(bootvec) > 0.4)
}
mean(result[1:1000])         ### (2)

Estimate (2) should be more variable than estimate (1). Verify this by repeating the above many times and plotting the two sets of estimates.
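One way the suggested verification could be sketched, assuming the setup above (xvec of 30 standard normals, tilt lambda = 1.3); the helper names est_is() and est_mc() and the choice of 200 repetitions are my own, not from the notes:

```r
set.seed(1)
xvec <- rnorm(30)                 ### the sample, as above
pvec <- exp(1.3 * xvec)
pvec <- pvec / sum(pvec)          ### tilted prob vector, lambda = 1.3

est_is <- function(B = 1000) {    ### importance-sampling estimate, as in (1)
  result <- numeric(B)
  for (i in 1:B) {
    indexvec  <- sample(1:30, size = 30, replace = TRUE, prob = pvec)
    result[i] <- as.numeric(mean(xvec[indexvec]) > 0.4) / prod(30 * pvec[indexvec])
  }
  mean(result)
}

est_mc <- function(B = 1000) {    ### plain Monte Carlo estimate, as in (2)
  result <- numeric(B)
  for (i in 1:B) {
    result[i] <- as.numeric(mean(sample(xvec, 30, replace = TRUE)) > 0.4)
  }
  mean(result)
}

rep1 <- replicate(200, est_is())  ### 200 independent estimates of each kind
rep2 <- replicate(200, est_mc())
c(sd_is = sd(rep1), sd_mc = sd(rep2))  ### sd of (2) is expected to exceed sd of (1)
boxplot(list(IS = rep1, MC = rep2), main = "Estimates of P(mean > 0.4)")
```

The boxplot should show the plain Monte Carlo estimates spread more widely around the same target probability, which is the point of the exponential tilting.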