1. Suppose we have the following iid sample from a population with a "truncated Weibull" distribution. We would like to estimate the parameter $\theta$, the truncation value, using the statistic $\hat\theta = \max X_i$. Use the (parametric) bootstrap method to find the (approximate) bias of the estimator $\hat\theta$.

Given the sample of 25 observations:

1.316776443 0.166121052 0.869421797 0.073887457 0.030550590
0.570711285 0.066110543 0.522826008 0.737626325 0.020224079
0.746116968 0.307449566 0.481742978 0.060165645 1.145738499
0.021630207 0.068465466 0.285657677 0.363960435 0.001582244
0.340169964 0.263988231 0.537638617 0.442250852 1.309533880

These observations were generated as follows:
1. generate a random variable by the R code rweibull(1, shape=0.8);
2. if the random variable is less than theta, keep it; if it is larger than theta, discard it and go back to step 1.

Please note that I know the value of theta (I used it to generate the sample of 25 observations), but you do not. It is a parameter to be estimated.

===============================================

2. Refer to the mouse data, control group only (9 observations). You can load the data in R by

> library(bootstrap)
> data(mouse.c)    #### you can now see the data by typing
> mouse.c
[1]  52 104 146  10  50  31  40  27  46

Suppose we are interested in the statistic of the sample mean, estimating the population mean.
(a) Find a (usual) 90% t-confidence interval.
(a*) Use a log transformation on the data, then find a 90% t-confidence interval for $\log \mu$. Finally, transform the confidence interval back to the original scale and compare with (a).
(b) Find a 90% bootstrap-t confidence interval. Use a bootstrap iteration number N of at least 10000.
(b*) Try the log transform as in (a*) and compare.
(c) Find a 90% bootstrap percentile confidence interval.
(c*) Try the log transform.

==================================================================

3. Use the data set diabetes inside the bootstrap library. We will only use the variable logCpeptide. Suppose we use the statistic $1/\bar x$. Assess the bias and standard deviation of this estimator by
(a) using the nonparametric bootstrap;
(b) using the Jackknife.
(c) Assume the distribution of logCpeptide is Normal with unknown mean and variance. Use the parametric bootstrap to find the bias of $1/\bar x$ as an estimator of $1/\mu$.

=======================================================
The above homework is due before the midterm.
=======================================================

4. This is problem 20.1 in our book. I am repeating it here. For a three-observation sample, there are 10 (that is, $\binom{2n-1}{n}$ with $n = 3$) possible different nonparametric bootstrap samples of size 3, namely (recording the number of times each observation got sampled) (1,1,1), (0,1,2), (0,2,1), ..., (2,1,0). Find the probability of each of those bootstrap samples. (They follow a trinomial distribution.)

=======================================================

5. This is problem 20.2 in our book. Suppose the original sample is (1, 5, 4), and the estimator of interest is the sample mean. List the value of this statistic on the 10 bootstrap samples above. (For example, for the bootstrap sample (0, 1, 2), the mean is (5+4+4)/3.)

========================================================

6. Now you have a list of 10 values (bootstrap means) and 10 probabilities. Find the variance of this distribution. (A short R sketch of Problems 4-6 is given after the link below.)

===========================

Link to the ppt file of a talk by LJ Wei. Example of using Cross Validation:
http://statgen.ncsu.edu/icsa2007/talks/LJ%20Wei%20Keynote.ppt
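Here is a minimal R sketch of the calculations in Problems 4-6, just to fix ideas (it is not part of the assignment). It assumes the standard nonparametric bootstrap, i.e. each of the three observations is drawn with probability 1/3 on every draw; the object names counts, prob, and bmean are my own illustrative choices.
\begin{verbatim}
x <- c(1, 5, 4)
n <- length(x)

## Problem 4: all count vectors (k1,k2,k3) with k1+k2+k3 = n
counts <- expand.grid(k1 = 0:n, k2 = 0:n, k3 = 0:n)
counts <- counts[rowSums(counts) == n, ]          # 10 rows = choose(2n-1, n)

## trinomial probability of each count vector
prob <- apply(counts, 1, function(k)
              dmultinom(k, size = n, prob = rep(1/n, n)))
sum(prob)                                         # should be 1

## Problem 5: bootstrap sample mean for each count vector
bmean <- as.vector(as.matrix(counts) %*% x / n)

## Problem 6: mean and variance of this discrete distribution
m <- sum(prob * bmean)                            # equals mean(x) = 10/3
v <- sum(prob * (bmean - m)^2)                    # = sum((x-mean(x))^2)/n^2 = 26/27
data.frame(counts, prob = prob, mean = bmean)
\end{verbatim}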
===================================

7. In a binomial model the success probability $p$ is usually estimated by the observed frequency of successes in $n$ trials, $\hat p$. Suppose we had a sample of size 88 and there were 12 successes. Use the bootstrap to estimate the bias and variance of $\log \hat p$ as an estimator of $\log p$. What does the delta method say?

===============================================================

Suppose we have n pairs of observations $(y_i, x_i)$, $i = 1, \ldots, n$, where $y_i$ is either 1 or 0 (a binary response) and the $x_i$ are observable (predictors, or covariates). We shall fit the model for $p_i = Prob(y_i = 1)$,
$$ \log ( p_i/(1-p_i) ) = \alpha + \beta x_i , \qquad i = 1, \ldots, n , $$
by the MLE method [i.e. in R, we use glm()]. Use a bootstrap sampling plan to
(a) find the sd of $\hat\beta$. Plot the histogram; is it normal?
(b) compute a confidence interval for $\beta$ (2 different procedures: percentile and bootstrap-t).

Dataset: remission, where LI is the "x" and r is the "y".
\begin{verbatim}
library(boot)
data(remission)
remission
\end{verbatim}

====================================================================

Define in R a function $\rho(x, \epsilon)$:
\begin{verbatim}
> rho <- function(x, epsilon=0.2) {
    y <- x^2/( epsilon + abs(x) )
    return( y )
  }
\end{verbatim}
This function is similar to the absolute value of x when epsilon is small, but for x near zero it is more like x squared when epsilon is not small. In other words, the size of epsilon controls how similar this rho is to the function abs( ).

We define a new estimator $\hat\theta$, based on a sample $x_1, x_2, \ldots, x_n$, to be the value that minimizes $\sum_i \rho(x_i - \theta)$.

Instead of finding the minimum directly to compute $\hat\theta$, we shall use the function rlm() in the package MASS. For that purpose we need to define the psi function:
\begin{verbatim}
mypsi <- function(x, epsilon = 0.2, deriv = 0){
  # for the rho function x^2/(e+|x|),
  # the 1st derivative is x(2e+|x|)/(e+|x|)^2 for x > 0,
  # the 2nd derivative is 2e^2/(e+|x|)^3.
  #
  # however, the psi function needs to return psi(x)/x (deriv=0)
  # and psi'(x) (deriv=1).
  x <- abs(x)
  if(!deriv) return( (2*epsilon + x)/( epsilon + x)^2 )
  2*(epsilon)^2/(epsilon + x)^3
}
\end{verbatim}
After that we can find $\hat\theta$ by
\begin{verbatim}
rlm( mydata ~ 1, psi = mypsi )
\end{verbatim}
Now, we shall take a data set generated by
\begin{verbatim}
mydata <- rnorm(35, mean=1, sd=2)
\end{verbatim}
Use simulation to find the variance of $\hat\theta$: repeatedly generate such data and compute $\hat\theta$, 1000 times in total, then use the sample sd of the $\hat\theta$ values.

Generate 25 samples (each of size 35). Bootstrap each sample generated, to get 25 bootstrap estimates of the sd. (Use a bootstrap run of 3000 for each sample.)

================================================

In the above setup, use the Jackknife to estimate the bias of $\hat\theta$, also for the 25 samples of size 35 each. (Get 25 Jackknife bias estimates.)

====================================================

In the above, replace rnorm( ) by the following rdexp( ). (An R sketch covering these simulation, bootstrap, and Jackknife steps is given below.)
\begin{verbatim}
rdexp <- function(n, mean=0, scale=1){
  mean + ifelse(runif(n) > 0.5, 1, -1) * rexp(n, rate=1/scale)
}
\end{verbatim}
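Here is a minimal R sketch of the simulation, bootstrap, and Jackknife steps above, just to show the structure of the loops (it is a sketch, not the required solution). It assumes mypsi() and rdexp() are defined as given; the helper name theta.hat(), the maxit setting, and the rdexp() parameter choice in the last comment are my own illustrative assumptions.
\begin{verbatim}
library(MASS)

## the estimator: the location minimizing sum_i rho(x_i - theta), via rlm()
theta.hat <- function(x) as.numeric(coef(rlm(x ~ 1, psi = mypsi, maxit = 100)))

## simulation estimate of the sd of theta.hat under rnorm(35, mean = 1, sd = 2)
sim <- replicate(1000, theta.hat(rnorm(35, mean = 1, sd = 2)))
sd(sim)

## 25 samples; for each one, a bootstrap estimate of the sd (3000 resamples)
boot.sd <- replicate(25, {
  x <- rnorm(35, mean = 1, sd = 2)
  b <- replicate(3000, theta.hat(sample(x, replace = TRUE)))
  sd(b)
})

## Jackknife bias estimate for 25 samples of size 35
jack.bias <- replicate(25, {
  x   <- rnorm(35, mean = 1, sd = 2)
  n   <- length(x)
  th  <- theta.hat(x)
  thi <- sapply(1:n, function(i) theta.hat(x[-i]))
  (n - 1) * (mean(thi) - th)
})

## for the last part, replace the rnorm(35, mean = 1, sd = 2) calls above
## by rdexp(35, ...), e.g. rdexp(35, mean = 1, scale = 2) (one possible choice)
\end{verbatim}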