1. Suppose we have the following iid sample from a population with a "truncated Weibull" distribution. We would like to estimate the parameter $\theta$, the truncation value, using the statistic $\hat\theta = \max X_i$. Use the (parametric) bootstrap method to find the (approximate) bias of the estimator $\hat\theta$.

Given the sample of 25 observations:

1.316776443 0.166121052 0.869421797 0.073887457 0.030550590
0.570711285 0.066110543 0.522826008 0.737626325 0.020224079
0.746116968 0.307449566 0.481742978 0.060165645 1.145738499
0.021630207 0.068465466 0.285657677 0.363960435 0.001582244
0.340169964 0.263988231 0.537638617 0.442250852 1.309533880

These observations were generated as follows:
1. generate a random variable by the R code rweibull(1, shape=0.8);
2. if the random variable is less than theta, keep it; if it is larger than theta, discard it and go back to step 1.

Please note that I know the value of theta (I used it to generate the sample of 25 observations), but you do not. It is a parameter to be estimated.

===============================================

2. Refer to the mouse data, control group only (9 observations). You can load the data in R by

> library(bootstrap)
> data(mouse.c)    #### you can now see the data by typing
> mouse.c
[1]  52 104 146  10  50  31  40  27  46

Suppose we are interested in the statistic of the sample mean, estimating the population mean.
(a) Find a (usual) 90% t-confidence interval.
(a*) Use a log transformation on the data, then find a 90% t-confidence interval for $\log \mu$. Finally, transform the confidence interval back to the original scale and compare with (a).
(b) Find a 90% bootstrap-t confidence interval. Use a bootstrap iteration number N of at least 10000.
(b*) Try the log transform as in (a*) and compare.
(c) Find a 90% bootstrap percentile confidence interval.
(c*) Try the log transform.

==================================================================

3. Use the data set diabetes inside the bootstrap library. We will only use the variable logCpeptide. Suppose we use the statistic $1/\bar x$. Assess the bias and standard deviation of this estimator by
(a) using the nonparametric bootstrap;
(b) using the Jackknife.
(c) Assume the distribution of logCpeptide is Normal with unknown mean and variance. Use the parametric bootstrap to find the bias of $1/\bar x$ as an estimator of $1/\mu$.

=======================================================
The above homework is due before the midterm.
=======================================================

4. This is problem 20.1 in our book. I am repeating it here. For a three-observation sample, there are 10 (that is, $\binom{2n-1}{n}$ with $n = 3$) possible different nonparametric bootstrap samples of size 3, namely (recording the number of times each observation got sampled) (1,1,1), (0,1,2), (0,2,1), ..., (2,1,0). Find the probability of each of those bootstrap samples. (They follow a trinomial distribution.)

=======================================================

5. This is problem 20.2 in our book. Suppose the original sample is (1, 5, 4), and the estimator of interest is the sample mean. List the value of this statistic on the 10 bootstrap samples above. (For example, for the bootstrap sample (0, 1, 2), the mean is (5+4+4)/3.)

========================================================

6. Now you have a list of 10 values (bootstrap means) and 10 probabilities. Find the variance of this distribution. (A short R sketch of Problems 4-6 is given after the link below.)

===========================

Link to the ppt file of a talk by LJ Wei. Example of using Cross Validation:
http://statgen.ncsu.edu/icsa2007/talks/LJ%20Wei%20Keynote.ppt
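Here is a minimal R sketch of the calculations in Problems 4-6, just to fix ideas (it is not part of the assignment). It assumes the standard nonparametric bootstrap, i.e. each of the three observations is drawn with probability 1/3 on every draw; the object names counts, prob, and bmean are my own illustrative choices.
\begin{verbatim}
x <- c(1, 5, 4)
n <- length(x)

## Problem 4: all count vectors (k1,k2,k3) with k1+k2+k3 = n
counts <- expand.grid(k1 = 0:n, k2 = 0:n, k3 = 0:n)
counts <- counts[rowSums(counts) == n, ]          # 10 rows = choose(2n-1, n)

## trinomial probability of each count vector
prob <- apply(counts, 1, function(k)
              dmultinom(k, size = n, prob = rep(1/n, n)))
sum(prob)                                         # should be 1

## Problem 5: bootstrap sample mean for each count vector
bmean <- as.vector(as.matrix(counts) %*% x / n)

## Problem 6: mean and variance of this discrete distribution
m <- sum(prob * bmean)                            # equals mean(x) = 10/3
v <- sum(prob * (bmean - m)^2)                    # = sum((x-mean(x))^2)/n^2 = 26/27
data.frame(counts, prob = prob, mean = bmean)
\end{verbatim}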
===================================

7. In a binomial model the success probability $p$ is usually estimated by the observed frequency of successes in $n$ trials, $\hat p$. Suppose we had a sample of size 88 and there were 12 successes. Use the bootstrap to estimate the bias and variance of $\log \hat p$ as an estimator of $\log p$. What does the delta method say?

===============================================================

Suppose we have n pairs of observations $(y_i, x_i)$, $i = 1, \ldots, n$, where $y_i$ is either 1 or 0 (a binary response) and the $x_i$ are observable (predictors, or covariates). We shall fit the model for $p_i = Prob(y_i = 1)$,
$$ \log ( p_i/(1-p_i) ) = \alpha + \beta x_i , \qquad i = 1, \ldots, n , $$
by the MLE method [i.e. in R, we use glm()]. Use a bootstrap sampling plan to
(a) find the sd of $\hat\beta$. Plot the histogram; is it normal?
(b) compute a confidence interval for $\beta$ (2 different procedures: percentile and bootstrap-t).

Dataset: remission, where LI is the "x" and r is the "y".
\begin{verbatim}
library(boot)
data(remission)
remission
\end{verbatim}

====================================================================

Define in R a function $\rho(x, \epsilon)$:
\begin{verbatim}
> rho <- function(x, epsilon=0.2) {
    y <- x^2/( epsilon + abs(x) )
    return( y )
  }
\end{verbatim}
This function is similar to the absolute value of x when epsilon is small, but for x near zero it is more like x squared when epsilon is not small. In other words, the size of epsilon controls how similar this rho is to the function abs( ).

We define a new estimator $\hat\theta$, based on a sample $x_1, x_2, \ldots, x_n$, to be the value that minimizes $\sum_i \rho(x_i - \theta)$.

Instead of finding the minimum directly to compute $\hat\theta$, we shall use the function rlm() in the package MASS. For that purpose we need to define the psi function:
\begin{verbatim}
mypsi <- function(x, epsilon = 0.2, deriv = 0){
  # for the rho function x^2/(e+|x|),
  # the 1st derivative is x(2e+|x|)/(e+|x|)^2 for x > 0,
  # the 2nd derivative is 2e^2/(e+|x|)^3.
  #
  # however, the psi function needs to return psi(x)/x (deriv=0)
  # and psi'(x) (deriv=1).
  x <- abs(x)
  if(!deriv) return( (2*epsilon + x)/( epsilon + x)^2 )
  2*(epsilon)^2/(epsilon + x)^3
}
\end{verbatim}
After that we can find $\hat\theta$ by
\begin{verbatim}
rlm( mydata ~ 1, psi = mypsi )
\end{verbatim}
Now, we shall take a data set generated by
\begin{verbatim}
mydata <- rnorm(35, mean=1, sd=2)
\end{verbatim}
Use simulation to find the variance of $\hat\theta$: repeatedly generate such data and compute $\hat\theta$, 1000 times in total, then use the sample sd of the $\hat\theta$ values.

Generate 25 samples (each of size 35). Bootstrap each sample generated, to get 25 bootstrap estimates of the sd. (Use a bootstrap run of 3000 for each sample.)

================================================

In the above setup, use the Jackknife to estimate the bias of $\hat\theta$, also for the 25 samples of size 35 each. (Get 25 Jackknife bias estimates.)

====================================================

In the above, replace rnorm( ) by the following rdexp( ). (An R sketch covering these simulation, bootstrap, and Jackknife steps is given below.)
\begin{verbatim}
rdexp <- function(n, mean=0, scale=1){
  mean + ifelse(runif(n) > 0.5, 1, -1) * rexp(n, rate=1/scale)
}
\end{verbatim}
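Here is a minimal R sketch of the simulation, bootstrap, and Jackknife steps above, just to show the structure of the loops (it is a sketch, not the required solution). It assumes mypsi() and rdexp() are defined as given; the helper name theta.hat(), the maxit setting, and the rdexp() parameter choice in the last comment are my own illustrative assumptions.
\begin{verbatim}
library(MASS)

## the estimator: the location minimizing sum_i rho(x_i - theta), via rlm()
theta.hat <- function(x) as.numeric(coef(rlm(x ~ 1, psi = mypsi, maxit = 100)))

## simulation estimate of the sd of theta.hat under rnorm(35, mean = 1, sd = 2)
sim <- replicate(1000, theta.hat(rnorm(35, mean = 1, sd = 2)))
sd(sim)

## 25 samples; for each one, a bootstrap estimate of the sd (3000 resamples)
boot.sd <- replicate(25, {
  x <- rnorm(35, mean = 1, sd = 2)
  b <- replicate(3000, theta.hat(sample(x, replace = TRUE)))
  sd(b)
})

## Jackknife bias estimate for 25 samples of size 35
jack.bias <- replicate(25, {
  x   <- rnorm(35, mean = 1, sd = 2)
  n   <- length(x)
  th  <- theta.hat(x)
  thi <- sapply(1:n, function(i) theta.hat(x[-i]))
  (n - 1) * (mean(thi) - th)
})

## for the last part, replace the rnorm(35, mean = 1, sd = 2) calls above
## by rdexp(35, ...), e.g. rdexp(35, mean = 1, scale = 2) (one possible choice)
\end{verbatim}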