Each assignment is due 10 days after it is first assigned (announced in class).

=================================================

First assignment, Due Jan 22.

1. Use some software to plot hazard functions for normal distributions
   [several mean and sd choices];
   Gamma distributions; log-normal distributions; extreme value distributions.

2. Refer to the exponential notes/handout: Work out the
   points  6a [second part]; 6b; and 8.
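
For problem 1, the hazard is h(t) = f(t) / (1 - F(t)). Here is a minimal
Python sketch for the normal case, using only the standard library. The
function name normal_hazard and the grid are illustrative choices, not part
of the assignment; the tabulated values can be fed to any plotting tool.

```python
from statistics import NormalDist

def normal_hazard(t, mean=0.0, sd=1.0):
    """Hazard h(t) = f(t) / (1 - F(t)) of a normal(mean, sd) distribution."""
    dist = NormalDist(mu=mean, sigma=sd)
    survival = 1.0 - dist.cdf(t)
    return dist.pdf(t) / survival

# Tabulate the hazard on a small grid for a few (mean, sd) choices.
grid = [x / 2 for x in range(-4, 5)]          # -2.0, -1.5, ..., 2.0
for mean, sd in [(0, 1), (0, 2), (1, 1)]:
    values = [round(normal_hazard(t, mean, sd), 4) for t in grid]
    print(f"mean={mean}, sd={sd}:", values)
```

The same ratio works for the other families once their density and CDF are
available from whatever software you use.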

======================================================
A good review of likelihood-based statistical inference methods
can be found in Chapter 10 of the following notes, which you may want
to read in the next few weeks.

http://www.stat.umn.edu/geyer/old/5102/n2.pdf
===============================================================

Second assignment, Due Feb. 4


1. Verify the cumulative hazard formulas for discrete CDF, using a
   five point discrete distribution.

    X with values   x1 < x2 < x3 < x4 < x5
      with prob     p1   p2   p3   p4   p5      [these pi sum to 1]

    (a) write down the CDF  F(t).
    (b) write down H(t) according to one of the formulas
        [the integration formula from F(t) to H(t) ]
    (c) with H(t) obtained in (b), use the product formula
        to get 1-F(t) or F(t). Verify it is the same as in (a)


2. We derived an estimate of H(t) in class via the MLE of piecewise
   exponential distribution. Let us call the estimator \hat H(t).

   Using the same setup, compute the [asymptotic] covariance
   between  \hat H(t1) and \hat H(t2);
   for t1 < t2, two given values. [if needed, you may assume
   t1, t2 are equal to some interval boundary points T_m

   t1 = T_m  and  t2 = T_r   ]
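
For problem 1 of this assignment, a numerical check may help before doing
the algebra. The sketch below assumes a hypothetical five-point distribution
(the values xs and probabilities ps are made up) and uses exact fractions, so
the product formula can be compared with 1 - F(t) without rounding error.

```python
from fractions import Fraction

# Hypothetical five-point distribution: x1 < ... < x5, probabilities sum to 1.
xs = [1, 2, 3, 4, 5]
ps = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(3, 10), Fraction(1, 10)]

def cdf(t):
    """F(t) = P(X <= t)."""
    return sum(p for x, p in zip(xs, ps) if x <= t)

def hazard_jumps():
    """Jump of H at x_i: dH(x_i) = P(X = x_i) / P(X >= x_i)."""
    jumps = []
    at_risk = Fraction(1)
    for p in ps:
        jumps.append(p / at_risk)
        at_risk -= p
    return jumps

def cum_hazard(t):
    """H(t) = sum of the jumps dH(x_i) over x_i <= t."""
    return sum(j for x, j in zip(xs, hazard_jumps()) if x <= t)

def survival_from_hazard(t):
    """Product formula: 1 - F(t) = product over x_i <= t of (1 - dH(x_i))."""
    s = Fraction(1)
    for x, j in zip(xs, hazard_jumps()):
        if x <= t:
            s *= 1 - j
    return s

for t in [0.5, 1, 2.5, 4, 5]:
    assert survival_from_hazard(t) == 1 - cdf(t)
print("product formula reproduces 1 - F(t) at every checked t")
```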


===============================================================

Third assignment  due Feb. 11

1. Derive the MLE for lambda_j for the k piece piece-wise exponential
distribution based on n right censored data, i.e. (T_i, \delta_i )
for i=1,2,...n

2. Use the MYEL data set from our textbook.
   Use both SAS and R to
   fit a Weibull/extreme value regression model

           dur  by  treat and renal
   [you also need to take censoring status into account]


   Based on the output [and the assumed model], what is the
   survival distribution of a patient
   with treat = 2, renal = 0 ?
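
For problem 1 of this assignment, the answer should come out in the
occurrence/exposure form (deaths in piece j divided by total time at risk in
piece j). The Python sketch below only computes that ratio so you can check
your derivation numerically; the function name, the interval convention
(c_j, c_{j+1}] and the toy data are assumptions made for illustration.

```python
def piecewise_exp_mle(times, deltas, cuts):
    """Occurrence/exposure estimates of lambda_j on intervals (cuts[j], cuts[j+1]].

    times  : observed times T_i (death or censoring)
    deltas : 1 if T_i is a death, 0 if censored
    cuts   : interval boundaries 0 = c_0 < c_1 < ... < c_k (last may be inf)
    """
    k = len(cuts) - 1
    deaths = [0] * k
    exposure = [0.0] * k
    for t, d in zip(times, deltas):
        for j in range(k):
            lo, hi = cuts[j], cuts[j + 1]
            if t > lo:
                exposure[j] += min(t, hi) - lo   # time spent at risk in piece j
                if d == 1 and t <= hi:
                    deaths[j] += 1               # death falls in piece j
    return [dj / ej if ej > 0 else float("nan") for dj, ej in zip(deaths, exposure)]

# Hypothetical toy data: 6 subjects, 2 pieces split at t = 2.
times = [0.5, 1.0, 1.5, 2.5, 3.0, 4.0]
deltas = [1, 0, 1, 1, 0, 1]
print(piecewise_exp_mle(times, deltas, [0.0, 2.0, float("inf")]))
```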
==================================================================

Fourth assignment  due Feb. 20

1. In the same context of problem #1, HW 3 above,
   Derive the Fisher information matrix for the
   MLE of lambda_j, j=1,...,k for the k piece
   piece-wise exponential
   distribution based on n right censored data, i.e. (T_i, \delta_i )
   for i=1,2,...n  [assume n >> k]

2. Use the R data set cancer in the package survival, but just the first
   35 records. Fit a Weibull regression model with covariates age and sex.
   Find the 90% confidence intervals for the slopes (beta) for age and sex
   individually. Use Wald as well as likelihood ratio method.


Optional and Bonus:
       In the above model, find the 90% confidence region for
       beta(age) and beta(sex) jointly,
       using the likelihood ratio method. Give a contour plot.
       [look at examples for R function contour( ) ]

========================================================================

Fifth assignment, Due Feb 29


1. Specialize/simplify the Kaplan-Meier estimator and the
   Greenwood formula when all observations are uncensored. Identify
   them with "Sta 291" formula for binomial proportion estimator.


2. Denote the Kaplan-Meier estimator by \hat F().
   Find the (approx.) covariance
   between  1- \hat F(s)  and 1- \hat F(t);  for given times s < t

   Is \hat F(t) a process with independent increments?


Several people asked for a hint on Q2. Here it is:

Using Taylor expansion, first show

[1- \hat F(t)] - [1-F(t)] approx. = [ H(t) - \hat H(t)] exp( -H(t) )

The same approximation holds at time s.

Finally use our knowledge for \hat H(t) to compute the covariance.
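
For problem 1 of this assignment, a quick numerical check can confirm what
the algebra should give. The sketch below assumes no ties and a made-up
uncensored sample; it computes the Kaplan-Meier estimate and the Greenwood
variance with exact fractions and compares them with the binomial formulas
p and p(1-p)/n.

```python
from fractions import Fraction

def km_and_greenwood(times, deltas, t):
    """Kaplan-Meier survival estimate and Greenwood variance at time t (no ties assumed)."""
    data = sorted(zip(times, deltas))
    n = len(data)
    surv = Fraction(1)
    gw_sum = Fraction(0)
    for i, (x, d) in enumerate(data):
        if x > t:
            break
        at_risk = n - i
        if d == 1:
            surv *= Fraction(at_risk - 1, at_risk)
            if at_risk > 1:
                # Greenwood term d_i / (n_i (n_i - d_i)) with d_i = 1;
                # skipped when n_i = 1, where the estimate is already 0.
                gw_sum += Fraction(1, at_risk * (at_risk - 1))
    return surv, surv * surv * gw_sum

times = [3, 7, 1, 9, 4, 12, 6, 10]     # hypothetical uncensored sample
deltas = [1] * len(times)
n = len(times)
for t in [0, 3.5, 6, 11]:
    surv, var = km_and_greenwood(times, deltas, t)
    p_hat = Fraction(sum(x > t for x in times), n)   # empirical P(X > t)
    assert surv == p_hat
    assert var == p_hat * (1 - p_hat) / n
print("KM and Greenwood match the binomial formulas on this sample")
```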
==================================================================

Sixth assignment, Due March 17, 12noon

1. Using the data set cancer inside R package survival, nonparametrically estimate
(based on Kaplan-Meier) the mean residual life at 365.25 days, and find a 90% confidence
interval for it.
Note that you may want to take a look at my note
http://www.ms.uky.edu/~mai/sta635/meanRt.pdf


2. Using the data set veteran inside R package survival, perform a two sample
log-rank test to compare the two treatments (trt), ignoring other
covariates. [i.e. use only time, status and trt] Also do a Gehan-Wilcoxon test.

Also, assume both treatments follow Weibull distributions with the same (but unknown) shape
and possibly different scales;
use SAS proc lifereg/R survreg to test if the two treatments
are different. Compare.

==============================================================
The midterm exam will be held on April 9 in class!


The exam may include, but is not necessarily limited to:

Exponential distribution, hazard functions.
Weibull and extreme value distribution.
Piece-wise exponential distribution.
Likelihood function for Censored data, MLE.
Weibull Regression model.


Nelson-Aalen estimator, its variance and co-variance.
Kaplan-Meier estimator, Greenwood formula, and co-variance.
log-rank test. Wilcoxon test, weighted log-rank test.

==============================================================


Homework assigned on 4/9/2008


Suppose we are given a random sample

X_1, X_2, ..., X_n        iid from exp( \lambda=0.5).

Independently we have another random sample

delta_1, delta_2, ...,  delta_n       iid from Bernoulli (p=0.5)

We compute the Kaplan-Meier estimator based on n pairs
of (X_i, delta_i),  denoted by \hat F_n (t).

What is \hat F_n(t) supposed to estimate?

(i.e. what is the supposed limit of \hat F_n(t) as n increases?)

What if we change Bernoulli(p=0.5) to Bernoulli(p= 2/3) ?
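
Before working out the limit, it may help to look at a simulation. The
Python sketch below assumes that \lambda=0.5 is the rate of the exponential
(if your convention is the mean, change the rate argument), assumes no ties,
and evaluates the Kaplan-Meier estimate at an arbitrarily chosen t = 2 for
growing n. The function names and the seed are illustrative.

```python
import random

def km_cdf(times, deltas, t):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) from (time, status) pairs."""
    data = sorted(zip(times, deltas))
    n = len(data)
    surv = 1.0
    for i, (x, d) in enumerate(data):
        if x > t:
            break
        if d == 1:
            at_risk = n - i
            surv *= (at_risk - 1) / at_risk
    return 1.0 - surv

def simulate(n, p, rate=0.5, t=2.0, seed=635):
    """Draw X_i ~ exponential(rate) and independent delta_i ~ Bernoulli(p)."""
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n)]
    ds = [1 if rng.random() < p else 0 for _ in range(n)]
    return km_cdf(xs, ds, t)

for p in (0.5, 2 / 3):
    print(p, [round(simulate(n, p), 3) for n in (100, 1000, 10000)])
```

Compare the printed values with the true F(2) of the X's and see whether
they agree.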

===================================================================



The take-home final has two problems:


1) get the data from the web  

ftp://ftp.wiley.com/public/sci_tech_med/survival/

download the second edition Data.zip.  Save the zip file

2) unzip and get to the data set uis. There are 628 records.

3) Run the following Cox model, and interpret/explain your outcome

proc phreg data=uis;
   model time*censor(0) = age beck ndrugtx race treat;
   strata site;
run;

4) plot the predicted survival curve (based on the above model)
   for a person with covariate of your choice [but specify]. Two curves
   for two sites. 

5) Run and interpret the model

proc phreg data=uis;
   model time*censor(0) = age beck ndrugtx race treat site off_trt;
   if (time > los) then off_trt = 1;
   else off_trt = 0;
run;


6) Try some transformations of the covariate ndrugtx, and see if a transformation
   is needed. A suggestion: the first few tx may be very significant, but
   once above a certain number the additional effect gets smaller.
   So maybe sqrt(ndrugtx+1), log(ndrugtx + 3) or 1/(ndrugtx + 2) as a covariate?


The second problem of the take-home final is about generating random variables
that follow a Cox model, then computing and plotting the residuals.


1). Generate a data set that mimics the Stanford heart transplant data [i.e. stanford2 in
    the package survival].
   In particular, use the same age variable from the data set. Generate the
   184 survival times using a Cox model with a linear and a square term on age
   [drop the t5 mismatch score].
   What beta to use when generating the r.v.s?
   [we may fit the Cox model to stanford2 to get some hints on what beta to use for age
   and age square, i.e. beta1= -0.12109, beta2=0.00197].
   You still have some freedom on what distribution to use for the lifetimes, what
   distribution to use for the censoring variables, etc. The idea is to generate data that
   are similar to the real ones [in particular, a similar censoring percentage].
   My notes Rsurv.pdf may be helpful. Read the examples there.
   This paper may also help   http://www.ms.uky.edu/~mai/research/amst.pdf

2). Once the data are generated, we can fit the Cox model.
   But fit the model on the generated data using only a linear term on age.
   Plot the martingale residuals and the deviance residuals. Comment.

3). Now fit the correct Cox model [notice the difference/similarity of the fitted beta
    and the beta used to generate data] and plot two types of residuals.  
    Comment on the plots.
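
One possible way to do the data generation in step 1) is sketched below in
Python. It is only an illustration under stated assumptions: a constant
baseline hazard (so lifetimes are exponential given age), independent
exponential censoring, made-up values for base_rate and censor_rate, and
placeholder ages drawn at random instead of the real stanford2 ages. Only
beta1 and beta2 come from the text above. You would tune the two rates until
the censoring percentage looks like that of the real data.

```python
import math
import random

BETA1, BETA2 = -0.12109, 0.00197   # hints quoted in the assignment text

def generate_cox_data(ages, base_rate=0.002, censor_rate=0.0005, seed=2008):
    """Simulate (time, status, age) rows under a Cox model with constant baseline hazard.

    With baseline hazard h0(t) = base_rate, the survival time given age is
    exponential with rate base_rate * exp(BETA1*age + BETA2*age^2).
    Censoring times are independent exponentials (an assumed choice).
    """
    rng = random.Random(seed)
    rows = []
    for age in ages:
        risk = math.exp(BETA1 * age + BETA2 * age ** 2)
        death = rng.expovariate(base_rate * risk)
        censor = rng.expovariate(censor_rate)
        rows.append((min(death, censor), 1 if death <= censor else 0, age))
    return rows

# Placeholder ages; the assignment asks for the 184 ages in stanford2 instead.
age_rng = random.Random(1)
ages = [age_rng.uniform(15, 64) for _ in range(184)]
data = generate_cox_data(ages)
censored = sum(1 for _, status, _ in data if status == 0) / len(data)
print(f"censoring fraction: {censored:.2f}")
```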




Below are some old homework problems; many will be recycled later
as this year's assignments.



3. verify the MLE formula for the piecewise exp() dist.
4. find the Fisher info matrix for the piecewise exp() dist.
5. find the approx. variance-covariance matrix of the MLE for piecewise exp() dist.

6. re-work 3,4,5 for right censored data.

7. (do not hand-in but think about it) How to generate random variables
   that follow piecewise exponential distribution? (Will ask in class)

8. Specialize/simplify the Kaplan-Meier estimator and
   Greenwood formula when all observations are uncensored. Identify
   them with "Sta 291" formula for binomial proportion.

9. Refer to the "myel" data of our text book (you can download it).
   Use only the "dur" and "status" as observed time and censoring status,
   and SAS proc lifetest to find the Kaplan-Meier estimator and the
   "plain" confidence interval.

   Confirm that SAS confidence interval for median is read off the plain
   confidence interval.

   Get confidence interval for the median, using the confidence interval
   via "loglog" transform.

   Get the median confidence interval using the R package emplik.

   Confirm that the SAS estimator of the mean is missing the contribution from
   the last few observations. Give your fix, with the improved mean estimator.

10. Same data set as above, use the SAS proc lifereg to fit the Weibull
    regression model to the data, using all 4 variables. "treat" and "renal"
    as covariates (independent variable, predictors).
    What is the predicted distribution of survival time
    for a subject with "treat"=2, "renal"=0?

10.5 Since the parameterization of Weibull is pretty confusing (different
     books have different definitions), maybe it is better to ask
     "how can you generate random variables that have the predicted
     distribution of a subject with "treat"=2, "renal"=0?"
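
For problem 7 above, one approach (not necessarily the one discussed in
class) is to invert the cumulative hazard: draw E from the unit exponential
and solve H(T) = E piece by piece. The Python sketch below assumes all rates
are strictly positive and that the last piece extends to infinity; the
function name and the example cuts and rates are made up.

```python
import math
import random

def rpiecewise_exp(cuts, rates, rng):
    """Draw one piecewise exponential variate by inverting the cumulative hazard.

    cuts  : 0 = c_0 < c_1 < ... < c_{k-1}, left ends of the k pieces
    rates : hazard lambda_j > 0 on [c_j, c_{j+1}); the last piece is unbounded
    """
    target = -math.log(1.0 - rng.random())   # Exp(1) draw; solve H(T) = target
    for j, rate in enumerate(rates):
        last = j == len(rates) - 1
        width = math.inf if last else cuts[j + 1] - cuts[j]
        if last or target <= rate * width:
            return cuts[j] + target / rate
        target -= rate * width               # hazard used up by piece j

# Hypothetical example: hazard 1 on [0, 1), hazard 3 afterwards.
rng = random.Random(0)
sample = [rpiecewise_exp([0.0, 1.0], [1.0, 3.0], rng) for _ in range(5)]
print([round(v, 3) for v in sample])
```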

=======================================================================

635 Oct. 26, 2006

1. Suppose I have observations X1, X2, ... Xn
with censoring status d1, d2, ... dn
(di = 1 means Xi is the actual death time;
di = 0 means the actual death time is larger than Xi).
Assume there are no ties among the Xi values.

We can form a Kaplan-Meier estimator of survival function:
1 - \hat F(t).

Suppose we replace di by  ui = 1- di
and form another Kaplan-Meier estimator of the survival
function, based on Xi and ui ; call it  1- \hat G (t).

Prove that  [1- \hat F (t)]*[ 1 - \hat G(t)] = (number of Xi >t)/n


2. Suppose I use the following R code to
generate k pairs of censored observations (X,d): (here k=100)

k <- 100
Xvec <- rexp(n=k, rate=1)
dvec <- rbinom(n=k, size=1, prob=0.5)

One can compute a Nelson-Aalen estimator based on (Xvec, dvec) above.
When k -> infinity, what will be the limit of the
Nelson-Aalen estimator, \hat H(t) ?  Justify your answer.


3. Based on the Greenwood formula (or what have you), please derive a
(hopefully good, consistent at least)
variance estimator for log[ - 2*log(1- \hat F(t)) ], where \hat F(t) is the
Kaplan-Meier estimator. Give some justification for your variance estimator.
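
For problem 1 above, a numerical check of the identity (not a proof) can be
done with exact fractions. The data below are made up and have no ties; the
function name km_survival is illustrative.

```python
from fractions import Fraction

def km_survival(times, status, t):
    """Kaplan-Meier survival estimate at t, treating status == 1 as the event."""
    data = sorted(zip(times, status))
    n = len(data)
    surv = Fraction(1)
    for i, (x, d) in enumerate(data):
        if x > t:
            break
        if d == 1:
            surv *= Fraction(n - i - 1, n - i)   # (at risk - 1) / (at risk)
    return surv

x = [2.1, 0.7, 5.3, 3.8, 1.2, 4.4, 6.0]   # hypothetical observed times, no ties
d = [1, 0, 1, 1, 0, 0, 1]                 # censoring status d_i
u = [1 - di for di in d]                  # flipped status u_i = 1 - d_i
n = len(x)
for t in [0.5, 1.0, 2.5, 4.0, 5.5, 6.0]:
    lhs = km_survival(x, d, t) * km_survival(x, u, t)
    rhs = Fraction(sum(xi > t for xi in x), n)
    assert lhs == rhs
print("identity holds at every checked t")
```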

Due Monday, Oct. 30, 2006
=====================================================

Write a short preliminary report on the project you are
going to do (no more than one page).
List some issues you are planning to investigate,
and the references/tools you are using.


11. Refer to the Stanford heart transplant data, 103 subjects.

(Data is in SAS form in the download, or is inside R survival package
as jasa)

(a) fit a Cox PH model for modeling

the survival time after transplant, with covariates: surg+age+age^2

Discuss implications of your output.

Based on the fitted model,
Give the predicted survival curve for a hypothetical subject with
surg=1, age=35.

(b) time-dependent covariate:

fit a Cox model for modeling
survival time since acceptance
with covariates: plant+surg+age+age^2
where plant is a time dependent covariate.

Repeat, but add one more term for the plant-age interaction
and drop the age^2 term.

Discuss implications of your output.

(c) Use SAS proc lifereg or R survreg to fit a model similar to the one in (a)
(using Weibull or another dist) and compare the results.

============================================================
Take Home final: (Due Dec. 14 noon)

1) Give your project a final push: (update, revision, polishing) and
send me the URL of the files or the web pages.

2) Stratified Cox model. For examples read page 158-160 of our textbook.

( If you wonder how one can generate r.v.s that follow the stratified
Cox model, this is how: Separate all the yi into two groups. We use
two different monotone transformations g1() and g2(), one for each
group.  This way we get a Cox model with two strata. )

Recall the inference in the Cox model only uses the rank of the
survival times. In the stratified model, we only use rank of survival
times WITHIN each stratum.

Analyze the data set "colon" from the R package survival.
(Use only those with etype=2, so there are 929 subjects)

Use the variables: rx, age, time, status, sex.

This is similar to the example in the SAS book on p. 158-159.

first model: (time, status) ~ rx+age+sex.

second model: (time, status) ~ rx+age+strata(sex)

third model: Use only the male data to fit model
                    (time, status) ~ rx+age
             Use only the female data to fit model
                    (time, status) ~ rx+age

In each model, please plot the baseline survival curve(s).
(two curves in model 2)

Discuss the outcome of your analysis in different models.

Note: rx should be treated as a class variable. Since SAS 9.1, proc tphreg
      can specify a class variable just like proc glm. (tphreg stands for
      testing version of phreg).

Q and A:

Can I use R to do it? Absolutely, although in industry people
mostly use SAS.

What is the difference between model two and three?
In model two, even though there are two different baseline hazards
(one for each sex), the rx and age effects are assumed to be the same.
Whereas in model three, they need not be.
==============================================

Collection of project links: