Each assignment is due 10 days after it is first assigned (announced in class).

1. Plot hazards for normal distributions [several mean and sd choices]; log-normal distributions; extreme value distributions.
2. Verify the cumulative hazard formula for a discrete CDF.
3. Verify the MLE formula for the piecewise exp() dist.
4. Find the Fisher info matrix for the piecewise exp() dist.
5. Find the approx. var-cov matrix of the MLE for the piecewise exp() dist.
6. Re-work 3, 4, 5 for right censored data.
7. (Do not hand in, but think about it.) How to generate random variables that follow a piecewise exponential distribution? (Will ask in class.)
8. Specialize/simplify the Kaplan-Meier estimator and the Greenwood formula when all observations are uncensored. Identify them with the "Sta 291" formulas for a binomial proportion.
9. Refer to the "myel" data of our textbook (you can download it). Use only "dur" and "status" as the observed time and censoring status, and use SAS proc lifetest to find the Kaplan-Meier estimator and the "plain" confidence interval.
   Confirm that the SAS confidence interval for the median is read off the plain confidence interval.
   Get a confidence interval for the median using the confidence interval via the "loglog" transform.
   Get the median confidence interval using the R package emplik.
   Confirm that the SAS estimator of the mean is missing the contribution from the last few observations. Give your fix, with the improved mean estimator.
10. Same data set as above. Use SAS proc lifereg to fit the Weibull regression model to the data, using all 4 variables, with "treat" and "renal" as covariates (independent variables, predictors). What is the predicted distribution of survival time for a subject with "treat"=2, "renal"=0?
10.5 Since the parametrization of the Weibull is pretty confusing (different books have different definitions), maybe it is better to ask: how can you generate random variables that have the predicted distribution of a subject with "treat"=2, "renal"=0?
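A note on the parametrization issue raised in 10.5. The sketch below assumes the accelerated failure time form log(T) = lp + sigma * W, with W a standard (minimum) extreme value variable, which is the form used by Weibull fits in proc lifereg and R survreg. Under that assumption T is Weibull with shape = 1/sigma and scale = exp(lp) in the parametrization of R's rweibull(). The function name rweibull_aft and the numbers for lp and sigma are placeholders for illustration only; they are not estimates from the "myel" data.

```r
# Minimal sketch (base R only). Assumes the AFT form
#   log(T) = lp + sigma * W,  W ~ standard minimum extreme value,
# so that S(t) = exp(-(t / exp(lp))^(1 / sigma)).
# rweibull_aft is a hypothetical helper name, not part of any package.
rweibull_aft <- function(n, lp, sigma) {
  # R's rweibull uses S(t) = exp(-(t / scale)^shape)
  rweibull(n, shape = 1 / sigma, scale = exp(lp))
}

set.seed(1)
lp    <- 2.0   # placeholder: fitted linear predictor for the subject of interest
sigma <- 0.8   # placeholder: fitted "Scale" parameter from the output
t_sim <- rweibull_aft(1e5, lp, sigma)

# Sanity check against the closed-form median exp(lp) * log(2)^sigma
med_theory <- exp(lp) * log(2)^sigma
print(c(simulated = median(t_sim), theoretical = med_theory))
```

In an actual answer, lp would be the fitted linear predictor evaluated at "treat"=2, "renal"=0, and sigma the fitted scale; both must be read from your own output, and the mapping should be checked against the documentation of the software you use.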
=======================================================================
635 Oct. 26, 2006

1. Suppose I have observations X1, X2, ..., Xn with censoring status d1, d2, ..., dn (di = 1 means Xi = actual death time; di = 0 means the actual death time is larger than Xi). Assume there are no ties among the Xi values.
   We can form a Kaplan-Meier estimator of the survival function: 1 - \hat F(t).
   Suppose we replace di by ui = 1 - di and form another Kaplan-Meier estimator of the survival function, based on Xi and ui; call it 1 - \hat G(t).
   Prove that
      [1 - \hat F(t)] * [1 - \hat G(t)] = (number of Xi > t)/n

2. Suppose I use the following R code to generate k pairs of censored observations (X, d) (here k = 100):
      k <- 100
      Xvec <- rexp(n=k, rate=1)
      dvec <- rbinom(n=k, size=1, prob=0.5)
   One can compute a Nelson-Aalen estimator based on (Xvec, dvec) above. When k -> infinity, what will be the limit of the Nelson-Aalen estimator, \hat H(t)? Justify your answer.

3. Based on the Greenwood formula (or what have you), please derive a (hopefully good, consistent at least) variance estimator for
      log[ - 2*log(1 - \hat F(t)) ],
   where \hat F(t) is the Kaplan-Meier estimator. Give some justification for your variance estimator.

Due Monday, Oct. 30, 2006
=====================================================
Write a short preliminary report on the project you are going to do (no more than one page). List some issues you are planning to investigate, and the references/tools you are using.

11. Refer to the Stanford heart transplant data, 103 subjects. (The data is in SAS form in the download, or is inside the R survival package as jasa.)
(a) Fit a Cox PH model for the survival time after transplant, with covariates: surg+age+age^2. Discuss the implications of your output. Based on the fitted model, give the predicted survival curve for a hypothetical subject with surg=1, age=35.
(b) Time-dependent covariate: fit a Cox model for the survival time since acceptance, with covariates: plant+surg+age+age^2, where plant is a time-dependent covariate. Repeat, but add one more term for the plant-age interaction and drop the age^2 term. Discuss the implications of your output.
(c) Use SAS proc lifereg or R survreg to fit a model similar to the one in (a) (using Weibull or another dist), and compare the results.
============================================================
Take Home final: (Due Dec. 14 noon)

1) Give your project a final push (update, revision, polishing) and send me the URL of the files or the web pages.

2) Stratified Cox model. For examples, read pages 158-160 of our textbook.
   (If you wonder how one can generate r.v.s that follow the stratified Cox model, this is how: separate all the yi into two groups, and use two different monotone transformations g1() and g2(), one for each group. This way we get a Cox model with two strata.)
   Recall that inference in the Cox model only uses the ranks of the survival times. In the stratified model, we only use the ranks of the survival times WITHIN each stratum.

   Analyze the data set "colon" from the R package survival. (Use only those with etype=2, so there are 929 subjects.) Use the variables: rx, age, time, status, sex. This is similar to the example in the SAS book on p. 158-159.
   first model: (time, status) ~ rx+age+sex
   second model: (time, status) ~ rx+age+strata(sex)
   third model: Use only the male data to fit the model (time, status) ~ rx+age
                Use only the female data to fit the model (time, status) ~ rx+age
   In each model, please plot the baseline survival curve(s) (two curves in model 2). Discuss the outcome of your analysis in the different models.

   Note: rx should be treated as a class variable. Since SAS 9.1, proc tphreg can specify a class variable just like proc glm. (tphreg stands for testing version of phreg.)

Q and A:
Can I use R to do it? Absolutely, although in industry people mostly use SAS.
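For those working in R, the three model specifications above might be set up roughly as follows. This is only a sketch of the model formulas, not a worked solution: it assumes the colon data frame shipped with the survival package, in which rx is stored as a factor and sex is coded 1 = male (check ?colon for your installed version). The object names d, fit1, fit2, fit3m, fit3f and sf2 are arbitrary.

```r
library(survival)  # coxph(), survfit(), strata(), and the colon data

# Keep only the etype == 2 records (one row per subject)
d <- subset(colon, etype == 2)
d$rx <- factor(d$rx)  # make sure rx is treated as a class variable

# Model 1: sex enters as an ordinary covariate
fit1 <- coxph(Surv(time, status) ~ rx + age + sex, data = d)

# Model 2: separate baseline hazard per sex, common rx and age effects
fit2 <- coxph(Surv(time, status) ~ rx + age + strata(sex), data = d)

# Model 3: completely separate fits for males and females
fit3m <- coxph(Surv(time, status) ~ rx + age, data = subset(d, sex == 1))
fit3f <- coxph(Surv(time, status) ~ rx + age, data = subset(d, sex == 0))

# Estimated survival curves from the stratified fit, one per stratum.
# Without newdata, survfit() evaluates the curves at reference covariate
# values chosen by the package, so read them as "baseline" in that sense only.
sf2 <- survfit(fit2)
plot(sf2, lty = 1:2, xlab = "time", ylab = "estimated survival")
```

Comparing coef(fit2) with coef(fit3m) and coef(fit3f) is one way to start the discussion the problem asks for.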
What is the difference between models two and three? In model two, even though there are two different baseline hazards (one for each sex), the rx and age effects are assumed to be the same for both sexes. In model three, they need not be.
==============================================
Collection of project links: