Each assignment is due 10 days after it is first assigned (announced in class).

=================================================
First assignment, Due Jan 22.

1. Use some software to plot hazard functions for normal distributions
   [several mean and sd choices]; Gamma distributions; log-normal
   distributions; extreme value distributions.

2. Refer to the exponential notes/handout: Work out the points
   6a [second part]; 6b; and 8.

======================================================
A good review of likelihood-based statistical inference methods can be found
in Chapter 10 of the following notes, which you may want to read in the next
few weeks.
http://www.stat.umn.edu/geyer/old/5102/n2.pdf

===============================================================
Second assignment, Due Feb. 4

1. Verify the cumulative hazard formulas for a discrete CDF, using a
   five-point discrete distribution: X with values x1 < x2 < x3 < x4 < x5
   with prob p1 p2 p3 p4 p5 [these pi sum to 1].
   (a) Write down the CDF F(t).
   (b) Write down H(t) according to one of the formulas
       [the integration formula from F(t) to H(t)].
   (c) With H(t) obtained in (b), use the product formula to get
       1-F(t) or F(t). Verify it is the same as in (a).

2. We derived an estimate of H(t) in class via the MLE of the piecewise
   exponential distribution. Let us call the estimator \hat H(t).
   Using the same setup, compute the [asymptotic] covariance between
   \hat H(t1) and \hat H(t2), for two given values t1 < t2.
   [If needed, you may assume t1 and t2 are equal to some interval
   boundary points: t1 = T_m and t2 = T_r.]

===============================================================
Third assignment due Feb. 11

1. Derive the MLE for lambda_j for the k-piece piecewise exponential
   distribution based on n right censored data, i.e. (T_i, \delta_i)
   for i=1,2,...n

2. Use the MYEL data set from our textbook.
   Use both SAS and R to fit a Weibull/extreme value regression model for
   dur by treat and renal [you also need to take censoring status into
   account]. Based on the output [and the assumed model], what is the
   survival distribution of a patient with treat = 2, renal = 0 ?

==================================================================
Fourth assignment due Feb. 20

1. In the same context as problem #1, HW 3 above, derive the Fisher
   information matrix for the MLE of lambda_j, j=1,...,k for the k-piece
   piecewise exponential distribution based on n right censored data,
   i.e. (T_i, \delta_i) for i=1,2,...n [assume n >> k]

2. Use the R data set cancer in the package survival, but just the first
   35 records. Fit a Weibull regression model with covariates age and sex.
   Find the 90% confidence intervals for the slopes (beta) for age and sex
   individually. Use the Wald method as well as the likelihood ratio method.

   Optional and Bonus: In the above model, find the 90% confidence region
   for beta(age) and beta(sex) jointly, using the likelihood ratio method.
   Give a contour plot. [look at examples for the R function contour( ) ]

========================================================================
Fifth assignment, Due Feb 29

1. Specialize/simplify the Kaplan-Meier estimator and the Greenwood formula
   when all observations are uncensored. Identify them with the "Sta 291"
   formula for the binomial proportion estimator.

2. Denote the Kaplan-Meier estimator by \hat F(). Find the (approx.)
   covariance between 1- \hat F(s) and 1- \hat F(t), for given times s < t.
   Is \hat F(t) a process with independent increments?

   People ask for hints re Q2. Here it is:
   Using Taylor expansion, first show
      [1- \hat F(t)] - [1-F(t)]  approx. =  [ H(t) - \hat H(t)] exp( -H(t) )
   The same approximation holds at time s.
   Finally, use our knowledge of \hat H(t) to compute the covariance.

==================================================================
Sixth assignment, Due March 17, 12noon

1.
   Using the data set cancer inside the R package survival,
   nonparametrically estimate (based on Kaplan-Meier) the mean residual
   life at 365.25 (days), and find a 90% confidence interval for it.
   Note that you may want to take a look at my note
   http://www.ms.uky.edu/~mai/sta635/meanRt.pdf

2. Using the data set veteran inside the R package survival, perform a
   two-sample log-rank test to see the difference between the two
   treatments (trt), ignoring other covariates
   [i.e. use only time, status and trt].
   Also do a Gehan-Wilcoxon test.
   Also, assume both treatments follow a Weibull distribution with the same
   (but unknown) shape, and possibly different scale; use SAS proc lifereg/R
   survreg to test if the two treatments are different. Compare.

==============================================================
Midterm exam will be held on April 9 in class!
The exam may include, but is not necessarily limited to:
   Exponential distribution, hazard functions.
   Weibull and extreme value distribution.
   Piecewise exponential distribution.
   Likelihood function for censored data, MLE.
   Weibull regression model.
   Nelson-Aalen estimator, its variance and covariance.
   Kaplan-Meier estimator, Greenwood formula, and covariance.
   Log-rank test. Wilcoxon test, weighted log-rank test.

==============================================================
Homework assigned on 4/9/2008

Suppose we are given a random sample X_1, X_2, ..., X_n iid from
exp( \lambda=0.5). Independently, we have another random sample
delta_1, delta_2, ..., delta_n iid from Bernoulli(p=0.5).
We compute the Kaplan-Meier estimator based on the n pairs (X_i, delta_i),
denoted by \hat F_n(t).
What is \hat F_n(t) supposed to estimate? (i.e. what is the limit of
\hat F_n(t) as n increases?)
What if we change Bernoulli(p=0.5) to Bernoulli(p=2/3) ?
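For the homework assigned on 4/9/2008, a simulation can be used to check a conjectured limit numerically. The following is only a minimal sketch in base R. It assumes that exp( \lambda=0.5) means rate 0.5 (the problem does not say whether 0.5 is a rate or a mean); the helper name km_cdf, the sample size, the seed, and the time points are all hypothetical choices made for illustration.

```r
# Kaplan-Meier estimate of F(t) at the time points `tt`, computed by hand.
# x: observed times; d: status (1 = treated as a death, 0 = treated as censored).
# Assumes no ties, which holds with probability 1 for continuous x.
km_cdf <- function(x, d, tt) {
  ord <- order(x)
  x <- x[ord]
  d <- d[ord]
  n <- length(x)
  at_risk <- n:1                      # number at risk just before each ordered time
  surv <- cumprod(1 - d / at_risk)    # product-limit survival at each ordered time
  idx <- findInterval(tt, x)          # number of observations <= each tt
  1 - c(1, surv)[idx + 1]             # F(t) = 0 before the first observation
}

set.seed(635)
n <- 5000
x <- rexp(n, rate = 0.5)              # assumption: lambda = 0.5 is the rate
tt <- c(0.5, 1, 2, 4)

for (p in c(0.5, 2/3)) {
  d <- rbinom(n, size = 1, prob = p)  # status drawn independently of x
  cat("p =", round(p, 3), ":", round(km_cdf(x, d, tt), 3), "\n")
}
```

Increasing n and comparing the printed values with a candidate CDF evaluated at the same time points gives a quick sanity check of a derived limit; it is not a proof.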
===================================================================
Take home Final has two problems:

1) Get the data from the web
   ftp://ftp.wiley.com/public/sci_tech_med/survival/
   Download the second edition Data.zip. Save the zip file.

2) Unzip and get to the data set uis. There are 628 records.

3) Run the following Cox model, and interpret/explain your outcome:

      proc phreg data=uis;
         model time*censor(0) = age beck ndrugtx race treat ;
         strata site;
      run;

4) Plot the predicted survival curve (based on the above model) for a
   person with covariates of your choice [but specify them].
   Two curves for two sites.

5) Run and interpret the model

      proc phreg data=uis;
         model time*censor(0) = age beck ndrugtx race treat site off_trt;
         if (time > los) then off_trt = 1; else off_trt = 0;
      run;

6) Try some transformation of the covariate ndrugtx, and see if a
   transformation is needed. Some suggestions: the first few tx may be very
   significant, but once above a certain number the additional effect gets
   smaller. So maybe use sqrt(ndrugtx + 1), log(ndrugtx + 3) or
   1/(ndrugtx + 2) as a covariate?

The second problem of the take home has to do with generating random
variables that follow Cox models, getting the residuals, and plotting them.

1). Generate a data set that mimics the Stanford heart transplant data
    [i.e. stanford2 in the package survival]. In particular, use the same
    age variable from the data set. Generate the 184 survival times using
    a Cox model with a linear and a square term on age
    [drop the t5 mismatch score].
    What beta to use when generating the r.v.s? [We may fit the Cox model
    to stanford2 to get some hints on what beta to use for age and age
    square, i.e. beta1= -0.12109, beta2=0.00197.]
    You still have some freedom on what distribution to use for the
    lifetimes, what distribution to use for the censoring variables, etc.
    The idea is to generate data which are similar to the real ones
    [in particular, a similar censoring percentage].
    My notes Rsurv.pdf may be helpful. Read the examples there.
    This paper may also help
    http://www.ms.uky.edu/~mai/research/amst.pdf

2). Once the data are generated, we can fit the Cox model. But fit the
    model on the generated data using only a linear term on age.
    Plot the martingale residuals and the deviance residuals. Comment.

3). Now fit the correct Cox model [notice the difference/similarity
    between the fitted beta and the beta used to generate the data] and
    plot the two types of residuals. Comment on the plots.

Below are some old homework problems; many will be recycled later as this
year's assignments.

3. Verify the MLE formula for the piecewise exp() dist.

4. Find the Fisher info matrix for the piecewise exp() dist.

5. Find the approx. var-cor matrix of the MLE for the piecewise exp() dist.

6. Re-work 3, 4, 5 for right censored data.

7. (Do not hand in, but think about it.) How to generate random variables
   that follow a piecewise exponential distribution? (Will ask in class)

8. Specialize/simplify the Kaplan-Meier estimator and the Greenwood formula
   when all observations are uncensored. Identify them with the "Sta 291"
   formula for the binomial proportion.

9. Refer to the "myel" data of our textbook (you can download it).
   Use only "dur" and "status" as the observed time and censoring status,
   and SAS proc lifetest to find the Kaplan-Meier estimator and the "plain"
   confidence interval.
   Confirm that the SAS confidence interval for the median is read off the
   plain confidence interval.
   Get the confidence interval for the median using the confidence interval
   via the "loglog" transform.
   Get the median confidence interval using the R package emplik.
   Confirm that the SAS estimator of the mean is missing the contribution
   from the last few observations. Give your fix, with the improved mean
   estimator.

10. Same data set as above: use SAS proc lifereg to fit the Weibull
    regression model to the data, using all 4 variables, with "treat" and
    "renal" as covariates (independent variables, predictors).
    What is the predicted distribution of survival time for a subject with
    "treat"=2, "renal"=0?
10.5 Since the parameterization of the Weibull is pretty confusing
    (different books have different definitions), maybe it is better to
    ask: how can you generate random variables that have the predicted
    distribution of a subject with "treat"=2, "renal"=0?

=======================================================================
635 Oct. 26, 2006

1. Suppose I have observations X1, X2, ... Xn with censoring status
   d1, d2, ... dn (di = 1 means Xi = actual death time; di = 0 means the
   actual death time is larger than Xi). Assume no ties among the Xi values.
   We can form a Kaplan-Meier estimator of the survival function:
   1 - \hat F(t).
   Suppose we replace di by ui = 1 - di and form another Kaplan-Meier
   estimator of the survival function, based on Xi and ui;
   call it 1 - \hat G(t).
   Prove that
      [1 - \hat F(t)]*[1 - \hat G(t)] = (number of Xi > t)/n

2. Suppose I use the following R code to generate k pairs of censored
   observations (X,d): (here k=100)

      k <- 100
      Xvec <- rexp(n=k, rate=1)
      dvec <- rbinom(n=k, size=1, prob=0.5)

   One can compute a Nelson-Aalen estimator based on (Xvec, dvec) above.
   When k -> infinity, what will be the limit of the Nelson-Aalen
   estimator, \hat H(t) ? Justify your answer.

3. Based on the Greenwood formula (or what have you), please derive a
   (hopefully good, consistent at least) variance estimator for
      log[ - 2*log(1 - \hat F(t)) ]
   where \hat F(t) is the Kaplan-Meier estimator.
   Give some justification for your variance estimator.

Due Monday, Oct. 30, 2006

=====================================================
Write a short preliminary report on the project you are going to do.
(No more than one page.) List some issues you are planning to investigate,
and the references/tools you are using.

11. Refer to the Stanford heart transplant data, 103 subjects.
    (Data is in SAS form in the download, or is inside the R survival
    package as jasa.)
    (a) Fit a Cox PH model for modeling the survival time after transplant,
        with covariates: surg+age+age^2
        Discuss implications of your output.
        Based on the fitted model, give the predicted survival curve for a
        hypothetical subject with surg=1, age=35.
    (b) Time-dependent covariate: fit a Cox model for modeling survival
        time since acceptance with covariates: plant+surg+age+age^2
        where plant is a time-dependent covariate.
        Repeat, but add one more term for the plant*age interaction and
        drop the age^2 term.
        Discuss implications of your output.
    (c) Use SAS proc lifereg or R survreg to fit a model similar to the one
        in (a) (using Weibull or another dist). Compare the results.

============================================================
Take Home final: (Due Dec. 14 noon)

1) Give your project a final push (update, revision, polishing)
   and send me the URL of the files or the web pages.

2) Stratified Cox model. For examples, read pages 158-160 of our textbook.
   (If you wonder how one can generate r.v.s that follow the stratified Cox
   model, this is how: Separate all the yi into two groups. We use two
   different monotone transformations, g1() and g2(), one for each group.
   This way we get a Cox model with two strata.)
   Recall that inference in the Cox model only uses the ranks of the
   survival times. In the stratified model, we only use the ranks of the
   survival times WITHIN each stratum.

   Analyze the data set "colon" from the R package survival.
   (Use only those records with etype=2, so there are 929 subjects.)
   Use the variables: rx, age, time, status, sex.
   This is similar to the example in the SAS book on p. 158-159.

   first model:  (time, status) ~ rx+age+sex
   second model: (time, status) ~ rx+age+strata(sex)
   third model:  Use only the male data to fit model (time, status) ~ rx+age
                 Use only the female data to fit model (time, status) ~ rx+age

   In each model, please plot the baseline survival curve(s)
   (two curves in model 2).
   Discuss the outcome of your analysis in the different models.

   Note: rx should be treated as a class variable. Since SAS 9.1, proc
   tphreg can specify a class variable just like proc glm.
   (tphreg stands for testing version of phreg.)
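As a starting point in R, the three model specifications listed above could be set up roughly as follows. This is only a sketch, not a worked solution: the object names and the reference subject (rx = "Obs", age = 60) are hypothetical choices made here. Note that survfit() is given an explicit reference subject, so the curves are for that subject; a true baseline curve corresponds to all covariates at zero or at their reference level.

```r
library(survival)  # provides Surv(), coxph(), survfit() and the colon data

# Keep only the death records (etype = 2): one row per subject.
deaths <- subset(colon, etype == 2)

# Model 1: sex enters as an ordinary covariate.
fit1 <- coxph(Surv(time, status) ~ rx + age + sex, data = deaths)

# Model 2: common rx and age effects, separate baseline hazard for each sex.
fit2 <- coxph(Surv(time, status) ~ rx + age + strata(sex), data = deaths)

# Model 3: separate fits, so the rx and age effects may differ by sex.
# In the colon data, sex is coded 1 = male, 0 = female.
fit3_male   <- coxph(Surv(time, status) ~ rx + age, data = subset(deaths, sex == 1))
fit3_female <- coxph(Surv(time, status) ~ rx + age, data = subset(deaths, sex == 0))

# Hypothetical reference subject (an assumption made for illustration).
ref <- data.frame(rx = factor("Obs", levels = levels(deaths$rx)), age = 60)
curves2 <- survfit(fit2, newdata = ref)   # one curve per sex stratum

# Side-by-side view of the rx and age coefficients.
print(cbind(model2 = coef(fit2),
            male   = coef(fit3_male),
            female = coef(fit3_female)))
```

Calling plot(curves2) draws the two stratum-specific curves for the reference subject. In the colon data rx is already stored as a factor, so R treats it as a class variable without extra work.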
Q and A:

Can I use R to do it?
   Absolutely, although in industry people mostly use SAS.

What is the difference between model two and model three?
   In model two, even though there are two different baseline hazards
   (one for each sex), the rx and age effects are assumed to be the same.
   In model three, they need not be.

==============================================
Collection of project links: