Some of the project topics are too small to be a 2-person project.
If you want to work on a specific one, please let me know.

=================================
Possible projects for Sta 635:
=================================

Added Feb. 2008.

(1) Use empirical likelihood to test a hypothesis about the mean residual
    life time. [See my notes: empirical likelihood and mean residual time.]
    Write an R function to test whether the mean residual time at a given
    age equals a given value (in years), based on the code el.cen.EM2(),
    which searches for the minimum over a value automatically.
    Prove the lemma that minimizing over a parameter turns a chi-square
    df=2 statistic into a chi-square df=1 statistic.
    Some examples or simulations.

Added 2005.

(0) A more efficient algorithm (than grid search) to find a confidence
    region (for dim >= 2) from likelihood ratio values.

(0.5) Sample size determination for the logrank test. The influence of
    censoring, etc. Survey of software. Demonstration of a free package.
    See http://www.biostat.wisc.edu/~kosorok/renyi.html
    Ref: Sample Size Calculations in Clinical Research by Shein-Chung Chow,
         Jun Shao and Hansheng Wang
         http://www.childrens-mercy.org/stats/weblog2004/survival.asp
         http://www.jhsph.edu/Research/Centers/CCT/javamarc/Shih/shihsizeuserguide.htm

(1) A special type of regression model:
        y_i = a + b x_i + U_i e_i,
    where U_i = exp(r x_i) and e_i has an extreme value distribution.
    How to estimate a, b, r parametrically? (MLE?)
    Is there a two-step least squares procedure? What about censored data?

(2) Bootstrap applications in survival analysis. There are many...
    (distribution of the logrank test for small sample sizes, etc.)

(3) Piecewise exponential: all the related stuff, carried to the limit,
    including some R coding.
    * Interval-censored data MLE; how does the estimator change as the
      number of pieces grows?
    * How to do a proc lifereg with the error distribution as piecewise
      exponential, with fixed cutting points (not changing with subjects)?
      How do things change as the number of pieces grows? etc.
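As a starting point for the piecewise exponential topic, here is a minimal sketch of the simplest case only: right-censored data (not interval-censored) with fixed cut points shared by all subjects. In that case the MLE of the hazard on each piece is the number of events on the piece divided by the total time at risk on the piece. The function name piecewise_exp_mle, its arguments, and the toy data are hypothetical and written in Python for illustration; the project itself would be coded in R.

```python
from bisect import bisect_left


def piecewise_exp_mle(times, events, cuts):
    """Hazard MLEs for a piecewise exponential model, right-censored data.

    times  : observed follow-up times (event or censoring), assumed > 0
    events : 1 if the time is an observed event, 0 if right-censored
    cuts   : increasing interior cut points, the same for every subject
    Pieces are (0, c1], (c1, c2], ..., (ck, inf).
    Returns (hazards, deaths, exposure), one entry per piece.
    """
    edges = [0.0] + list(cuts)
    k = len(edges)
    deaths = [0] * k
    exposure = [0.0] * k
    for t, d in zip(times, events):
        # Time at risk contributed by this subject to each piece.
        for j in range(k):
            lo = edges[j]
            hi = edges[j + 1] if j + 1 < k else float("inf")
            if t > lo:
                exposure[j] += min(t, hi) - lo
        if d:
            # Piece (lo, hi] that contains the event time.
            deaths[max(bisect_left(edges, t) - 1, 0)] += 1
    # MLE on each piece: events / time at risk (undefined with no exposure).
    hazards = [dj / ej if ej > 0 else float("nan")
               for dj, ej in zip(deaths, exposure)]
    return hazards, deaths, exposure


# Toy data (made up): four subjects, the second one is censored.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 0, 1, 1]
one_piece = piecewise_exp_mle(toy_times, toy_events, [])
two_pieces = piecewise_exp_mle(toy_times, toy_events, [2.0])
```

With no cut points this reduces to the ordinary exponential MLE (3 events over 10 units of follow-up in the toy data). Re-running with more and more cut points is one way to watch how the estimate changes as the number of pieces grows.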

(4) Stability/robustness of the Kaplan-Meier/Nelson-Aalen estimator
    under observation error/perturbation (values observed with error),
    under censoring indicator error, etc.

(5) Implement the Lin and Wei (1992) paper (3 pages) on the Buckley-James
    estimator and compare it with existing (EL) methods.

(6) Cox model with a surviving fraction: model assumptions and
    implementation. Compare with the regular Cox model.

=====================================================================

(0). Estimation (Kaplan-Meier, Nelson-Aalen) with late entry/early
    withdrawal data. The variance estimator.
    Confidence intervals: comparison of several methods.

(1). One-sample log-rank and other rank tests. Compare one sample to the
    general population (census) data (from the survival package ratetables).
    Comparison of several methods (accuracy of p-value).
    May also include the covariates race, age and sex in the test
    (adjust for covariates).
    The survival package in R (Splus) and its instructions should be helpful.

(2). How to choose the weight function in the weighted log-rank type tests
    to maximize power? (more theoretical)
    Reference: Gill's book (Censoring and Stochastic Integrals)

(3). Similar to (2), but demonstrate that the power of the log-rank test
    can suffer for non-proportional hazards $H_A$, and the possible fix
    (apply the test only on a certain time interval? ad hoc? or more
    systematically?).
    %How to handle late entry/early withdrawal data?

(3.5) (added 2006) Combine two tests: a logrank test and a test for
    crossing hazards. Evaluate the (power) properties.

(4). Testing the hypothesis of equality of two medians, and a confidence
    interval for their difference
    (use R function discemlik() or emplik.Hs.test()).
    See also (7); should work in close tie with (7).
    Better or worse than the log-rank test?
    (If you want to work on this, I have some more info.)

(5). Residuals in the parametric regression (lifereg) and semi-parametric
    regression model (phreg). Types of residuals.
    How do they behave under correct and wrong models (mis-specification, ...)?
    Simulation. Plots.

(6). Frailty model. An introduction and example. Use the exponential
    regression model with a random effects term to explain.
    Some useful references: book by Therneau and Grambsch; Tech Report by
    Therneau, Grambsch and Pankratz.

(7) Confidence interval estimation of the median with censored data via
    empirical likelihood, el.cen.EM().
    Variations: use a smoothed indicator function, either a linear
    smoother or a cubic smoother.
    Cubic:  G(t) = (3/(4 \sqrt 5)) (t - t^3/15 + 2 \sqrt 5/3),
            for |t| < \sqrt 5, and zero or one otherwise.
    Linear: G(t) = t + 0.5,
            for |t| < 0.5, and zero or one otherwise.
    Compare to the performance with the plain indicator function.

(8) Efficiency comparison between a Cox model and an exponential/Weibull
    model when all models are valid (and under small departures).
    Under various model specifications: beta value, censoring percentage,
    etc. (see my Notes).
    (I will do this in class, so you cannot do it again :-( ).

(9) Compare the performance of two versions of the logrank test: one from
    proc lifetest, the other the score test from proc phreg.
    Try them on continuous and discrete distributions. (small project)
    Which has better power for small samples? Or more accurate p-values?

(10) Implement a version of the log-rank test for a non-parametric sample
    versus a Weibull sample. Use simulations to evaluate several
    variations of the test.
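
For the smoothed indicator variations in item (7), the two smoothers can be sketched as below. This is only an illustration under stated assumptions. The cubic form is scaled by 3/(4 sqrt 5) so that it rises continuously from 0 at t = -sqrt 5 to 1 at t = sqrt 5, which is what "zero or one otherwise" requires. The bandwidth h and the function names are hypothetical and do not come from el.cen.EM() or any other emplik code. Python is used for illustration; the project itself would be coded in R.

```python
import math

SQRT5 = math.sqrt(5.0)


def g_linear(t):
    """Linear smoother: t + 0.5 on |t| < 0.5, zero or one otherwise."""
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return t + 0.5


def g_cubic(t):
    """Cubic smoother on |t| < sqrt(5), zero or one otherwise.

    Assumption: the factor 3/(4*sqrt(5)) is included so that the curve
    goes from 0 to 1 over the interval.
    """
    if t <= -SQRT5:
        return 0.0
    if t >= SQRT5:
        return 1.0
    return 3.0 / (4.0 * SQRT5) * (t - t ** 3 / 15.0 + 2.0 * SQRT5 / 3.0)


def smoothed_indicator(x, theta, h, g=g_cubic):
    """Smooth stand-in for the plain indicator I[x <= theta].

    h > 0 is a hypothetical bandwidth; as h shrinks, the value approaches
    the plain indicator for every x different from theta.
    """
    return g((theta - x) / h)


# Plain versus smoothed indicator for a candidate median theta = 2.0.
xs = [1.0, 1.9, 2.0, 2.1, 3.0]
plain = [1.0 if x <= 2.0 else 0.0 for x in xs]
smooth = [smoothed_indicator(x, 2.0, 0.5) for x in xs]
```

In the empirical likelihood constraint for the median, the smoothed version replaces I[x <= theta] - 0.5 by G((theta - x)/h) - 0.5. How to choose h is part of what the comparison would have to explore.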