Some of the project topics are too small to be a 2-person project.
If you want to work on a specific one please let me know.

=================================
Possible projects for Sta 695:
=================================

Added April 2009.

(1) Adopt the R package kmci and bring it up to the current version of R.
This is a package for obtaining various confidence intervals related to the
Kaplan-Meier estimator. It used to be available on CRAN,
but it needs to be brought up to the new version of R. An older version is still available on the web.

(2) Compare two versions of the Wilcoxon test: the version from the classic nonparametric texts,
and the one obtained by specializing the Gehan-Wilcoxon test to the case of no censoring.
The difference is in the variance estimation. Topics to investigate: which variance
estimator is better (more accurate)? Under H0, or under Ha?
Which gives a better normal approximation to the distribution of the test statistic?
Which one has better power?
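
The two variance estimators can be put side by side in a few lines. The sketch
below is in Python for illustration only, and it assumes no ties and no censoring.
It also assumes that "the Gehan-Wilcoxon version" means the weighted logrank form
with weight n_j (number at risk) and the usual hypergeometric variance; the function
name is hypothetical. On the scale of the Gehan statistic G = n1(N+1) - 2*R1 (R1 is
the rank sum of group 1), the classic variance is n1*n2*(N+1)/3, a constant, while
the hypergeometric variance depends on the data.

```python
def gehan_statistic_and_variances(x, y):
    """Return (G, classic variance, hypergeometric variance).

    x is group 1, y is group 2. Assumes no ties and no censoring.
    """
    n1, n2 = len(x), len(y)
    N = n1 + n2
    pooled = sorted([(v, 1) for v in x] + [(v, 2) for v in y])
    at_risk_1, at_risk_2 = n1, n2
    G = 0.0
    var_hyper = 0.0
    for _, group in pooled:
        n_j = at_risk_1 + at_risk_2
        d1 = 1 if group == 1 else 0
        # Gehan weight n_j times (observed - expected) deaths in group 1.
        G += n_j * (d1 - at_risk_1 / n_j)
        # Hypergeometric term with a single death at this time:
        # n_j^2 * (n1j / n_j) * (n2j / n_j) = n1j * n2j.
        var_hyper += at_risk_1 * at_risk_2
        if group == 1:
            at_risk_1 -= 1
        else:
            at_risk_2 -= 1
    # Classic rank-sum variance n1*n2*(N+1)/12, rescaled by 4 to the scale of G.
    var_classic = n1 * n2 * (N + 1) / 3
    return G, var_classic, var_hyper
```

Wrapping this in a simulation loop under H0 and under Ha is one way to start on
the questions above.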

Added Feb. 2008.

(1) Use empirical likelihood to test the hypothesis about mean residual life time.
[see my notes: empirical likelihood and mean residual time]

Write an R function to test whether the mean/median residual time at a given age is equal to a given
value (in years), based on the code el.cen.EM2(), which will search for the minimum over a value
automatically. Also test the equality of two mean/median residual lifetimes.


Added 2005.

(0) A more efficient algorithm (than grid search)
    to find a confidence region (for dim >= 2) from likelihood ratio values.
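
One possible alternative to a grid, sketched in Python for illustration: start at
the MLE and bisect along rays until -2 log(likelihood ratio) hits the chi-square
cutoff. The example model (two independent exponential samples, parametrized by
log rates) and all function names are assumptions made for this sketch. It relies
on the region being star-shaped around the MLE, which holds here because the
log-likelihood is concave in the log rates.

```python
import math

def log_lik(theta, n1, T1, n2, T2):
    """Log-likelihood of two exponential samples; theta = (log rate 1, log rate 2).

    n1, n2 are event counts and T1, T2 are total follow-up times.
    """
    a, b = theta
    return n1 * a - math.exp(a) * T1 + n2 * b - math.exp(b) * T2

def lr_region_boundary(n1, T1, n2, T2, level=0.95, n_dirs=36):
    """Boundary points of the likelihood ratio confidence region, one per ray."""
    mle = (math.log(n1 / T1), math.log(n2 / T2))
    ll_max = log_lik(mle, n1, T1, n2, T2)
    cutoff = -2.0 * math.log(1.0 - level)  # chi-square quantile with 2 df

    def excess(r, u):
        theta = (mle[0] + r * u[0], mle[1] + r * u[1])
        return 2.0 * (ll_max - log_lik(theta, n1, T1, n2, T2)) - cutoff

    boundary = []
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        u = (math.cos(angle), math.sin(angle))
        lo, hi = 0.0, 1.0
        while excess(hi, u) < 0.0:   # step out until the ray leaves the region
            hi *= 2.0
        for _ in range(60):          # bisect on the radius
            mid = 0.5 * (lo + hi)
            if excess(mid, u) < 0.0:
                lo = mid
            else:
                hi = mid
        r = 0.5 * (lo + hi)
        boundary.append((mle[0] + r * u[0], mle[1] + r * u[1]))
    return boundary
```

Each ray costs a few dozen likelihood evaluations, so 36 rays use far fewer
evaluations than a fine two-dimensional grid.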

(0.5) Sample size determination for the logrank test. The influence of
censoring etc. Survey of software. Demonstration of a free package.
See http://www.biostat.wisc.edu/~kosorok/renyi.html 
Ref: Sample Size Calculations in Clinical Research
by Shein-Chung Chow, Jun Shao and Hansheng Wang 

http://www.childrens-mercy.org/stats/weblog2004/survival.asp

http://www.jhsph.edu/Research/Centers/CCT/javamarc/Shih/shihsizeuserguide.htm
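
As a starting point, here is a minimal Python sketch of the widely used Schoenfeld
approximation for the number of events needed by a two-sided logrank test. It is
not taken from the references above; the function names and the single
"probability of observing an event" input are assumptions of this sketch.
Censoring enters only through that probability: the required number of events is
divided by it to get the number of subjects.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80, alloc=0.5):
    """Approximate number of events needed (Schoenfeld approximation).

    alloc is the proportion of subjects allocated to group 1.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / 2.0) + z(power)
    return z_sum ** 2 / (alloc * (1.0 - alloc) * math.log(hazard_ratio) ** 2)

def total_sample_size(hazard_ratio, prob_event, **kwargs):
    """Subjects needed when each one yields an event with probability prob_event."""
    return math.ceil(schoenfeld_events(hazard_ratio, **kwargs) / prob_event)
```

For example, total_sample_size(0.5, prob_event=0.5) shows how 50% censoring
roughly doubles the number of subjects relative to the number of events.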

(1) A special type of regression model: y_i = a + b x_i + U_i e_i
where U_i = exp( r x_i) , and e_i has an extreme value distribution.

How to parametrically estimate a, b, r  ? (MLE?)
Is there a two-step least squares procedure?
What about censored data?
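
For the MLE route, the log-likelihood is easy to write down. The Python sketch
below assumes that e_i follows the standard (minimum) extreme value distribution
with density exp(e - exp(e)) and survival function exp(-exp(e)), and that censoring
is on the right; both are assumptions, as is the function name. An uncensored
observation contributes the log density of y_i, a censored one the log survival.

```python
import math

def log_likelihood(params, y, x, delta):
    """Log-likelihood for y_i = a + b*x_i + exp(r*x_i) * e_i.

    params = (a, b, r); delta_i = 1 for an observed y_i, 0 for right-censored.
    """
    a, b, r = params
    total = 0.0
    for y_i, x_i, d_i in zip(y, x, delta):
        scale = math.exp(r * x_i)
        z = (y_i - a - b * x_i) / scale
        if d_i == 1:
            # log density: z - exp(z) - log(scale), with log(scale) = r * x_i
            total += z - math.exp(z) - r * x_i
        else:
            # log survival: -exp(z)
            total -= math.exp(z)
    return total
```

This function can be handed to any general-purpose optimizer to maximize over
(a, b, r).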

(2) Bootstrap applications in Survival analysis. There are many....
(distribution of logrank test for small sample size, etc)
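
A minimal Python sketch of one such application: approximating the small-sample
null distribution of the two-sample logrank statistic by resampling. The scheme
used here (resample (time, status) pairs from the pooled sample with replacement,
keep the group labels fixed) is only one of several possible bootstraps and is an
assumption of this sketch, as are the function names.

```python
import random

def logrank_chisq(times, events, groups):
    """Two-sample logrank chi-square statistic; groups are coded 1 and 2."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for s in times if s >= t)
        n1 = sum(1 for s, g in zip(times, groups) if s >= t and g == 1)
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d1 = sum(1 for s, e, g in zip(times, events, groups)
                 if s == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def bootstrap_p_value(times, events, groups, n_boot=2000, seed=0):
    """Share of pooled-bootstrap statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = logrank_chisq(times, events, groups)
    pooled = list(zip(times, events))
    exceed = 0
    for _ in range(n_boot):
        sample = [rng.choice(pooled) for _ in pooled]
        t_b = [t for t, _ in sample]
        e_b = [e for _, e in sample]
        if logrank_chisq(t_b, e_b, groups) >= observed:
            exceed += 1
    return exceed / n_boot
```

Comparing this p-value with the chi-square (1 df) p-value for small samples is
one way to study the accuracy of the asymptotic approximation.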

(3) Piecewise exponential: all the related stuff, carried to the limit....
    including some R coding...
     * Interval censored data MLE, and how does the estimator
       change as the number of pieces grows?
     * How to do a proc lifereg with the error distribution as piecewise
       exponential, with fixed cutting points (not changing with subjects)?
       How do things change as the number of pieces grows? etc.
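
For right-censored data (the simplest case, not the interval censored one asked
about above), the piecewise exponential MLE has a closed form: in each piece, the
hazard estimate is the number of events divided by the total time at risk. A
minimal Python sketch, with a hypothetical function name and the convention that
pieces are (0, c1], (c1, c2], ..., (ck, infinity):

```python
def piecewise_exponential_mle(times, events, cuts):
    """Hazard MLE on each piece for right-censored data.

    cuts: increasing interior cut points, the same for all subjects.
    events: 1 for an observed failure, 0 for right-censored.
    """
    edges = [0.0] + list(cuts) + [float("inf")]
    k = len(edges) - 1
    deaths = [0] * k
    exposure = [0.0] * k
    for t, e in zip(times, events):
        for j in range(k):
            lo, hi = edges[j], edges[j + 1]
            if t > lo:
                exposure[j] += min(t, hi) - lo   # time at risk inside piece j
            if e == 1 and lo < t <= hi:
                deaths[j] += 1
    # A piece with no exposure has no MLE; report NaN there.
    return [d / x if x > 0 else float("nan") for d, x in zip(deaths, exposure)]
```

Calling it with more and more cut points shows how the estimates become unstable
as pieces with few or no events appear.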

(4) Stability/robustness of the Kaplan-Meier/Nelson-Aalen estimator
under observation error/perturbation (values observed with error),
under misclassification of the censoring indicator, etc.
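
A small Python sketch of the misclassification experiment: compute the
Kaplan-Meier curve, flip each censoring indicator independently with some
probability, and recompute. The independent-flip error model and the function
names are assumptions of this sketch.

```python
import random

def kaplan_meier(times, events):
    """Return [(t, S(t))] at the distinct event times."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for s in times if s >= t)
        deaths = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def flip_indicators(events, prob, seed=0):
    """Flip each censoring indicator (0 <-> 1) independently with probability prob."""
    rng = random.Random(seed)
    return [1 - e if rng.random() < prob else e for e in events]
```

Repeating kaplan_meier(times, flip_indicators(events, prob, seed)) over many seeds
gives a picture of how far the curve moves for a given misclassification rate.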

(5) Implement Lin and Wei (1992) paper (of 3 pages) on the Buckley-James
estimator and compare it with existing (EL) methods.

(6) Cox model with a surviving fraction (cure model): model assumptions and 
implementation. Compare with regular Cox model.

=====================================================================
(0). Estimation (Kaplan-Meier, Nelson-Aalen) with late entry/early withdrawal
data. The variance estimator. Confidence intervals--comparison of 
several methods. Take a look at the code for the Cox model in R/SAS.
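
With late entry, the only change to the product-limit estimator is the risk set:
a subject is at risk at time t only if it has already entered and has not yet
left. The Python sketch below assumes entry < exit for every subject and uses the
Greenwood formula for the standard error; the function name is hypothetical.

```python
import math

def km_left_truncated(entry, exit_time, events):
    """Product-limit estimate with late entry.

    Returns rows (t, number at risk, S(t), Greenwood standard error).
    """
    surv, greenwood, rows = 1.0, 0.0, []
    for t in sorted({t for t, e in zip(exit_time, events) if e == 1}):
        # At risk at t: entered before t and still under observation at t.
        at_risk = sum(1 for a, s in zip(entry, exit_time) if a < t <= s)
        deaths = sum(1 for s, e in zip(exit_time, events) if s == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        if at_risk > deaths:
            greenwood += deaths / (at_risk * (at_risk - deaths))
        rows.append((t, at_risk, surv, surv * math.sqrt(greenwood)))
    return rows
```

Note that the risk set can be small at early times when most subjects enter late,
which is one reason the variance estimator deserves a careful look.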

(1). One-sample log-rank and other rank tests. Compare one
sample to the general population (census) data (from the rate tables
in the survival package). Comparison of several methods (accuracy of p-value).

This may also include the covariates race, age and sex in the test 
(adjusting for covariates).

Survival package in R (Splus) and its instructions should be helpful.
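
The basic one-sample log-rank statistic compares the observed number of deaths O
with the number E expected under the reference population, where E is the sum of
the reference cumulative hazards over each subject's follow-up, and
Z = (O - E)/sqrt(E). A minimal Python sketch; the function name is hypothetical,
and the cumulative hazard is passed in as a function of follow-up time only,
whereas a real rate table would also depend on age, sex, and calendar year.

```python
import math
from statistics import NormalDist

def one_sample_logrank(times, events, cum_hazard):
    """Return (observed, expected, z, two-sided p-value).

    cum_hazard(t): reference cumulative hazard over follow-up time t.
    """
    observed = sum(events)
    expected = sum(cum_hazard(t) for t in times)
    z = (observed - expected) / math.sqrt(expected)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return observed, expected, z, p_value
```

For a constant reference hazard of 0.1 per year, cum_hazard would be
lambda t: 0.1 * t.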


(2). How to choose the weight function in the weighted log-rank type
tests to maximize power? (more theoretical) 
Reference: Gill's book (censoring and stochastic integrals)

(3). Similar to (2), but demonstrate that the power of the log-rank test
can suffer under a non-proportional hazards $H_A$, and study possible fixes 
(apply the test only on a certain time interval? ad hoc? 
or more systematically?).
%How to handle late entry/early withdraw data?

(3.5) (added 2006) Combine two tests: a logrank test and a test
for crossing hazards. Evaluate the (power) properties.

Update for 3 and 3.5: since I posted my talk on this topic, this becomes a 
one-person project, and I expect more examples.

(4). Testing the hypothesis of equality of two medians, and a confidence
interval for their difference
(use R function discemlik() or emplik.Hs.test()).
See also (7); this should work closely with (7).
Better or worse than the log-rank test?
(If you want to work on this, I have some more info.)

(5). Residuals in the parametric regression model (lifereg) and the semi-parametric
regression model (phreg). Types of residuals. How do they behave under
correct and wrong models (mis-specification, omission of a covariate...)?  
Simulations. Plots. 
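
To fix ideas on the residual types, here is a minimal Python sketch for the
simplest possible parametric model, an exponential model with no covariates. That
model choice and the function name are assumptions of this sketch; lifereg and
phreg fit much richer models. The Cox-Snell residual is the fitted cumulative
hazard at the observed time, the martingale residual is the event indicator minus
the Cox-Snell residual, and the deviance residual is a symmetrizing transform of
the martingale residual.

```python
import math

def exponential_residuals(times, events):
    """Return (rate, Cox-Snell, martingale, deviance) for an intercept-only
    exponential model. Requires positive times and at least one event."""
    rate = sum(events) / sum(times)              # MLE of the constant hazard
    cox_snell = [rate * t for t in times]
    martingale = [e - r for e, r in zip(events, cox_snell)]
    deviance = []
    for e, r, m in zip(events, cox_snell, martingale):
        inner = -2.0 * (m + (math.log(r) if e == 1 else 0.0))
        deviance.append(math.copysign(math.sqrt(max(inner, 0.0)), m))
    return rate, cox_snell, martingale, deviance
```

Under the fitted model the martingale residuals sum to zero, which is a quick
sanity check before plotting them.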


(6). Frailty Model. An introduction and example. Use the exponential
regression model with a random effects term to explain.

Some useful references: Book by Therneau and Grambsch, Tech Report by
Therneau, Grambsch and Pankratz. 

(7) Confidence interval estimation of the median with censored data via empirical
likelihood, el.cen.EM().  Variations: use a smoothed
indicator function, either a linear smoother or a cubic smoother.
Cubic: G(t) = (3/(4 \sqrt 5)) (t - t^3/15 + 2 \sqrt 5/3) ,   for |t| < \sqrt 5
and zero or one otherwise.
Linear: G(t) = t + 0.5 ,   for  |t| < 0.5
and zero or one otherwise.
Compare with the performance of the plain indicator function.
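
The two smoothers are short enough to write out. In the Python sketch below the
cubic smoother carries the normalizing constant 3/(4 sqrt 5), which is what makes
it run continuously from 0 at t = -sqrt 5 to 1 at t = sqrt 5 (it is the integral
of the Epanechnikov kernel). The bandwidth h and the smoothed_indicator helper are
assumptions of this sketch, not part of el.cen.EM().

```python
import math

SQRT5 = math.sqrt(5.0)

def g_cubic(t):
    """Cubic smoother: 0 below -sqrt(5), 1 above sqrt(5), cubic in between."""
    if t <= -SQRT5:
        return 0.0
    if t >= SQRT5:
        return 1.0
    return 3.0 / (4.0 * SQRT5) * (t - t ** 3 / 15.0 + 2.0 * SQRT5 / 3.0)

def g_linear(t):
    """Linear smoother: 0 below -0.5, 1 above 0.5, linear in between."""
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return t + 0.5

def smoothed_indicator(x, theta, h, smoother=g_cubic):
    """Smooth replacement for the indicator I[x <= theta], with bandwidth h > 0."""
    return smoother((theta - x) / h)
```

As h shrinks to zero, smoothed_indicator approaches the plain indicator, so h
controls the comparison asked for above.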


(8) Efficiency comparison between a Cox model and an exponential/Weibull
model when all models are valid (and under small departures).
Under various model specifications: beta value, censoring percentage etc.
(see my Notes). (I will do this in class, so you cannot do it again :-( ).

(9) Compare the performance of two versions of the logrank test: one from
proc lifetest, the other the score test from proc phreg.
Try them on continuous and discrete distributions. (small project)
Which has better power for small samples? Or more accurate P values?


Data sets online:

http://www.sci.usq.edu.au/staff/dunn/Datasets/tech-anova.html

http://archive.ics.uci.edu/ml/

Also a project topic:   Make a confint2( ) function for the Cox model function coxph()
and the survreg() function, similar to how confint() works for the glm() function.
Currently, the only case in which it returns a Wilks likelihood ratio interval is glm();
in other cases, it returns a Wald interval.
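
To illustrate the difference between the two kinds of interval, here is a minimal
Python sketch for the simplest survival model, an exponential rate with d events
and total follow-up time T. The model choice and the function name are
assumptions of this sketch; a confint2() for coxph() or survreg() would profile
the (partial) likelihood over the other parameters instead. The Wald interval is
symmetric around the MLE, while the Wilks interval is found by solving
2*(loglik(MLE) - loglik(rate)) = chi-square cutoff on each side.

```python
import math
from statistics import NormalDist

def wald_and_lr_intervals(d, total_time, level=0.95):
    """Return (Wald interval, likelihood ratio interval) for an exponential rate."""
    rate = d / total_time
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * rate / math.sqrt(d)           # Wald: standard error is rate/sqrt(d)
    wald = (rate - half, rate + half)

    def log_lik(lam):
        return d * math.log(lam) - lam * total_time

    def excess(lam):
        # z*z is the chi-square (1 df) quantile at the same level.
        return 2.0 * (log_lik(rate) - log_lik(lam)) - z * z

    def root(inside, outside):
        for _ in range(200):                 # bisection between MLE and a point outside
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0.0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    upper = 2.0 * rate
    while excess(upper) < 0.0:
        upper *= 2.0
    lower = rate / 2.0
    while excess(lower) < 0.0:
        lower /= 2.0
    return wald, (root(rate, lower), root(rate, upper))
```

With few events the Wald interval can even extend below zero, while the
likelihood ratio interval always stays inside the parameter space.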


Cross validation of the Buckley-James estimator:  given a data set, Buckley-James
not only gives an estimator of beta, it also gives an estimator of the error distribution
CDF (by Kaplan-Meier).
When we do cross validation, are we just validating beta? Or validating both?
What if the sample sizes of the training sample and the validation sample are very
different?  (I think cross validation should validate both beta and the CDF.)