Sta635 Final  Due Dec. 14, 2016 at 12:00 noon in my office.
(earlier submission welcome)

 Write complete statements in English.
  In addition to the usual answer to each problem,
  you need to submit the relevant computing code and output too,
  with key places highlighted. In your discussion/comments,
  when you refer to computing results, please point clearly
  to the place where you read off the relevant information.


Problem 

 Refer to the data set "burn" from the R package "KMsurv".

Our analysis will focus on the comparison of two bathing solutions
as they relate to the time to staphylococcus infection for burn patients.

The time to infection or censoring is T3, the indicator of infection is D3.

The treatment, i.e. the type of bathing solution used for a burn patient, is
recorded in the indicator Z1.

There are many other covariates. See the help page for the data set. 

1. First, test (using the logrank test)
   the hypothesis that the two treatments make no difference
   to the infection times (no adjustment for covariates).
   Also plot the two Kaplan-Meier curves, one for each treatment. Do the two curves
   look consistent with proportional hazards? (Here you may want to refer to midterm exam Q1.)
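
   For reference, the arithmetic behind the two-sample logrank statistic can be
   sketched as below. This is a minimal illustration in plain Python on made-up
   toy data, not the course software; the function name logrank and its
   arguments are hypothetical. In R the test is available as survdiff( ) in
   the package survival.

```python
import math

def logrank(times, events, groups):
    """Two-sample logrank test (hypothetical helper, toy illustration).

    times  : observed times (infection or censoring)
    events : 1 = event observed, 0 = censored
    groups : 0/1 group labels
    Returns (chi-square statistic with 1 df, p-value).
    """
    o_minus_e = 0.0   # observed minus expected events in group 1
    var = 0.0         # hypergeometric variance, summed over event times
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [g for s, g in zip(times, groups) if s >= t]
        n = len(at_risk)            # total number at risk just before t
        n1 = sum(at_risk)           # number at risk in group 1
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d1 = sum(1 for s, e, g in zip(times, events, groups)
                 if s == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x/2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data (made up): group 1 fails early, group 0 fails late.
chi2, p = logrank([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0])
print(round(chi2, 3), round(p, 3))
```

   The statistic does not depend on which group is labeled 1, so swapping the
   labels gives the same chi-square value.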


2. Fit a Cox model with all time-fixed covariates
(sex, race, percent of burn area, burn site, type of burn),
 and see whether the bathing solution now makes a difference after adjusting for the covariates.


3. Fit a stratified Cox model with the above time-fixed covariates, stratified on Z1,
and plot the two baselines. Comment on the plot and discuss whether the proportional hazards
assumption for Z1 is appropriate. (Here you may want to recall midterm exam Q1.)


4. Introduce a time-dependent covariate, "antibiotic was administered", based on
T2 (the time the antibiotic was administered); note that D2 indicates whether an
antibiotic was administered at all.
This covariate is time dependent because some patients did not receive the antibiotic
at earlier times but did receive it at some later time.
Is the bathing solution still significant in this new Cox model?
Which bathing solution is "better"? (better = longer time until infection)
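
A time-dependent covariate of this kind is usually coded by splitting each
patient's follow-up into (start, stop] intervals, with the covariate switching
from 0 to 1 at T2. The sketch below shows only that bookkeeping, in plain
Python; the helper split_at_switch is hypothetical, and its handling of ties
(T2 equal to T3) and of T2 = 0 is an assumption that you should check against
the data. In R, intervals of this form are passed to coxph( ) as
Surv(start, stop, event).

```python
def split_at_switch(t2, d2, t3, d3):
    """Split one patient's follow-up at the time the covariate switches on.

    t2, d2 mirror T2, D2 (time and indicator of the treatment being given);
    t3, d3 mirror T3, D3 (time and indicator of infection).
    Returns a list of (start, stop, event, covariate) rows.
    """
    if d2 == 1 and t2 <= 0:
        # Assumption: treatment given at time 0 means covariate = 1 throughout.
        return [(0, t3, d3, 1)]
    if d2 == 1 and t2 < t3:
        # Covariate is 0 on (0, t2] and 1 on (t2, t3]; the event, if any,
        # belongs to the last interval.
        return [(0, t2, 0, 0), (t2, t3, d3, 1)]
    # Never treated, or treated only at/after the end of follow-up:
    # a single interval with covariate = 0.
    return [(0, t3, d3, 0)]

# Made-up patients, for illustration only.
print(split_at_switch(5, 1, 12, 1))   # treated at 5, infected at 12
print(split_at_switch(20, 0, 20, 0))  # never treated, censored at 20
```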

5. Give residual plots for the model fit in 2.

   [Please note that the function in the R package survival for calculating residuals is
     residuals(  coxfit, type="martingale" ), where coxfit is the output of a coxph( ) fit;
   you can replace "martingale" with "deviance". For a SAS program that calculates residuals, see
    http://www.ms.uky.edu/~mai/splus/SASphreg.txt (use either resdev or resmart).]


        We want a plot of martingale residuals vs. infection time,
        and also a plot of deviance residuals vs. infection time.
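
        The two residual types are related: under the usual definition, the
        deviance residual is a transformation of the martingale residual M
        and the event indicator delta,
        d = sign(M) * sqrt( -2 [ M + delta * log(delta - M) ] ).
        A minimal sketch in plain Python follows (the function name
        deviance_residual is hypothetical; in practice use the R or SAS
        output directly).

```python
import math

def deviance_residual(m, delta):
    """Deviance residual from a martingale residual m and event indicator delta.

    Requires m < 1 when delta == 1 and m <= 0 when delta == 0, which is
    always true for martingale residuals from a Cox model.
    """
    # The delta * log(delta - m) term vanishes for censored observations.
    log_term = math.log(delta - m) if delta == 1 else 0.0
    inner = -2.0 * (m + log_term)
    # inner is nonnegative in theory; max() guards against rounding error.
    return math.copysign(math.sqrt(max(inner, 0.0)), m)

print(deviance_residual(0.5, 1))   # event observed earlier than expected
print(deviance_residual(-1.0, 0))  # censored observation
```

        The transformation makes the residuals more symmetric about zero,
        which is why the two plots can look quite different.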

6. In the model fit in 2, what if we change "percent of burn area" to logit("percent of burn area")?
   Does the model stay the same? Is it better or worse (use a reasonable measure) than the original model?


Note that logit(p) = log ( p/[1-p] ), where p is a proportion strictly between 0 and 1
(so a percent must first be divided by 100).
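
A minimal sketch of the transform in plain Python, assuming the burn area is
stored as a percent between 0 and 100 and therefore has to be rescaled to a
proportion first (the name logit_percent is hypothetical):

```python
import math

def logit(p):
    """logit(p) = log(p / (1 - p)) for a proportion 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def logit_percent(percent):
    """Hypothetical helper: logit of a value given on the 0-100 percent scale."""
    return logit(percent / 100.0)

print(logit(0.5))          # 0.0, since the odds are 1
print(logit_percent(25))   # log(1/3), negative because p < 0.5
```

The transform is undefined at 0 and 1, so a recorded value of exactly 0 or 100
percent would need separate handling.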