Sta635 Final

Due Dec. 14, 2016 at 12:00 noon in my office. (Earlier submission welcome.)

Write complete statements in English. In addition to the usual answer to each problem, you need to submit the relevant computing code and output too, with key places highlighted. When your discussion or comments refer to the computing results, please point clearly to the place where you read off the relevant information.

Problem

Refer to the data set "burn" from the R package "KMsurv". Our analysis will focus on the comparison of two bathing solutions as they relate to the time to staphylococcus infection for burn patients. The time to infection or censoring is T3; the indicator of infection is D3. The treatment (the type of bathing solution for burn patients) is given by the indicator Z1. There are many other covariates; see the help page for the data set.

1. First, test (using the logrank test) the hypothesis that the two treatments make no difference to the infection times (no adjustment for covariates). Also plot the two Kaplan-Meier curves, one for each treatment. Do the two curves look consistent with proportional hazards? (Here you may want to refer to midterm exam Q1.)

2. Fit a Cox model with all time-fixed covariates (sex, race, percent of burn area, burn site, type of burn), and see whether the bathing solution now makes a difference after adjusting for the covariates.

3. Fit a stratified Cox model with the above time-fixed covariates, stratifying on Z1, and plot the two baselines. Comment on the plot, and discuss whether the proportional hazards assumption on Z1 is appropriate. (Here you may want to recall midterm exam Q1.)

4. Introduce a time-dependent covariate, "antibody was administered", which is based on T2 (time antibody administered); notice that D2 is the indicator of whether an antibody was administered at all. This covariate is time dependent because some patients did not get the antibody at earlier times but did get it at some later time. Is bathing solution still significant?
In this new Cox model, which bathing solution is "better"? (better = longer time until infection)

5. Give residual plots after the model fit in 2. We want to plot martingale residuals vs. infection time, and we also want to plot deviance residuals vs. infection time. [Please note that the function in the R package survival for calculating residuals is residuals( coxfit, type="martingale" ), where coxfit is the output of a coxph( ) fit; you can replace martingale with deviance. For a SAS program to calculate residuals, see http://www.ms.uky.edu/~mai/splus/SASphreg.txt (use either resdev or resmart).]

6. In the model we fit in 2, what if we change "percent of burn area" to logit("percent of burn area")? Does the model stay the same? Is it better or worse (use a reasonable measure) than the original model? Notice that logit(p) = log( p/[1-p] ).
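
As a hint for Problem 6, here is a minimal sketch of the logit transform in base R. It assumes (this is not stated above, so check the help page) that the percent of burn area is recorded on a 0 to 100 scale and has to be rescaled to a proportion first. The helper name logit and the example percentages are made up for illustration and are not taken from the burn data.

```r
# Logit transform as defined in Problem 6: logit(p) = log(p / (1 - p)).
# Hypothetical helper; base R's qlogis() computes the same quantity.
logit <- function(p) log(p / (1 - p))

# Illustrative percentages (made up, NOT from the burn data).
# Assumption: the covariate is on a 0-100 scale, so rescale to (0, 1) first.
pct <- c(5, 20, 50, 80)
p <- pct / 100
logit_p <- logit(p)

# The logit is finite only for 0 < p < 1; a value of exactly 0 or 100 percent
# would give -Inf or Inf and would need separate handling.
print(round(logit_p, 3))
```

The transformed values could then replace the original covariate in the model formula; how to compare the two fits is left to you.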