Homework 6. Due Nov. 28, 2016

1. The Recidivism data set from our textbook (which you can download) has 432 cases, collected from 432 male inmates of Maryland prisons. The inmates were released from prison and followed for one year after release. The variables are:

   - "arrest" is 0 if the person stays out of prison (is not rearrested) for all 52 weeks (one year) of follow-up; in that case "week" is 52. "arrest" is 1 if the person is arrested again within 52 weeks of release.
   - "week" is the number of weeks the person stays trouble free until the arrest.
   - "fin" is 1 if the inmate received financial aid after release.
   - "age" is self-explanatory (age at the time of release).
   - "race" is 1 if the person is black and 0 otherwise.
   - "wexp" is 1 if the person had work experience before going to prison.
   - "mar" is 1 if the person was married at the time of release.
   - "paro" is 1 if the person was released on parole.
   - "prio" is the number of prior convictions the inmate had.
   - "educ" is the education level of the person (the higher the number, the more years in school).

   At least one inmate was arrested in week 52. So that the censored cases are not tied with that event time, change every case with week = 52 and arrest = 0 to week = 52.5.

   Note that "educ" is a class (categorical) variable. Several other variables are also categorical, but since they have only two levels, they need no special treatment.

   Whenever you fit a Cox model, use Efron's method for handling ties.

   First, fit Cox proportional hazards models (phreg) with "week" as the survival time and "arrest" as the censoring indicator (arrest = 0 means censored). Use one covariate at a time, so you fit 8 Cox models. If the p-value for the covariate in any of these models is larger than 0.4, drop that covariate from all further analysis.

2. Using the covariates whose single-covariate Cox model gave a p-value smaller than 0.4, fit a multiple Cox model. Find the most significant covariate, and explain in plain English, using its hazard ratio, how a change in this covariate benefits or harms survival time.

3. Using the model obtained in question 2, compute and plot the predicted survival curve for a hypothetical person with (fin=1, age=26, race=1, wexp=0, mar=0, paro=0, prio=1, educ=4). Also find the median of this curve, if it exists. Since the curve only extends to 52 weeks, the survival probability at 52 weeks may be greater than 0.5, in which case the median is not defined. In that case, find the 25th percentile instead.

4. Repeat the above analysis using the Weibull model instead of the Cox proportional hazards model. Comment on the similarities and differences between the two models (Cox PH and Weibull) and between the two sets of data-analysis results.
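The two data-preparation steps in question 1 (moving the week = 52, arrest = 0 cases to week = 52.5, then screening covariates with the 0.4 p-value rule) can be sketched in plain Python as below. This is a minimal sketch, not a required method: it assumes each case is a dict keyed by the variable names above, and the function names are hypothetical. The p-values themselves must come from your own fitted single-covariate Cox models (in SAS, PROC PHREG with TIES=EFRON on the MODEL statement). For the class variable educ, it is an assumption here that you use the overall (joint) test for the variable.

```python
# Hypothetical helpers for the data-preparation steps in question 1.
# Each case is a dict keyed by the variable names used in the assignment.

COVARIATES = ["fin", "age", "race", "wexp", "mar", "paro", "prio", "educ"]


def recode_censored_at_52(rows):
    """Return copies of the rows with week=52, arrest=0 moved to week=52.5."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is left unchanged
        if new_row["week"] == 52 and new_row["arrest"] == 0:
            new_row["week"] = 52.5  # censored just after the week-52 arrests
        recoded.append(new_row)
    return recoded


def screen_covariates(pvalues, cutoff=0.4):
    """Keep covariates whose single-covariate Cox p-value is below the cutoff.

    pvalues maps a covariate name to the p-value from its own Cox model
    (assumed: for the class variable educ, the overall test for the variable).
    """
    return [name for name in COVARIATES if name in pvalues and pvalues[name] < cutoff]
```

The screen keeps the covariates in the fixed order of COVARIATES, so the list can be passed straight to the multiple Cox model in question 2.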
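For questions 3 and 4, reading a median or 25th percentile off a survival curve is mechanical once the curve is available. The sketch below assumes the predicted Cox curve has already been exported as (time, survival) pairs, for example from the BASELINE statement of PROC PHREG. The function names are hypothetical. The percentile rule used here (the first time at which S(t) <= 1 - p) is one common convention; software may interpolate or average when the curve is flat exactly at the target level. For the Weibull model, it assumes the accelerated-failure-time form used by PROC LIFEREG, log T = x'b + sigma W with W standard extreme value. Then the scale is exp(x'b), the shape is 1/sigma, and a coefficient b corresponds to -b/sigma on the log-hazard scale, which is the quantity to compare with the Cox coefficients.

```python
import math


def step_percentile(times, surv, p):
    """p-th percentile of a step survival curve: first time t with S(t) <= 1 - p.

    times are increasing event times and surv[i] is S(t) just after times[i],
    e.g. a predicted Cox curve for one covariate pattern. Returns None when the
    curve never falls that low, e.g. the median when S(52) > 0.5.
    """
    target = 1.0 - p
    for t, s in zip(times, surv):
        if s <= target:
            return t
    return None


def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_percentile(p, scale, shape):
    """Time t_p with S(t_p) = 1 - p, i.e. scale * (-log(1 - p)) ** (1 / shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def aft_to_weibull(linear_predictor, sigma):
    """Map log T = x'b + sigma * W (W standard extreme value) to (scale, shape)."""
    return math.exp(linear_predictor), 1.0 / sigma


def aft_to_ph_coef(beta, sigma):
    """Log-hazard-ratio coefficient -beta / sigma, comparable to a Cox coefficient."""
    return -beta / sigma
```

Unlike the Cox curve, the fitted Weibull curve can be evaluated beyond 52 weeks, so weibull_percentile always returns a value; a median past week 52, however, is an extrapolation beyond the follow-up period.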