Homework 1

(1) Suppose (lambda) is a random variable with a 2-parameter Gamma distribution. Given (lambda), Y is a Poisson random variable with mean (lambda). What is the unconditional distribution of Y? Find the mean and variance of Y.

(2) Suppose (pi) is a random variable with a 2-parameter Beta distribution. Given (pi), Y is a Binomial random variable with mean n(pi), where n is known. What is the unconditional distribution of Y? Find the mean and variance of Y.

(3) In the two-step specifications above, the particular distribution chosen in the first step makes the unconditional distribution easy to calculate. If we select some other distribution for (lambda) in (1) or (pi) in (2), the unconditional distribution may be hard to write down explicitly. However, show that no matter which distribution we pick in the first step, the variance of the unconditional distribution is always larger than that of a pure Poisson or pure Binomial with the same mean (so over-dispersion is common).

Rewrite the following densities/pmfs in exponential family form:
(1) Normal(\mu, \sigma^2)
(2) Extreme value distributions with location and scale parameters.
Then verify the mean and variance formulas derived from the general exponential family.

The above is due no later than March 5. The next problem is due no later than March 9.

The next problem refers to the data set: archive.ics.uci.edu/ml/datasets/Credit+Approval
The last variable in the data set, A16, with values "+" or "-", is our binary response variable.

(a) Fit a logistic regression model with 8 or fewer predictors, using all data cases.

(b) Set the random seed to 12345 (in R, "set.seed(12345)") and select 450 cases from the data set as the training data, using the rest as the testing data set (in R, use the "sample(1:690, ... )" function). Find a "best" predicting model, using AUC as the measure. You may consider interaction terms, but we shall limit interactions to two-variable interactions only.

If you do not like this data, you may also use the data from Kaggle, www.kaggle.com/c/GiveMeSomeCredit, but you need to create an account first, so it may not be suitable for everyone.

Alternative data set: archive.ics.uci.edu/ml/datasets/Hepatitis
The response variable is the one with values DIE/ALIVE.
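
The train/test split for the Credit Approval problem can be sketched in R as follows. This is only an illustration under stated assumptions: the row count of 690 is taken from the "sample(1:690, ... )" hint, the names train_idx, test_idx and auc are made up for this sketch, and the rank-based (Mann-Whitney) AUC helper is one common way to compute AUC in base R, not necessarily the method intended for the course.

```r
# Reproducible split: 450 training cases, the remaining 240 for testing.
set.seed(12345)
n_total <- 690                               # assumed from sample(1:690, ...)
n_train <- 450
train_idx <- sample(1:n_total, n_train)      # sampled without replacement
test_idx  <- setdiff(1:n_total, train_idx)   # everything not in training

# Hypothetical helper: AUC from the Mann-Whitney rank-sum statistic.
# score: numeric predictions; label: logical, TRUE for the positive class.
auc <- function(score, label) {
  r <- rank(score)                           # midranks are used for ties
  n_pos <- sum(label)
  n_neg <- sum(!label)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Usage, assuming the data were read into a data frame called `credit`
# (the name and the chosen predictors are placeholders):
# train <- credit[train_idx, ]
# test  <- credit[test_idx, ]
# fit   <- glm(A16 ~ A2 + A3, family = binomial, data = train)
# p     <- predict(fit, newdata = test, type = "response")
# auc(p, test$A16 == "+")
```

The same train_idx and test_idx should be reused for every candidate model so that their test AUC values are comparable.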