Some notes on the Excel data file "Zhou Mai"

1. We are interested in modeling Rank (as Y, the response) as a function of the other covariates. Use only Ranks 1, 2, 3 and 4; any Rank of 5 or above should be treated simply as 5+ (i.e. right censored).
2. Do not use WtCarried.
3. "FinalOdds" should always be in the model.
4. We are not sure exactly how to handle "Course". There are two approaches. First, one may build a separate model for each Course (7 models), although Turf "A" and Turf "A+2" may be similar and could be combined. The disadvantage is that there will be fewer data for each model. Second, we may use "Course" as a class variable, so that the other covariates enter the model in the same way for all 7 race courses.
5. Interactions of order two will be considered, but nothing of higher order.
6. Use Date and Raceno as "strata" when fitting the Cox model (Bradley-Terry model), i.e. the data with the same Date and the same Raceno form one stratum.
7. Use the data from the beginning (2000-12-13) to 2002-06-16 as the training data to fit the models. Use the rest, from 2002-09-04 to the end, as the test data.
8. As a specific case, use the model obtained from the training data to predict the probabilities of the outcomes "WIN" and "SHOW" for the day 2002-10-01. List the predicted outcome and the actual outcome. Also find the probability of the predicted outcome and the probability of the actual outcome.
9. Fix a betting strategy: specify when you will bet and how much. Base your decisions on the predicted probabilities from your models and on the FinalOdds. Since we only have the FinalOdds for "WIN", your choices are a) which horse(s), if any, to bet on to "WIN", and b) how much to bet. Stick to your betting strategy throughout the test data and see the result. Begin with $10,000 and see how much you end up with.
(If you win, you can calculate the payoff from the FinalOdds: if you bet $3 on horse j to win and this horse actually wins, then your payoff is 3F+3, where F is the FinalOdds.)

================================================================

The profitability (expected value of a bet) should be calculated as follows. Suppose the model predicts P(win) = p for a particular horse and the FinalOdds is F. Then

Expected profit for a $3 bet = p*(3*F) - (1-p)*3

(a win returns 3F+3 including the $3 stake, so the net gain is 3F; a loss costs the $3 stake).

Note that odds are often expressed in a form such as 3-2. What is recorded in the data set has been normalized to X-1, so 3-2 becomes 1.5-1 and appears in the data simply as 1.5, etc.

Some suggestions:

(1) Split the data into training data and test data as per item 7 above.
(2) Add a column to the test data set containing the predicted probability of "WIN" for each horse, based on the training data. Note that these probabilities should sum to one within each stratum (Date and Raceno), since some horse has to WIN each race.
(3) Add another column containing the bet, in dollars, that your strategy implies. You also need to describe in English what your betting strategy (the rule) is. (Your strategy cannot depend on Rank.)
(4) Add yet another column containing the payoff for your bet, determined by Rank. (If the horse has Rank=1, you receive (F+1)*your bet, a net gain of F*your bet; otherwise your payoff is -your bet.)
(5) Plot the cumulative sum of your payoff against time, from 2002-09-04 to the end.
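
As a rough illustration of suggestions (2) to (5), here is a minimal Python sketch. It is not a prescribed implementation. The column names Date, Raceno, FinalOdds and Rank come from the notes above; everything else is an assumption made for the example: the "score" field (standing in for whatever positive score a fitted model produces per horse, e.g. exp of the linear predictor), the function names, the flat $3 stake, the "bet when the expected profit is positive" rule, and the three toy rows, which are made up and are not taken from the data file. The payoff is recorded net of the stake (F*bet on a win, -bet on a loss), so that the running total is the bankroll.

```python
from collections import defaultdict
from itertools import accumulate


def expected_profit(p, final_odds, stake):
    """Expected net profit of a WIN bet: gain stake*F with prob. p, lose stake otherwise."""
    return p * stake * final_odds - (1.0 - p) * stake


def normalize_within_race(rows, score_key="score"):
    """Rescale positive scores so that p_win sums to one within each (Date, Raceno)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["Date"], r["Raceno"])] += r[score_key]
    for r in rows:
        r["p_win"] = r[score_key] / totals[(r["Date"], r["Raceno"])]
    return rows


def flat_bet(row, stake=3.0, edge=0.0):
    """Hypothetical rule: bet a fixed stake when the expected profit exceeds `edge`.
    Uses only p_win and FinalOdds, never Rank."""
    ep = expected_profit(row["p_win"], row["FinalOdds"], stake)
    return stake if ep > edge else 0.0


def net_payoff(row):
    """Net result of a bet: +F*bet if the horse won (Rank 1), -bet otherwise."""
    if row["bet"] == 0:
        return 0.0
    return row["FinalOdds"] * row["bet"] if row["Rank"] == 1 else -row["bet"]


def run_strategy(rows, bankroll=10_000.0):
    """Add p_win, bet and payoff to each row; return the rows and the running bankroll."""
    rows = sorted(normalize_within_race(rows), key=lambda r: (r["Date"], r["Raceno"]))
    for r in rows:
        r["bet"] = flat_bet(r)
        r["payoff"] = net_payoff(r)
    balance = list(accumulate((r["payoff"] for r in rows), initial=bankroll))
    return rows, balance


# Made-up toy race (NOT from the data file), only to exercise the functions.
toy = [
    {"Date": "2002-09-04", "Raceno": 1, "FinalOdds": 1.5, "Rank": 2, "score": 2.0},
    {"Date": "2002-09-04", "Raceno": 1, "FinalOdds": 4.0, "Rank": 1, "score": 1.5},
    {"Date": "2002-09-04", "Raceno": 1, "FinalOdds": 5.0, "Rank": 3, "score": 0.5},
]
toy_rows, toy_balance = run_strategy(toy)
```

On the toy race the rule bets $3 on the first two horses and skips the third, so toy_balance runs 10000, 9997, 10009, 10009. For suggestion (5), the balance list (or the cumulative sum of the payoff column) can be plotted against Date with any plotting tool. Because Rank is only used in net_payoff, capping Ranks at 5+ does not affect this calculation.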