The longer description of each column name:

1. Date. This is obvious; note it is not a numeric value.
2. DateValue. A numeric value, created from the Date.
3. Course. The race course. There are two venues in Hong Kong, and each has several courses (long/short, grass/sand, etc.).
4. HorseNo. The number identifying the horse. This is probably not a real factor, but who knows: Chinese bettors are known for favoring the numbers 8 or 6, which may affect the FinalOdds etc. Maybe the horse's color matters as well?
5. Rank. The outcome of each race; rank = 1 is the fastest horse. Ranks greater than 5 are not very valuable, because there is no money for rank 4 and above.
6. FinalOdds. Published by the racing organization just before the race starts. Reflects public opinion.
7. DrawNo. The starting track number, i.e., whether the horse starts on an inside or outside track.
14. HWinPer. Horse winning percentage: the winning history of that horse, after normalization.

Let us agree we will judge the quality of the models by how well you can predict the winner (rank 1) in the 4 months of data I hold back:

If the horse you predicted wins, you get $1.
If the horse you predicted gets rank 2, you get 30 cents.
If the horse you predicted gets rank 3, you get 10 cents.
You get nothing if the horse you predicted gets rank 4 or above.


Jackknife and EL Confidence Intervals for the Kaplan-Meier Survival Probability: A Simulation

References:

Gaver, D. P. and Miller, R. G. (1983). Jackknifing the Kaplan-Meier survival estimator for censored data: simulation results and asymptotic analysis. Comm. Statist. Theory and Methods 12, 1701-1718.

Thomas, D. R. and Grunkemeier, G. L. (1975). Confidence interval estimation of survival probabilities for censored data. J. Amer. Statist. Assoc. 70, 865-871.

The R session below (the packages survival, km.ci and emplik are needed for survfit, km.ci, el.test and WKM) repeats the simulation 1000 times with sample size N = 50. Each run records the Thomas-Grunkemeier interval endpoints, the jackknife (arcsine-transformed) Wald interval endpoints, and the -2 log empirical likelihood ratio for S(log 2), whose true value is 0.5.

> result6.3 <- matrix(NA, nrow=1000, ncol=5)
> set.seed(123)
>
> for(i in 1:1000) result6.3[i,] <- SIMU6.2()
> sum(result6.3[,3] > 0.5)       # runs where the jackknife lower limit is above 0.5
[1] 34
> sum(result6.3[,4] < 0.5)       # runs where the jackknife upper limit is below 0.5
[1] 20
> sum(result6.3[,5] > 3.84)      # runs where -2LLR exceeds the chi-square(1) 0.95 quantile
[1] 41
> sum(result6.3[,1] > 0.5)       # runs where the Grunkemeier lower limit is above 0.5
[1] 34
> sum(result6.3[,2] < 0.5)       # runs where the Grunkemeier upper limit is below 0.5
[1] 23
> mean(result6.3[,2]-result6.3[,1])    # average length of the Grunkemeier interval
[1] 0.3150324
>
> mean(result6.3[,4]-result6.3[,3])    # average length of the jackknife interval
[1] 0.3246289
> mean(result6.3[,2]-result6.3[,4])    # average difference of the upper endpoints
[1] -0.007297861
> mean(result6.3[,1]-result6.3[,3])    # average difference of the lower endpoints
[1] 0.002298643
>
> mean(result6.3[,5])                  # average -2LLR; the mean of a chi-square(1) is 1
[1] 0.9009697
> SIMU6.2
function(N=50, maxi=1.5, t = -log(0.5)){
    xvec <- rexp(N)                        # lifetimes, Exp(1); true S(log 2) = 0.5
    cvec <- runif(N, min=0, max=maxi)      # censoring times
    yvec <- pmin(xvec, cvec)               # observed times
    dvec <- as.numeric( xvec <= cvec )     # status: 1 = uncensored
    JSvec <- Jpseudo(x=yvec, d=dvec)       # jackknife pseudo-values
    est <- theta6.2(x=yvec, d=dvec)        # asin(sqrt(S-hat(log 2)))
    se <- sum((JSvec - mean(JSvec))^2)/(N-1)
    se <- sqrt(se)
    loo <- est - 1.96*se/sqrt(N)           # Wald interval on the transformed scale
    upp <- est + 1.96*se/sqrt(N)
    loo <- (sin(loo))^2                    # back-transform to the survival scale
    upp <- (sin(upp))^2
    LLR <- el.test(x=JSvec, mu=asin(sqrt(0.5)))$"-2LLR"   # EL test on the pseudo-values
    sfit <- survfit(Surv(yvec, dvec) ~ 1)
    temp <- summary( km.ci(survi=sfit, method="grunkemeier") )
    indx <- sum(temp$time < t)
    lo <- temp$lower[indx]                 # Thomas-Grunkemeier interval at t
    up <- temp$upper[indx]
    return( c(lo, up, loo, upp, LLR) )
}
> Jpseudo
function(x, d){
    N <- length(x)
    Jps <- rep(NA, N)
    for( i in 1:N ) Jps[i] <- theta6.2(x=x[-i], d=d[-i])   # leave-one-out estimates
    return( N*theta6.2(x=x, d=d) - (N-1)*Jps )             # jackknife pseudo-values
}
>
> theta6.2
function(x, d){
    temp <- WKM(x=x, d=d)                  # Kaplan-Meier estimate via emplik::WKM
    indx <- sum(temp$times < log(2))
    surv0.5 <- temp$surv[indx]             # estimated S(log 2)
    return( asin(sqrt(surv0.5)) )          # arcsine-square-root transform
}
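From the counts above, the empirical coverage of the true value S(log 2) = 0.5 at the nominal 95% level works out to 1 - (34+20)/1000 = 0.946 for the jackknife interval, 1 - (34+23)/1000 = 0.943 for the Grunkemeier interval, and 1 - 41/1000 = 0.959 for the EL calibration (the two miss counts for each interval are disjoint, since a lower limit never exceeds the corresponding upper limit). A minimal sketch for computing these rates directly from result6.3 follows; the variable names are just illustrative, and the column order is the one returned by SIMU6.2:

# empirical coverage of S(log 2) = 0.5, nominal level 0.95
covGrunk <- mean(result6.3[,1] <= 0.5 & result6.3[,2] >= 0.5)   # Grunkemeier interval
covJack  <- mean(result6.3[,3] <= 0.5 & result6.3[,4] >= 0.5)   # jackknife interval
covEL    <- mean(result6.3[,5] <= qchisq(0.95, 1))              # qchisq(0.95, 1) = 3.841459
c(Grunkemeier = covGrunk, Jackknife = covJack, EL = covEL)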