Learn Counting Process for Survival Analysis in 25 Minutes!

 

By Mai Zhou


First, some clarification: we do not learn Survival Analysis here, we only learn the counting processes used in survival analysis (while avoiding many technicalities). We do not talk about the central limit theorems related to counting processes. And we assume familiarity with the Poisson process.

The best books covering these topics rigorously, plus many applications, are Counting Processes and Survival Analysis by Fleming and Harrington (1991) and Statistical Models Based on Counting Processes by Andersen, Borgan, Gill and Keiding (1993).
But both books contain more material than can be covered in one semester. The notation of the second book is complicated. The material in both books can be intimidating for those who do not have a strong math background or do not have a lot of time.

We give you some basic understanding of the counting process here. More importantly, we let you play! (play the applet) and build intuition. This is not intended as a replacement for the rigorous mathematical treatment of the subject.

Assumption: you know some basic probability theory (random variables, common distributions like the exponential, their transformations, etc.). You are familiar with the Poisson process and its properties. If you know the nonhomogeneous (i.e. nonstationary) Poisson process, so much the better. If you know the compound Poisson process, that's even better (but not required).

Ready? Let's begin!

Minutes 1-5: Review of the Poisson process and its properties. Notation: we will denote a Poisson process by P(t), reserving the notation N(t) for a general counting process.

Poisson process P(t).  [P(0) = 0]   For any fixed time t, P(t) is a Poisson(lambda t) random variable. (It represents the number of hits accumulated from 0 to t.)

For a fixed omega, as t varies, P(t, omega), i.e. the sample path, is increasing and piecewise constant, with jumps of size one. The waiting times between consecutive jumps are iid exponential(lambda) random variables.

Independent increments. (The waiting times are independent random variables.) See the applet: Poisson process applet.

Stationary. The waiting time is always exponential(lambda).

lambda is called the intensity; lambda t = \int_0^t lambda ds is called the cumulative intensity. Constant intensity is a defining characteristic of a Poisson process.

M(t) = P(t) - lambda t  is a continuous-time martingale. M^2(t) - lambda t   is also a martingale.
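
If you want to play outside the applet, here is a minimal simulation sketch (Python with numpy; the function name poisson_path and all parameter values are my own choices, not from the text). It builds sample paths from iid exponential(lambda) waiting times and checks by Monte Carlo that M(t) = P(t) - lambda t has mean zero, as the martingale property requires.

    import numpy as np

    rng = np.random.default_rng(seed=1)

    def poisson_path(lam, t_end, rng):
        # Jump times of a Poisson(lam) process on [0, t_end]:
        # waiting times between jumps are iid exponential(lam).
        times = []
        t = rng.exponential(1.0 / lam)
        while t <= t_end:
            times.append(t)
            t += rng.exponential(1.0 / lam)
        return np.array(times)

    lam, t_end, n_rep = 2.0, 10.0, 20000
    # P(t_end) is the number of jumps in [0, t_end].  If M(t) = P(t) - lam*t
    # is a martingale started at 0, its mean stays 0 for all t.
    counts = np.array([poisson_path(lam, t_end, rng).size for _ in range(n_rep)])
    print("mean of M(t_end):", counts.mean() - lam * t_end)   # approx 0
    print("var of P(t_end):", counts.var(), " (theory: lam*t =", lam * t_end, ")")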

Intuition: think of P(t) as the number of raindrops hitting your head as a function of time. (Assume the storm has constant intensity.) You may also think of P(t) as the number of goals as a function of time t in a soccer game (for 0 <= t <= 90 min).

Minutes 6-10: Our first generalization of the Poisson process is to allow a time change (acceleration/deceleration of the clock). time-change Poisson

This will make the waiting time between two consecutive jumps no longer exponential (unless the transformation is c*t). We get N(t) = P( g(t) ), where g(t) is an increasing function representing the cumulative flow of time. (For example, if g(t) = 2t then we are using a clock running twice as fast, and the resulting P( g(t) ) is still a Poisson process, but with intensity 2*lambda, etc.) Think of this as the fast-forward/slow-motion/pause button on your VCR. I called it a crazy clock in my paper about the Cox model.

The derivative g'(t) is the rate/speed of the clock at time t. See (and play) the Applet. You are allowed to change the rate g'(t) = intensity at time t. This is similar to a nonhomogeneous Poisson process, except we let you change g'(t) as you go, not necessarily according to a pre-determined pattern; i.e. g'(t) can depend on the history at time t. E.g. g'(t) = 1/k, where k is the number of hits so far (this is predictable).

Martingale: We still have (assuming P(t) is a standard Poisson process) that P( g(t) ) - g(t) is a martingale, provided g(t) does not depend on future information at time t.
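
If you cannot run the applet, here is a rough discrete-time sketch of the same idea (Python with numpy; the setup is mine, and I use the rate g'(t) = 1/(1 + number of hits so far) instead of 1/k to avoid dividing by zero before the first hit). Each small step of length dt produces a jump with probability approximately g'(t)*dt, where g'(t) is computed from the past only, and the Monte Carlo average of N(t) - g(t) should be near zero.

    import numpy as np

    rng = np.random.default_rng(seed=2)

    def run_path(t_end, dt, rng):
        # One path of a counting process whose clock rate at time t is
        # g'(t) = 1/(1 + hits so far): predictable, it looks only at the past.
        # Returns N(t_end) and g(t_end) = int_0^t g'(s) ds.
        n, g = 0, 0.0
        for _ in range(int(t_end / dt)):
            rate = 1.0 / (1.0 + n)        # decided before this step's coin flip
            g += rate * dt
            if rng.random() < rate * dt:  # P(jump in [t, t+dt)) ~ rate * dt
                n += 1
        return n, g

    n_rep, t_end, dt = 2000, 5.0, 0.01
    diffs = []
    for _ in range(n_rep):
        n, g = run_path(t_end, dt, rng)
        diffs.append(n - g)
    print("mean of N(t) - g(t):", np.mean(diffs))   # approx 0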

Minutes 11-15: Integration. This will make the size of the jumps no longer always equal to one, but equal to f(t_i) for the i-th jump (where t_i is the time of the i-th jump); i.e. we get to change the jump sizes. Stochastic integration

Mathematically this is

                  N(t) = \int_0^t f(s) d P(s)

Notice the Poisson process itself can be thought of as the special case with no time change and jump size always equal to one:

                  P(t) = \int_0^t 1 d P(s)

See (and play) the Applet. You can change the value of f(t). (It could even be negative, and could depend on history.)

Example: same as the Poisson process, except the jump size grows with time: a jump at time t has size t.

Then N(t) = \int_0^t s d P(s)

Example: we want a Poisson process but with jump sizes successively smaller, equal to 1/(1+k) for the (k+1)-th jump. Then N(t) = \int_0^t 1/(1+P(s-)) d P(s)

Example: we want to count when a positive random variable X occurs, with a jump size equal to the time of the jump [if it occurs later, its jump size will be larger].
Then N(t) = \int_0^t s I[X >= s] d I[X <= s],
where the indicator I[X >= s] is needed since after X occurs (once), it cannot occur again. This indicator stops the integration. This is similar (but not exactly the same) to the compound Poisson process. In a compound Poisson process, the jump sizes are determined by Y_i, a sequence of independent random variables. (Can you write an integral similar to the above to represent a compound Poisson process?)
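
The two jump-size examples above are easy to compute once you notice that the integral is just a sum over jump times. A minimal sketch (Python with numpy; jump_times and the parameter values are my own, not from the text):

    import numpy as np

    rng = np.random.default_rng(seed=3)

    def jump_times(lam, t_end, rng):
        # Jump times of a Poisson(lam) process on [0, t_end].
        ts, t = [], rng.exponential(1.0 / lam)
        while t <= t_end:
            ts.append(t)
            t += rng.exponential(1.0 / lam)
        return ts

    lam, t_end = 1.0, 10.0
    ts = jump_times(lam, t_end, rng)

    # N(t) = int_0^t s dP(s): the jump at time t_k has size t_k,
    # so the integral is the sum of the jump times up to t.
    N1 = sum(ts)

    # N(t) = int_0^t 1/(1 + P(s-)) dP(s): just before the (k+1)-th jump
    # P(s-) = k, so that jump has size 1/(1+k).
    N2 = sum(1.0 / (1.0 + k) for k in range(len(ts)))

    print("jumps:", len(ts))
    print("int s dP(s)          =", N1)
    print("int 1/(1+P(s-)) dP(s) =", N2)   # = 1 + 1/2 + ... + 1/len(ts)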

Minutes 16-20: Allow both of the above changes (generalizations), at time t, to depend on the history (at time t) and on other outside information, but not on the future of N(t). (This allows the modeling of censoring and truncation of the data. For example, if a potential death gets censored, it is as if we stop the clock there.) counting process

Well, you already used history if you played the two applets above, since you could change the values of g'(t) and f(t) with full knowledge of what had already happened to N( ), g'( ), and f( ) up to time t.

Not allowing the changes to depend on the future (at any moment) still makes it a fair game -- a martingale, once we subtract the cumulative intensity. If this is violated, strange things can happen. It is easy to see that if a person always stops the clock one second before the first jump, then all sorts of equalities break.

Minutes 21-25: Cumulative jumps (up to time t) minus the cumulative intensity (up to time t) is a martingale.

Just like the Poisson process minus lambda t, M(t) = P(t) - lambda t:

N(t) - A(t) = M(t) is a martingale! where

                  N(t) = \int_0^t f(s) d P(g(s))   and      A(t) = \int_0^t f(s) d g(s)
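
Here is a discrete-time Monte Carlo sketch of this general martingale (Python with numpy; the particular f and g' below are my own illustrative choices, both computed from the past only). It checks that the average of N(t) - A(t) stays near zero.

    import numpy as np

    rng = np.random.default_rng(seed=4)

    def run_path(t_end, dt, rng):
        # One path with history-dependent clock rate and jump size:
        #   g'(t) = 1/(1 + #jumps so far)   (clock slows after each jump)
        #   f(t)  = 1 + #jumps so far       (jumps get bigger)
        # Returns N(t_end) = sum of jump sizes f at the jump times, and
        # A(t_end) = int_0^t f(s) dg(s) = int_0^t f(s) g'(s) ds.
        k, N, A = 0, 0.0, 0.0
        for _ in range(int(t_end / dt)):
            rate = 1.0 / (1.0 + k)
            size = 1.0 + k
            A += size * rate * dt
            if rng.random() < rate * dt:
                N += size
                k += 1
        return N, A

    n_rep, t_end, dt = 2000, 5.0, 0.01
    diffs = []
    for _ in range(n_rep):
        N, A = run_path(t_end, dt, rng)
        diffs.append(N - A)
    print("mean of N(t) - A(t):", np.mean(diffs))   # approx 0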

Minutes 26-30: Martingale representation of the Kaplan-Meier and Nelson-Aalen estimators. Oops, this is beyond the 25 minutes promised, so see my longer notes for that.

References:

In addition to the two books mentioned above, Chapter 5 of Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data, 2nd edition (2002), is also good.
 
Question: how do we tune the clock speed so that the waiting time for the (first and only) jump is distributed the same as X, a given positive random variable? In other words, we want a (one jump) counting process I[ X <= t ] whose waiting time for the first (and only) jump has distribution F_X.

A: if we tune the clock rate/speed according to h(t) [the hazard function of F_X], then the waiting time distribution is F_X. Indeed, for a standard Poisson process run on the clock g(t) = H(t), the cumulative hazard, the probability of no jump by time t is exp(-H(t)) = 1 - F_X(t).

Conclusion: we may view the (one jump) counting process I[ X <= t ] as P( g(t) ) with g(t) = H(t) but stopped at the jump, i.e. with g(t) = \int_0^t I[X >= s] dH(s), so that

                  I[ X <= t ] - \int_0^t I[X >= s] dH(s) = P( g(t) ) - g(t)

is a martingale.
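
A quick numerical check of this tuning recipe (Python with numpy; the Weibull choice for F_X and all names are mine, for illustration only). If E is the exponential(1) waiting time of a unit-rate Poisson clock, the first jump of the time-changed clock occurs at T = H^{-1}(E), which should have distribution F_X.

    import numpy as np

    rng = np.random.default_rng(seed=5)

    # Take X ~ Weibull(shape a, scale b), whose cumulative hazard is
    # H(t) = (t/b)**a, so H^{-1}(e) = b * e**(1/a).
    a, b, n = 1.7, 2.0, 200000

    E = rng.exponential(1.0, size=n)   # waiting time of the unit-rate clock
    T = b * E ** (1.0 / a)             # invert the time change g(t) = H(t)

    X = b * rng.weibull(a, size=n)     # direct draws from F_X, for comparison
    for q in (0.25, 0.5, 0.9):
        print(q, np.quantile(T, q), np.quantile(X, q))   # quantiles should agree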
