The empirical distribution is a very important estimator in statistics. Many statistical procedures depend on its performance. In particular, the popular (nonparametric) bootstrap method relies heavily on the empirical distribution.

To put it simply, the empirical distribution is a staircase function whose drops occur at random locations. Without censoring, the size of each drop is always 1/n, where n is the number of random observations the empirical distribution is based on; with censoring, the size of the drops also changes. As n grows, the empirical distribution gets closer to the true distribution (plotted in black, the one we used to generate the random observations). This also offers a visual check of how good Java's random number generator is.
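
For readers who like formulas, the staircase can be written down explicitly. This is only a sketch: the symbols below are my own notation, and I am assuming that the censored-data version meant here is the Kaplan-Meier estimator, which the text does not name. Without censoring, the empirical survival function based on observations X_1, ..., X_n is

$$
\hat S_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i > t\},
$$

so each observation contributes one drop of size 1/n. With censoring, under the Kaplan-Meier assumption,

$$
\hat S_{KM}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{r_i}\right),
$$

where the t_i are the distinct observed failure times, d_i is the number of failures at t_i, and r_i is the number of subjects still at risk just before t_i. The drop at t_i is then \hat S_{KM}(t_i-) d_i / r_i, which depends on how many observations were censored earlier. With no censoring and no ties, r_i = n - i + 1 and the drop reduces to 1/n.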

This Java applet plots empirical distributions computed from a sample of iid exponential random variables, starting with sample size n=2 and growing to as large as 10,000.

The true distribution, exponential, is plotted in black. Well, actually the survival function, 1-F(t), is plotted. So the black curve is just S(t) = exp(-t).

The empirical distribution is shown in **red**.
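
The comparison between the red staircase and the black curve can also be done numerically. The following is a minimal sketch, not the applet's actual code: the class and method names (`EmpiricalSurvivalDemo`, `sampleExponential`, `empiricalSurvival`, `maxGap`) and the seed are made up for illustration, and the exponential values are assumed to be drawn by inverse transform from `java.util.Random`. It computes the uncensored empirical survival function and its largest vertical distance from exp(-t) for several sample sizes.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: empirical survival function of an iid Exp(1) sample, no censoring. */
final class EmpiricalSurvivalDemo {

    /** Draws n iid Exp(1) values by inverse transform: -log(1 - U), U uniform on [0, 1). */
    static double[] sampleExponential(int n, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = -Math.log(1.0 - rng.nextDouble());
        }
        return x;
    }

    /** Empirical survival at t: fraction of observations strictly greater than t. */
    static double empiricalSurvival(double[] sorted, double t) {
        // Binary search for the number of observations <= t.
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= t) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (sorted.length - lo) / (double) sorted.length;
    }

    /** Largest vertical gap between the staircase and the true curve exp(-t). */
    static double maxGap(double[] sorted) {
        int n = sorted.length;
        double gap = 0.0;
        for (int i = 0; i < n; i++) {
            double truth = Math.exp(-sorted[i]);
            // The staircase is (n - i)/n just before this jump and (n - i - 1)/n just after.
            gap = Math.max(gap, Math.abs((n - i) / (double) n - truth));
            gap = Math.max(gap, Math.abs((n - i - 1) / (double) n - truth));
        }
        return gap;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 100, 1000, 10000}) {
            double[] x = sampleExponential(n, 1998L);
            Arrays.sort(x); // both helper methods expect a sorted sample
            System.out.printf("n=%5d  S_n(1)=%.4f  exp(-1)=%.4f  max gap=%.4f%n",
                    n, empiricalSurvival(x, 1.0), Math.exp(-1.0), maxGap(x));
        }
    }
}
```

Because the true curve is continuous and decreasing while the staircase is flat between jumps, the largest gap can only occur at a jump, which is why `maxGap` checks just the two sides of each observation.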

**Notes:**

- The y axis is drawn at 0; the curve starts at 1 and drops down to zero. I am too lazy to put units on the axes. Every time n increases, the computation pauses for half a second. (For those who want speed instead of detail, I may put up a button for speed adjustment in the future.) Also, if n exceeds 10,000 the computation will be messed up, since I have limited the storage to 10,000. But after n exceeds, say, 2000, the red curve does not change much and almost coincides with the black curve, so there is not much point in going on.
- Now a more theoretical note. Why didn't I try to plot other distributions? Well, my answer is: no need to. Any other distribution, when approximated by empiricals, can be thought of as a time change of the exponential one that we saw. That's right: the picture for a gamma distribution (or any other distribution) approximated by empiricals can be obtained from the exponential picture above by stretching or squeezing the horizontal axis, i.e. instead of a linear time t, use C(t) on the horizontal axis (with C(t) an increasing function). So it is easy to imagine how this approximation dynamic behaves under other distributions.
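
The time-change remark can be checked on a small example. The sketch below is an illustration under my own assumptions, not part of the applet: it picks a Weibull distribution with shape 2, whose survival function is exp(-t^2), so that the time change is C(t) = t^2, and all names (`TimeChangeDemo`, `timeChange`, `toWeibull`, and so on) are hypothetical. In general one can take C(t) = -log(1 - F(t)), so that the true survival function is exp(-C(t)). The code builds the Weibull sample from an exponential sample and confirms that the Weibull staircase at t has the same height as the exponential staircase at C(t).

```java
import java.util.Random;

/** Sketch: a Weibull(shape 2) sample viewed as a time change of an Exp(1) sample. */
final class TimeChangeDemo {

    /** Assumed time change C(t) = t^2, the cumulative hazard of a Weibull with shape 2. */
    static double timeChange(double t) {
        return t * t;
    }

    /** Draws n iid Exp(1) values by inverse transform. */
    static double[] sampleExponential(int n, long seed) {
        Random rng = new Random(seed);
        double[] e = new double[n];
        for (int i = 0; i < n; i++) {
            e[i] = -Math.log(1.0 - rng.nextDouble());
        }
        return e;
    }

    /** Maps each exponential value E to X = sqrt(E), the inverse of C, giving a Weibull sample. */
    static double[] toWeibull(double[] expo) {
        double[] x = new double[expo.length];
        for (int i = 0; i < expo.length; i++) {
            x[i] = Math.sqrt(expo[i]);
        }
        return x;
    }

    /** Fraction of observations strictly greater than t. */
    static double empiricalSurvival(double[] sample, double t) {
        int count = 0;
        for (double v : sample) {
            if (v > t) {
                count++;
            }
        }
        return count / (double) sample.length;
    }

    public static void main(String[] args) {
        double[] expo = sampleExponential(1000, 7L);
        double[] weibull = toWeibull(expo);
        for (double t : new double[] {0.25, 0.5, 1.0, 1.5}) {
            System.out.printf("t=%.2f  Weibull staircase at t: %.3f  exponential staircase at C(t): %.3f%n",
                    t, empiricalSurvival(weibull, t), empiricalSurvival(expo, timeChange(t)));
        }
    }
}
```

The two columns agree because X > t holds exactly when E > C(t), so the same observations are counted on both sides; only the horizontal axis has been relabeled.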

I welcome feedback and comments at mai@ms.uky.edu . Copyright © 1998 Mai Zhou. All Rights Reserved.