\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\bs}{\boldsymbol}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\skw}{\text{skew}}\) \(\newcommand{\kur}{\text{kurt}}\)

4. The Poisson Distribution

Basic Theory

Recall that the Poisson process models random points in the time set \([0, \infty)\) subject to a strong renewal property that states that at each arrival time and at each fixed time, the process restarts independently of the past. Here is a review of what we have so far:

The inter-arrival times \(\bs{X} = (X_1, X_2, \ldots)\) are independent and each has the exponential distribution with rate parameter \(r \), for some \( r \in (0, \infty) \). The common probability density function \(f\) is given by \[f(t) = r e^{- r t}, \quad t \in [0, \infty)\]

The sequence of arrival times \(\bs{T} = (T_0, T_1, T_2, \ldots)\) is the partial sum process associated with \(\bs{X}\): \[ T_n = \sum_{i = 1}^n X_i, \quad n \in \N \] and hence the sequence has stationary, independent increments. For \(n \in \N_+\), \(T_n\) has the gamma distribution with rate parameter \(r\) and shape parameter \(n\) and has density function \(f_n\) given by \[ f_n(t) = r^n \frac{t^{n-1}}{(n-1)!} e^{-r t}, \quad t \in [0, \infty) \]

Details:

\(T_0 = 0\) is not considered an arrival, but this convention makes certain formulas nicer.

The description in terms of the inter-arrival times and the description in terms of the arrival times above each characterize the Poisson process with rate \( r \). But in this section we are interested in the associated counting process.

For \(t \in [0, \infty)\) let \(N_t\) denote the number of arrivals in the interval \((0, t]\), so that \( N_t = \max\{n \in \N: T_n \le t\} \). The process \(\bs{N} = (N_t: t \in [0, \infty))\) is the counting process, and has an inverse relationship with \(\bs T\) in the sense that \(N_t \ge n\) if and only if \(T_n \le t\) for \(n \in \N_+\) and \(t \in (0, \infty)\).

Details:

The event \(N_t \ge n\) and the event \(T_n \le t\) both mean that there are at least \(n\) arrivals in \((0, t]\).

We will show that \( N_t \) has a Poisson distribution, named for Siméon Poisson, one of the most important distributions in probability theory. Our exposition will alternate between properties of the distribution and properties of the counting process. The two are intimately intertwined. It's not too much of an exaggeration to say that wherever there is a Poisson distribution, there is a Poisson process lurking in the background.

The Probability Density Function

For \( t \in [0, \infty) \), the probability density function of \(N_t\) is given by

\[ \P(N_t = n) = e^{-r t} \frac{(r t)^n}{n!}, \quad n \in \N \]
Details:

Using the inverse relationship between \(\bs N\) and \(\bs T\), the gamma density of \(T_n\) above, and integration by parts, we have \[ \P(N_t \ge n) = \P(T_n \le t) = \int_0^t f_n(s) \, ds = 1 - \sum_{k=0}^{n-1} e^{-r t} \frac{(r t)^k}{k!}, \quad n \in \N \] For \(n \in \N\) we have \(\P(N_t = n) = \P(N_t \ge n) - \P(N_t \ge n + 1)\). Simplifying gives the result.

Note that the distribution of \(N_t\) depends on the parameters \(r\) and \(t\) only through the product \(r t\). The distribution is called the Poisson distribution with parameter \(r t\).
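As a quick numerical check (not part of the formal development), the following Python sketch, assuming NumPy and SciPy are available, builds \(N_t\) directly from exponential inter-arrival times and compares its empirical distribution with the Poisson PMF; the values \(r = 2\) and \(t = 3\) are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import poisson

# Illustrative sketch: generate N_t from exponential inter-arrival times
# and compare the empirical distribution to the Poisson PMF.
rng = np.random.default_rng(seed=1)
r, t, reps = 2.0, 3.0, 100_000

# T_n is the partial sum of exponential(rate r) inter-arrival times;
# 30 terms is ample since r * t = 6.
X = rng.exponential(scale=1/r, size=(reps, 30))
T = np.cumsum(X, axis=1)
N_t = (T <= t).sum(axis=1)  # number of arrivals T_n <= t

for n in range(10):
    print(n, (N_t == n).mean(), poisson.pmf(n, r * t))
```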

In the Poisson experiment, vary \(r\) and \(t\) with the scrollbars and note the shape of the probability density function. For various values of \(r\) and \(t\), run the experiment 1000 times and compare the relative frequency function to the probability density function.

In general, a random variable \(N\) with values in \(\N\) has the Poisson distribution with parameter \(a \in (0, \infty)\) if it has the probability density function \[ g(n) = e^{-a} \frac{a^n}{n!}, \quad n \in \N \]

The Poisson probability density function \( g \) satisfies the following properties:

  1. \(g(n - 1) \lt g(n)\) if and only if \(n \lt a\).
  2. If \(a \notin \N_+\), there is a single mode at \(\lfloor a \rfloor\).
  3. If \(a \in \N_+\), there are consecutive modes at \(a - 1\) and \(a\).
Details:

Part (a) follows from simple algebra, and similarly, \( g(n - 1) = g(n) \) if and only if \( n = a \) (and thus \( a \in \N_+ \)). Parts (b) and (c) then follow.

The Poisson distribution does not have a simple closed-form distribution function or quantile function. Trivially, however, we can write the distribution function as a partial sum of the probability density function.

The Poisson distribution with parameter \( a \) has distribution function \( G \) given by \[ G(n) = \sum_{k=0}^n e^{-a} \frac{a^k}{k!}, \quad n \in \N \]
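Although there is no closed form, the distribution and quantile functions are easy to evaluate numerically. Here is a short sketch with SciPy (the parameter \( a = 4 \) is an arbitrary choice):

```python
from scipy.stats import poisson

# G(n) is the partial sum of the PMF; the quantile (inverse CDF)
# function gives the quartiles.
a = 4.0
print(poisson.cdf(6, a))                  # G(6) = sum_{k=0}^{6} e^{-a} a^k / k!
print(poisson.ppf([0.25, 0.5, 0.75], a))  # quartiles
```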

Open the quantile app, select the Poisson distribution, and select CDF view. Vary the parameter and note the shape of the distribution and quantile functions. For various values of the parameter, compute the quartiles.

Sometimes it's convenient to allow the parameter \( a \) to be 0. This degenerate Poisson distribution is simply point mass at 0. That is, with the usual conventions regarding nonnegative integer powers of 0, the probability density function \( g \) above reduces to \( g(0) = 1 \) and \( g(n) = 0 \) for \( n \in \N_+ \).

Moments

Suppose that \(N\) has the Poisson distribution with parameter \(a \gt 0\). Naturally we want to know the mean, variance, skewness and kurtosis, and the probability generating function of \(N\). The easiest moments to compute are the factorial moments. For this result, recall the falling power notation for the number of permutations of size \(k\) chosen from a population of size \(n\): \[ n^{(k)} = n (n - 1) \cdots (n - k + 1) \]

The factorial moment of \(N\) of order \(k \in \N\) is \(\E\left[N^{(k)}\right] = a^k\).

Details:

Using the standard change of variables formula for expected value, \[ \E[N^{(k)}] = \sum_{n=0}^\infty n^{(k)} e^{-a} \frac{a^n}{n!} = e^{-a} a^k \sum_{n=k}^\infty \frac{a^{n-k}}{(n-k)!} = e^{-a} a^k e^a = a^k \]
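Here is a direct numerical verification of the factorial moment formula, truncating the series (the choices \( a = 2.5 \) and \( k = 3 \) are arbitrary):

```python
import math

# Truncate the series for E[N^(k)] and compare with a^k.
# math.prod(range(n, n - k, -1)) is the falling power n^(k),
# which vanishes for n < k.
a, k = 2.5, 3
total = sum(
    math.prod(range(n, n - k, -1)) * math.exp(-a) * a**n / math.factorial(n)
    for n in range(200)
)
print(total, a**k)  # both approximately 15.625
```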

The mean and variance of \(N\) are the parameter \(a\).

  1. \(\E(N) = a\)
  2. \(\var(N) = a\)
Details:

The results follow from the factorial moments above.

  1. \(\E(N) = \E\left[N^{(1)}\right] = a\).
  2. \(\E(N^2) = \E\left[N^{(2)}\right] + \E(N) = a^2 + a\), and hence \(\var(N) = \E(N^2) - [\E(N)]^2 = a\).

Open the special distribution simulator and select the Poisson distribution. Vary the parameter and note the location and size of the mean\( \pm \)standard deviation bar. For selected values of the parameter, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( N \) are

  1. \( \skw(N) = 1 / \sqrt{a} \)
  2. \( \kur(N) = 3 + 1 / a \)
Details:

These results follow from the computational formulas for skewness and kurtosis, together with the factorial moments above. Specifically,

  1. \( \E(N^3) = a^3 + 3 a^2 + a \) and \( \skw(N) = \left[\E(N^3) - 3 a^2 - a^3\right] / a^{3/2} \)
  2. \( \E(N^4) = a^4 + 6 a^3 + 7 a^2 + a \) and \(\kur(N) = \left[\E(N^4) - 4 a \E(N^3) + 6 a^3 + 3 a^4\right] / a^2 \)

Note that the Poisson distribution is positively skewed, but \( \skw(N) \to 0 \) as \( a \to \infty \). Recall also that the excess kurtosis is \( \kur(N) - 3 = 1 / a \to 0 \) as \( a \to \infty \). This limit is related to the convergence of the Poisson distribution to the normal, discussed below.
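The moment results are easy to confirm numerically. In the sketch below (with the arbitrary choice \( a = 10 \)), note that SciPy reports excess kurtosis rather than kurtosis:

```python
from scipy.stats import poisson

# Compare SciPy's reported moments with the formulas above.
a = 10.0
mean, var, skew, excess_kurt = poisson.stats(a, moments='mvsk')
print(mean, var)            # both equal a
print(skew, 1 / a**0.5)     # skew(N) = 1/sqrt(a)
print(excess_kurt, 1 / a)   # kurt(N) - 3 = 1/a
```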

Open the special distribution simulator and select the Poisson distribution. Vary the parameter and note the shape of the probability density function in the context of the results on skewness and kurtosis above.

The probability generating function \( P \) of \(N\) is given by \[ P(s) = \E\left(s^N\right) = e^{a(s - 1)}, \quad s \in \R \]

Details:

Using the change of variables formula again, \[ \E(s^N) = \sum_{n=0}^\infty s^n e^{-a} \frac{a^n}{n!} = e^{-a} \sum_{n=0}^\infty \frac{(a s)^n}{n!} = e^{-a} e^{a s}, \quad s \in \R \]

Returning to the Poisson counting process \(\bs{N} = (N_t: t \ge 0)\) with rate parameter \(r\), it follows that \(\E(N_t) = r t\) and \(\var(N_t) = r t\) for \(t \ge 0\). Once again, we see that \(r\) can be interpreted as the average arrival rate. In an interval of length \(t\), we expect about \(r t\) arrivals.

In the Poisson experiment, vary \(r\) and \(t\) with the scrollbars and note the location and size of the mean\( \pm \)standard deviation bar. For various values of \(r\) and \(t\), run the experiment 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation, respectively.

Estimating the Rate

Suppose again that we have a Poisson process with rate \( r \in (0, \infty) \). In many practical situations, the rate \(r\) is unknown and must be estimated from observed data. For fixed \(t \gt 0\), a natural estimator of the rate \(r\) is \(N_t / t\).

The mean and variance of \(N_t / t\) are

  1. \(\E(N_t / t) = r\)
  2. \(\var(N_t / t) = r / t\)
Details:

These results follow easily from \(\E(N_t) = \var(N_t) = r t\) and basic properties of expected value and variance.

Part (a) means that the estimator is unbiased. Since this is the case, the variance in part (b) gives the mean square error. Since \(\var(N_t / t)\) decreases to 0 as \(t \to \infty\), the estimator is consistent.
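A small simulation sketch of the estimator (the rate \( r = 2.5 \) and the time values are arbitrary choices) illustrates unbiasedness and consistency:

```python
import numpy as np

# Simulate N_t ~ Poisson(r t) and examine the estimator N_t / t:
# its mean is r (unbiased) and its variance r / t shrinks as t grows.
rng = np.random.default_rng(seed=2)
r, reps = 2.5, 50_000
for t in [1.0, 10.0, 100.0]:
    N_t = rng.poisson(r * t, size=reps)
    est = N_t / t
    print(t, est.mean(), est.var(), r / t)  # mean ~ r, variance ~ r/t
```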

Additional Properties

Increments and Characterizations

Let's explore the basic renewal assumption of the Poisson model in terms of the counting process \(\bs{N} = (N_t: t \ge 0)\). Recall that \(N_t\) is the number of arrivals in the interval \((0, t]\), so it follows that if \(s, \, t \in [0, \infty)\) with \(s \lt t\), then \(N_t - N_s\) is the number of arrivals in the interval \((s, t]\). Of course, the arrival times have continuous distributions, so the probability that an arrival occurs at a specific point \(t\) is 0. Thus, it does not really matter if we write the interval above as \((s, t]\), \((s, t)\), \([s, t)\) or \([s, t]\).

The process \(\bs{N}\) has stationary, independent increments.

  1. If \(s, \, t \in [0, \infty)\) with \(s \lt t\) then \(N_t - N_s\) has the same distribution as \(N_{t-s}\), namely Poisson with parameter \(r (t - s)\).
  2. If \(t_1, \, t_2, \, t_3 \ldots \in [0, \infty)\) with \(t_1 \lt t_2 \lt t_3 \lt \cdots\) then \(\left(N_{t_1}, N_{t_2} - N_{t_1}, N_{t_3} - N_{t_2}, \ldots\right)\) is an independent sequence.

Statements about the increments of the counting process can be expressed more elegantly in terms of the more general counting process. Recall first that \([0, \infty)\) is given the standard \(\sigma\)-algebra of Borel measurable subsets, and the standard (Lebesgue) measure \(\lambda\) that generalizes length.

For measurable \(A \subseteq [0, \infty)\), let \(N(A)\) denote the number of random points in \(A\): \[ N(A) = \#\left\{n \in \N_+: T_n \in A\right\} \]

So in particular, \( N_t = N(0, t] \). Thus, note that \( t \mapsto N_t \) is a (random) distribution function and \( A \mapsto N(A) \) is the (random) measure associated with this distribution function. Here is our third characterization of the Poisson process.

A random process of points in \([0, \infty)\) is a Poisson process with rate \( r \in (0, \infty) \) if and only if the following properties hold:

  1. If \( A \subseteq [0, \infty) \) is measurable then \( N(A) \) has the Poisson distribution with parameter \( r \lambda(A) \).
  2. If \( \{A_i: i \in I\} \) is a countable, disjoint collection of measurable sets in \( [0, \infty) \) then \( \{N(A_i): i \in I\} \) is a set of independent variables.

From a modeling point of view, the assumptions of stationary, independent increments are ones that might reasonably be made. But the assumption that the increments have Poisson distributions does not seem as clear. Our next characterization of the process is more primitive in a sense, because it just imposes some limiting assumptions (in addition to stationary, independent increments).

A random process of points in \([0, \infty)\) is a Poisson process with rate \( r \in (0, \infty) \) if and only if the following properties hold:

  1. If \( A, \, B \subseteq [0, \infty) \) are measurable and \(\lambda(A) = \lambda(B)\), then \(N(A)\) and \(N(B)\) have the same distribution.
  2. If \( \{A_i: i \in I\} \) is a countable, disjoint collection of measurable sets in \( [0, \infty) \) then \( \{N(A_i): i \in I\} \) is a set of independent variables.
  3. If \( A_n \subseteq [0, \infty) \) is measurable and \(\lambda(A_n) \gt 0\) for \(n \in \N_+\), and if \(\lambda(A_n) \to 0\) as \(n \to \infty\) then \begin{align} &\frac{\P\left[N(A_n) = 1\right]}{\lambda(A_n)} \to r \text{ as } n \to \infty \\ &\frac{\P\left[N(A_n) \gt 1\right]}{\lambda(A_n)} \to 0 \text{ as } n \to \infty \end{align}
Details:

As usual, let \(N_t = N(0, t]\), the number of arrivals in \( (0, t] \), and in addition let \(P_n(t) = \P(N_t = n)\) for \(t \ge 0\) and \(n \in \N\). Expanding \(P_n(t + h)\) by conditioning on the number of arrivals in \((t, t + h]\), and using parts (a), (b), and (c), leads to differential equations for the functions \(P_n\). First, \(P_0\) satisfies the following differential equation and initial condition: \[ P_0^\prime(t) = -r P_0(t), \; t \gt 0; \quad P_0(0) = 1 \] Hence \(P_0(t) = e^{-r t}\) for \(t \ge 0\). Next, for \(n \in \N_+\), \(P_n\) satisfies the following differential equation and initial condition \[P_n^\prime(t) = -r P_n(t) + r P_{n-1}(t), \; t \gt 0; \quad P_n(0) = 0 \] Hence \(P_n(t) = e^{-r t} (r t)^n / n!\) for \(t \ge 0\), and therefore \(N_t\) has the Poisson distribution with parameter \(r t\).

Of course, part (a) is the stationary assumption and part (b) the independence assumption. The first limit in (c) is sometimes called the rate property and the second limit the sparseness property. In a small time interval of length \( dt \), the probability of a single random point is approximately \( r \, dt \), and the probability of two or more random points is negligible.
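The rate and sparseness properties are easy to check numerically from the Poisson distribution itself. In the sketch below, \( r = 2 \) is an arbitrary choice and \( A \) is an interval of small length \( h \):

```python
from scipy.stats import poisson

# For an interval A of length h, N(A) is Poisson with parameter r h, so
# P[N(A) = 1] / h -> r and P[N(A) > 1] / h -> 0 as h -> 0.
r = 2.0
for h in [0.1, 0.01, 0.001]:
    p_one = poisson.pmf(1, r * h)
    p_more = 1 - poisson.cdf(1, r * h)
    print(h, p_one / h, p_more / h)
```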

Sums

The Poisson distribution is closed with respect to sums of independent variables.

Suppose that \(N\) and \(M\) are independent random variables, and that \(N\) has the Poisson distribution with parameter \(a \in (0, \infty)\) and \(M\) has the Poisson distribution with parameter \(b \in (0, \infty)\). Then \(N + M\) has the Poisson distribution with parameter \(a + b\).

Details:

There are several ways to prove this result, but the one that gives the most insight is a probabilistic proof based on the Poisson process. Thus suppose that \(\bs{N} = (N_t: t \ge 0)\) is a Poisson counting process with rate 1. We can associate \(N\) with \(N_a\) and \(M\) with \(N_{a + b} - N_a\), since these have the correct distributions and are independent. But then \(N + M\) is \(N_{a + b}\).

Our next proof uses probability generating functions. From the result above, \( N \) has PGF \( P(s) = e^{a (s - 1)} \) for \( s \in \R \), and \( M \) has PGF \( Q(s) = e^{b(s-1)} \) for \( s \in \R \). Hence \( N + M \) has PGF \( P(s) Q(s) = e^{(a+b)(s-1)} \) for \( s \in \R \). But this is the PGF of the Poisson distribution with parameter \( a + b \).

Our third proof uses probability density functions. From the definition above, \( N \) has PDF \( g(n) = e^{-a} a^n / n! \) for \( n \in \N \), and \( M \) has PDF \( h(n) = e^{-b} b^n / n! \) for \( n \in \N \). Hence the PDF of \( N + M \) is the convolution \( g * h \). For \( n \in \N \), \[ (g * h)(n) = \sum_{k=0}^n g(k) h(n - k) = \sum_{k=0}^n e^{-a} \frac{a^k}{k!} e^{-b} \frac{b^{n-k}}{(n - k)!} = e^{-(a+b)} \frac{1}{n!} \sum_{k=0}^n \frac{n!}{k! (n - k)!} a^k b^{n-k}\] By the binomial theorem, the last sum is \( (a + b)^n \).
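The convolution computation can also be verified numerically; in this sketch the parameters \( a = 2 \) and \( b = 3 \) are arbitrary:

```python
import numpy as np
from scipy.stats import poisson

# Convolving the Poisson(a) and Poisson(b) PMFs reproduces the
# Poisson(a + b) PMF, up to truncation of the support.
a, b = 2.0, 3.0
support = np.arange(60)
conv = np.convolve(poisson.pmf(support, a), poisson.pmf(support, b))[:60]
print(np.max(np.abs(conv - poisson.pmf(support, a + b))))  # ~ 0
```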

From the last theorem, it follows that the Poisson distribution is infinitely divisible. That is, a Poisson distributed variable can be written as the sum of an arbitrary number of independent, identically distributed (in fact also Poisson) variables.

Suppose that \( N \) has the Poisson distribution with parameter \( a \in (0, \infty) \). Then for \( n \in \N_+ \), \( N \) has the same distribution as \( \sum_{i=1}^n N_i \) where \( (N_1, N_2, \ldots, N_n) \) are independent, and each has the Poisson distribution with parameter \( a / n \).

Normal Approximation

Because of the representation as a sum of independent, identically distributed variables, it's not surprising that the Poisson distribution can be approximated by the normal.

Suppose that \(N_t\) has the Poisson distribution with parameter \(t \gt 0\). Then the distribution of the variable below converges to the standard normal distribution as \(t \to \infty\). \[ Z_t = \frac{N_t - t}{\sqrt{t}} \]

Details:

As usual, we can assume that \((N_t: t \ge 0)\) is the Poisson counting process with rate 1. Note that \(Z_t\) is simply the standard score associated with \(N_t\). For \(n \in \N_+\), \(N_n\) is the sum of \(n\) independent variables, each with the Poisson distribution with parameter 1. Thus, from the central limit theorem, the distribution of \(Z_n\) converges to the standard normal distribution as \(n \to \infty\). For general \(t \in [0, \infty)\), it's possible to write \(Z_t = Z_n + W_t\) where \(n = \lfloor t \rfloor\) and \(W_t \to 0\) as \(t \to \infty\) (in probability and hence in distribution).

So if \(N\) has the Poisson distribution with parameter \(a \in (0, \infty)\), and \(a\) is large, then the distribution of \(N\) is approximately normal with mean \(a\) and standard deviation \(\sqrt{a}\). When using the normal approximation, we should remember to use the continuity correction, since the Poisson is a discrete distribution.
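Here is a numerical illustration of the normal approximation with continuity correction (the parameter \( a = 100 \) and the value \( n = 110 \) are arbitrary choices):

```python
from scipy.stats import norm, poisson

# With continuity correction, P(N <= n) is approximated by
# Phi((n + 1/2 - a) / sqrt(a)), where Phi is the standard normal CDF.
a, n = 100.0, 110
exact = poisson.cdf(n, a)
approx = norm.cdf((n + 0.5 - a) / a**0.5)
print(exact, approx)  # both approximately 0.85
```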

In the Poisson experiment, set \(r = t = 1\). Increase \(t\) and note how the graph of the probability density function becomes more bell-shaped.

General Exponential

The Poisson distribution is a member of the general exponential family of distributions. This fact is important in various statistical procedures.

Suppose that \(N\) has the Poisson distribution with parameter \(a \in (0, \infty)\). This distribution is a one-parameter exponential family with natural parameter \(\ln(a)\) and natural statistic \(N\).

Details:

This follows from the form of the Poisson PDF above: \[ g(n) = e^{-a} \frac{a^n}{n!} = \frac{e^{-a}}{n!} \exp\left[n \ln (a)\right], \quad n \in \N \]

The Uniform Distribution

The Poisson process has some basic connections to the uniform distribution. Consider again the Poisson process with rate \(r \gt 0\). As usual, \(\bs{T} = (T_0, T_1, \ldots)\) denotes the arrival time sequence and \(\bs{N} = (N_t: t \ge 0)\) the counting process.

For \(t \gt 0\), the conditional distribution of \(T_1\) given \(N_t = 1\) is uniform on the interval \((0, t]\).

Details:

Given \(N_t = 1\) (one arrival in \((0, t]\)), the arrival time \(T_1\) takes values in \((0, t]\). From the independent and stationary increments properties, \[ \P(T_1 \le s \mid N_t = 1) = \P(N_s = 1, N_t - N_s = 0 \mid N_t = 1) = \frac{\P(N_s = 1, N_t - N_s = 0)}{\P(N_t = 1)} = \frac{\P(N_s = 1) \P(N_t - N_s = 0)}{\P(N_t = 1)}\] Hence using the Poisson distribution, \[ \P(T_1 \le s \mid N_t = 1) = \frac{e^{-r s} r s \, e^{-r(t - s)}}{e^{-r t} r t} = \frac{s}{t}, \quad 0 \lt s \le t \]

More generally, for \(t \gt 0\) and \(n \in \N_+\), the conditional distribution of \((T_1, T_2, \ldots, T_n)\) given \(N_t = n\) is the same as the distribution of the order statistics of a random sample of size \(n\) from the uniform distribution on the interval \((0, t]\).

Details:

Suppose that \( 0 \lt t_1 \lt t_2 \lt \cdots \lt t_n \lt t \). On the event \( N_t = n \), the probability density of \( (T_1, T_2, \ldots, T_n) \) at \( (t_1, t_2, \ldots, t_n) \) is the probability density function of the independent inter-arrival times \( \left(t_1, t_2 - t_1, \ldots, t_n - t_{n-1}\right) \) times the probability of no arrivals in the interval \((t_n, t) \). Hence given \( N_t = n \), the conditional density of \( (T_1, T_2, \ldots, T_n) \) at \( (t_1, t_2, \ldots, t_n) \) is \[ \frac{r e^{- r t_1} r e^{-r (t_2 - t_1)} \cdots r e^{-r(t_n - t_{n-1})} e^{-r(t - t_n)}}{e^{-r t} (r t)^n \big/n!} = \frac{n!}{t^n} \] But this is the PDF of the order statistics from a sample of size \( n \) from the uniform distribution on \( [0, t] \).

Note that the conditional distribution above is independent of the rate \(r\). This means that, in a sense, the Poisson model gives the most random distribution of points in time.
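Here is a simulation sketch of the conditional order-statistics result (the choices \( r = 2 \), \( t = 1 \), \( n = 3 \) are arbitrary): given \( N_t = n \), the first arrival time should behave like the minimum of \( n \) uniform variables on \( (0, t] \), which has mean \( t / (n + 1) \).

```python
import numpy as np

# Among runs with exactly n arrivals in (0, t], T_1 should have mean
# t / (n + 1), the mean of the minimum of n uniform(0, t) variables.
rng = np.random.default_rng(seed=3)
r, t, n, reps = 2.0, 1.0, 3, 200_000
X = rng.exponential(scale=1/r, size=(reps, 20))
T = np.cumsum(X, axis=1)
counts = (T <= t).sum(axis=1)
T1_given_n = T[counts == n, 0]         # T_1 on the event N_t = n
print(T1_given_n.mean(), t / (n + 1))  # both approximately 0.25
```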

The Binomial Distribution

The Poisson distribution has important connections to the binomial distribution. First we consider a conditional distribution based on the number of arrivals of a Poisson process in a given interval, as we did in the last subsection.

Suppose that \( (N_t: t \in [0, \infty)) \) is a Poisson counting process with rate \( r \in (0, \infty) \). If \( s, \, t \in (0, \infty) \) with \( s \lt t \), and \(n \in \N_+\), then the conditional distribution of \(N_s\) given \(N_t = n\) is binomial with trial parameter \(n\) and success parameter \(p = s / t\).

Details:

Note that given \(N_t = n\), the number of arrivals \(N_s\) in \((0, s]\) takes values in \(\{0, 1, \ldots, n\}\). Again, the stationary and independent increments properties are critical for the proof. \[ \P(N_s = k \mid N_t = n) = \frac{\P(N_s = k, N_t = n)}{\P(N_t = n)} = \frac{\P(N_s = k, N_t - N_s = n - k)}{\P(N_t = n)} = \frac{\P(N_s = k) \P(N_t - N_s = n - k)}{\P(N_t = n)} \] Substituting the Poisson PDFs gives \[ \P(N_s = k \mid N_t = n) = \frac{\left(e^{-r s}(r s)^k / k!\right)\left(e^{-r(t-s)}[r(t - s)]^{n-k}/(n - k)!\right)}{e^{-r t}(r t)^n / n!} = \frac{n!}{k! (n - k)!} \left(\frac{s}{t}\right)^k \left(1 - \frac{s}{t}\right)^{n-k}\]

Note again that the conditional distribution above does not depend on the rate \(r\). Given \( N_t = n \), each of the \( n \) arrivals, independently of the others, falls into the interval \( (0, s] \) with probability \( s / t \) and into the interval \( (s, t] \) with probability \( 1 - s / t = (t - s) / t \). Here is essentially the same result, outside of the context of the Poisson process.

Suppose that \( M \) has the Poisson distribution with parameter \( a \in (0, \infty) \), \( N \) has the Poisson distribution with parameter \( b \in (0, \infty) \), and that \( M \) and \( N \) are independent. Then the conditional distribution of \( M \) given \( M + N = n \) is binomial with parameters \( n \) and \( p = a / (a + b) \).

Details:

The proof is essentially the same as the previous one, with minor modifications. First recall from the result on sums above that \( M + N \) has the Poisson distribution with parameter \( a + b \). For \( n, \, k \in \N \) with \( k \le n \), \[ \P(M = k \mid M + N = n) = \frac{\P(M = k, M + N = n)}{\P(M + N = n)} = \frac{\P(M = k, N = n - k)}{\P(N + M = n)} = \frac{\P(M = k) \P(N = n - k)}{\P(M + N = n)} \] Substituting the Poisson PDFs gives \[ \P(M = k \mid M + N = n) = \frac{\left(e^{-a} a^k / k!\right)\left(e^{-b} b^{n-k}/(n - k)!\right)}{e^{-(a + b)}(a + b)^n / n!} = \frac{n!}{k! (n - k)!} \left(\frac{a}{a + b}\right)^k \left(1 - \frac{a}{a + b}\right)^{n-k}\]
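The computation is easy to check numerically (the parameters \( a = 2 \), \( b = 3 \), and \( n = 6 \) are arbitrary):

```python
import numpy as np
from scipy.stats import binom, poisson

# P(M = k | M + N = n) computed from the Poisson PMFs matches the
# binomial PMF with parameters n and p = a / (a + b).
a, b, n = 2.0, 3.0, 6
k = np.arange(n + 1)
cond = poisson.pmf(k, a) * poisson.pmf(n - k, b) / poisson.pmf(n, a + b)
print(np.max(np.abs(cond - binom.pmf(k, n, a / (a + b)))))  # ~ 0
```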

More importantly, the Poisson distribution is the limit of the binomial distribution in a certain sense. As we will see, this convergence result is related to the analogy between the Bernoulli trials process and the Poisson process that we discussed in the introduction, the section on the exponential distribution, and the section on the gamma distribution.

Suppose that \( p_n \in (0, 1) \) for \( n \in \N_+ \) and that \( n p_n \to a \in (0, \infty) \) as \( n \to \infty \). Then the binomial distribution with parameters \(n\) and \(p_n\) converges to the Poisson distribution with parameter \(a\) as \(n \to \infty\). That is, for fixed \(k \in \N\), \[ \binom{n}{k} p_n^k (1 - p_n)^{n-k} \to e^{-a} \frac{a^k}{k!} \text{ as } n \to \infty \]

Details:

The binomial PDF with parameters \( n \) and \( p_n \) at \( k \in \{0, 1, \ldots, n\} \) can be written as \[ \frac{1}{k!}\left[n p_n\right]\left[(n - 1) p_n\right] \cdots \left[(n - k + 1) p_n\right] \left(1 - \frac{n p_n}{n}\right)^{n-k} \] But \((n - j) p_n \to a\) as \(n \to \infty\) for fixed \(j\). Also, using a basic theorem from calculus, \(\left(1 - n p_n / n\right)^{n-k} \to e^{-a}\) as \(n \to \infty\).

An easier proof uses probability generating functions. For \(s \in \R\), using the same basic limit from calculus, \[ \left[(1 - p_n) + p_n s\right]^n \to e^{a(s - 1)} \text{ as } n \to \infty \] The left side is the PGF of the binomial distribution with parameters \( n \) and \( p_n \), while the right side is the PGF of the Poisson distribution with parameter \( a \).
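The convergence is also easy to see numerically. In the sketch below, \( p_n = a / n \) with the arbitrary choice \( a = 3 \):

```python
import numpy as np
from scipy.stats import binom, poisson

# With p_n = a / n, the binomial(n, p_n) PMF approaches the Poisson(a)
# PMF as n grows; print the maximum PMF error over a truncated support.
a = 3.0
k = np.arange(15)
for n in [10, 100, 1000]:
    err = np.max(np.abs(binom.pmf(k, n, a / n) - poisson.pmf(k, a)))
    print(n, err)  # error shrinks as n increases
```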

In the context of the convergence result above, the mean and variance of the binomial distribution converge to the mean and variance of the limiting Poisson distribution, respectively.

  1. \(n p_n \to a\) as \(n \to \infty\)
  2. \(n p_n (1 - p_n) \to a\) as \(n \to \infty\)

Of course the convergence of the means is precisely the basic assumption in , and is further evidence that this is the essential assumption. But for a deeper look, let's return to the analogy between the Bernoulli trials process and the Poisson process. Recall that both have the strong renewal property that at each fixed time, and at each arrival time, the process stochastically starts over, independently of the past. The difference, of course, is that time is discrete in the Bernoulli trials process and continuous in the Poisson process. The convergence result is a special case of the more general fact that if we run Bernoulli trials at a faster and faster rate but with a smaller and smaller success probability, in just the right way, the Bernoulli trials process converges to the Poisson process. Specifically, suppose that we have a sequence of Bernoulli trials processes. In process \(n\) we perform the trials at a rate of \(n\) per unit time, with success probability \(p_n\). Our basic assumption is that \(n p_n \to r\) as \(n \to \infty\) where \(r \gt 0\). Now let \(Y_{n,t}\) denote the number of successes in the time interval \((0,t]\) for Bernoulli trials process \(n\), and let \(N_t\) denote the number of arrivals in this interval for the Poisson process with rate \( r \). Then \( Y_{n,t} \) has the binomial distribution with parameters \(\lfloor n t \rfloor\) and \(p_n\), and of course \( N_t \) has the Poisson distribution with parameter \( r t \).

For \(t \gt 0\), the distribution of \(Y_{n,t}\) converges to the distribution of \(N_t\) as \(n \to \infty\).

Details:

Note that \( n t - 1 \lt \lfloor n t \rfloor \le n t \) and hence \( n p_n t - p_n \lt \lfloor n t \rfloor p_n \le n p_n t \). Since \( n p_n \to r \) and \( p_n \to 0 \) as \( n \to \infty \), it follows from the squeeze theorem for limits that \( \lfloor n t \rfloor p_n \to r t \) as \( n \to \infty \). Thus, the result follows from the convergence of the binomial distribution to the Poisson above.

Compare the Poisson experiment and the binomial timeline experiment.

  1. Open the Poisson experiment and set \( r = 1 \) and \( t = 5 \). Run the experiment a few times and note the general behavior of the random points in time. Note also the shape and location of the probability density function and the mean\( \pm \)standard deviation bar.
  2. Now open the binomial timeline experiment and set \( n = 100 \) and \( p = 0.05 \). Run the experiment a few times and note the general behavior of the random points in time. Note also the shape and location of the probability density function and the mean\( \pm \)standard deviation bar.

From a practical point of view, the convergence of the binomial distribution to the Poisson means that if the number of trials \(n\) is large and the probability of success \(p\) small, so that \(n p^2\) is small, then the binomial distribution with parameters \(n\) and \(p\) is well approximated by the Poisson distribution with parameter \(r = n p\). This is often a useful result, because the Poisson distribution has fewer parameters than the binomial distribution (and often in real problems, the parameters may only be known approximately). Specifically, in the approximating Poisson distribution, we do not need to know the number of trials \(n\) and the probability of success \(p\) individually, but only through the product \(n p\). The condition that \(n p^2\) be small means that the variance of the binomial distribution, namely \(n p (1 - p) = n p - n p^2\), is approximately \(r = n p\), the variance of the approximating Poisson distribution.

Recall that the binomial distribution can also be approximated by the normal distribution, by virtue of the central limit theorem. The normal approximation works well when \(n p\) and \(n (1 - p)\) are large; the rule of thumb is that both should be at least 5. The Poisson approximation works well, as we have already noted, when \(n\) is large and \( n p^2 \) small.

Computational Exercises

Suppose that requests to a web server follow the Poisson model with rate \(r = 5\) per minute. Find the probability that there will be at least 8 requests in a 2 minute period.

Details:

0.7798
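Exercises of this type reduce to evaluating the Poisson distribution function. For example, here is one way to compute the answer above numerically (SciPy assumed), with parameter \( r t = 5 \cdot 2 = 10 \):

```python
from scipy.stats import poisson

# P(N >= 8) = 1 - P(N <= 7) for N ~ Poisson(10)
print(1 - poisson.cdf(7, 10))  # 0.7798...
```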

Defects in a certain type of wire follow the Poisson model with rate 1.5 per meter. Find the probability that there will be no more than 4 defects in a 2 meter piece of the wire.

Details:

0.8153

Suppose that customers arrive at a service station according to the Poisson model, at a rate of \(r = 4\) per hour. Find the mean and standard deviation of the number of customers in an 8 hour period.

Details:

32, 5.657

In the Poisson experiment, set \(r = 5\) and \(t = 4\). Run the experiment 1000 times and compute the following:

  1. \(\P(15 \le N_4 \le 22) \)
  2. The relative frequency of the event \(\{15 \le N_4 \le 22\}\).
  3. The normal approximation to \(\P(15 \le N_4 \le 22)\).
Details:
  1. \(\P(15 \le N_4 \le 22) = 0.6157\)
  2. The relative frequency varies with the simulation.
  3. The normal approximation is 0.6025.

Suppose that requests to a web server follow the Poisson model with rate \(r = 5\) per minute. Compute the normal approximation to the probability that there will be at least 280 requests in a 1 hour period.

Details:

0.8818

Suppose that requests to a web server follow the Poisson model, and that 1 request comes in a five minute period. Find the probability that the request came during the first 3 minutes of the period.

Details:

0.6

Suppose that requests to a web server follow the Poisson model, and that 10 requests come during a 5 minute period. Find the probability that at least 4 requests came during the first 3 minutes of the period.

Details:

0.9452

In the Poisson experiment, set \(r = 3\) and \(t = 5\). Run the experiment 100 times.

  1. For each run, compute the estimate of \(r\) based on \(N_t\).
  2. Over the 100 runs, compute the average of the squares of the errors.
  3. Compare the result in (b) with \( \var(N_t / t) \).

Suppose that requests to a web server follow the Poisson model with unknown rate \(r\) per minute. In a one hour period, the server receives 342 requests. Estimate \(r\).

Details:

\(r = 5.7\) per minute

In the binomial experiment, set \(n = 30\) and \(p = 0.1\), and run the simulation 1000 times. Compute and compare each of the following:

  1. \(\P(Y_{30} \le 4)\)
  2. The relative frequency of the event \(\{Y_{30} \le 4\}\)
  3. The Poisson approximation to \(\P(Y_{30} \le 4)\)
Details:
  1. \(\P(Y_{30} \le 4) = 0.8245\)
  2. The relative frequency varies with the simulation.
  3. The Poisson approximation is 0.8153.

Suppose that we have 100 memory chips, each of which is defective with probability 0.05, independently of the others. Approximate the probability that there are at least 3 defectives in the batch.

Details:

With \( a = n p = 5 \), the Poisson approximation is \( \P(N \ge 3) = 1 - \P(N \le 2) = 1 - e^{-5}\left(1 + 5 + \tfrac{25}{2}\right) \approx 0.8753 \).

In the binomial timeline experiment, set \(n = 40\) and \(p = 0.1\) and run the simulation 1000 times. Compute and compare each of the following:

  1. \(\P(Y_{40} \gt 5)\)
  2. The relative frequency of the event \(\{Y_{40} \gt 5\}\)
  3. The Poisson approximation to \(\P(Y_{40} \gt 5)\)
  4. The normal approximation to \(\P(Y_{40} \gt 5)\)
Details:
  1. \(\P(Y_{40} \gt 5) = 0.2063\)
  2. The relative frequency varies with the simulation.
  3. The Poisson approximation is 0.2149.
  4. The normal approximation is 0.2146.

In the binomial timeline experiment, set \(n = 100\) and \(p = 0.1\) and run the simulation 1000 times. Compute and compare each of the following:

  1. \(\P(8 \lt Y_{100} \lt 15)\)
  2. The relative frequency of the event \(\{8 \lt Y_{100} \lt 15\}\)
  3. The Poisson approximation to \(\P(8 \lt Y_{100} \lt 15)\)
  4. The normal approximation to \(\P(8 \lt Y_{100} \lt 15)\)
Details:
  1. \(\P(8 \lt Y_{100} \lt 15) = 0.6066\)
  2. The relative frequency varies with the simulation.
  3. The Poisson approximation is 0.5837.
  4. The normal approximation is 0.6247.

A text file contains 1000 words. Assume that each word, independently of the others, is misspelled with probability \(p\).

  1. If \(p = 0.015\), approximate the probability that the file contains at least 20 misspelled words.
  2. If \(p = 0.001\), approximate the probability that the file contains at least 3 misspelled words.
Details:

The true distribution of the number of misspelled words is binomial, with \( n = 1000 \) and \( p \).

  1. The normal approximation (with \( \mu = n p = 15 \) and \( \sigma^2 = n p (1 - p) = 14.775 \)) is 0.120858. The Poisson approximation (with parameter \( n p = 15 \)) is 0.124781. The true binomial probability is 0.123095.
  2. The Poisson approximation (with parameter \( n p = 1 \)) is 0.0803014. The true binomial probability is 0.0802093.