\(\renewcommand{\P}{\mathbb{P}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\Q}{\mathbb{Q}}\) \(\newcommand{\bs}{\boldsymbol}\)

6. Convergence

In this section we discuss several topics that are a bit advanced, but very important. In particular, the results obtained in this section will be essential for establishing

Some of the concepts from the section on Partial Orders in the chapter on Foundations are essential for this section. As usual, our starting point is a random experiment with probability space \( (S, \mathscr{S}, \P) \). Thus, \( S \) is the sample space, \( \mathscr{S} \) is the collection of events, and \( \P \) is the probability measure.

Basic Theory

Sequences of events

Our first discussion deals with sequences of events and various types of limits of such sequences. The limits are also events. We start with two simple definitions.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events.

  1. The sequence is increasing if \( A_n \subseteq A_{n+1} \) for every \( n \in \N_+ \).
  2. The sequence is decreasing if \( A_{n+1} \subseteq A_n \) for every \( n \in \N_+ \).

Note that these are the standard definitions of increasing and decreasing, relative to the ordinary total order \( \le \) on the index set \( \N_+ \) and the subset partial order \( \subseteq \) on the collection of events. The terminology is also justified by the corresponding indicator variables.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events, and let \(I_n = \bs{1}_{A_n}\) denote the indicator variable of the event \(A_n\) for \(n \in \N_+\).

  1. The sequence of events is increasing if and only if the sequence of indicator variables is increasing in the ordinary sense. That is, \(I_n \le I_{n+1}\) for each \(n \in \N_+\).
  2. The sequence of events is decreasing if and only if the sequence of indicator variables is decreasing in the ordinary sense. That is, \(I_{n+1} \le I_n\) for each \(n \in \N_+\).
Proof:

Both parts follow from the fact that if \( A, \; B \subseteq S \) then \( A \subseteq B \) if and only if \( \bs{1}_A \le \bs{1}_B \). To see this note that \( A \subseteq B \) if and only if \( s \in A \) implies \( s \in B \) for \( s \in S \), if and only if \( \bs{1}_A(s) = 1 \) implies \( \bs{1}_B(s) = 1 \) for \( s \in S \). Since indicator functions only take the values 0 and 1, the last statement is equivalent to \( \bs{1}_A \le \bs{1}_B \).
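The equivalence \( A \subseteq B \iff \bs{1}_A \le \bs{1}_B \) can be checked exhaustively on a small finite sample space. The Python sketch below is only an illustration: the four-point sample space and the representation of events as `frozenset`s are assumptions made for the example, not part of the formal development.

```python
from itertools import chain, combinations

# Hypothetical finite sample space used only for illustration.
S = range(4)

def indicator(event):
    """Return the indicator function of an event (a set of outcomes)."""
    return lambda s: 1 if s in event else 0

def all_events(space):
    """All subsets of a finite sample space, as frozensets."""
    items = list(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def pointwise_le(f, g, space):
    """True if f(s) <= g(s) for every outcome s in the space."""
    return all(f(s) <= g(s) for s in space)

def check_equivalence(space):
    """Check A <= B (subset) iff 1_A <= 1_B pointwise, for every pair of events."""
    events = all_events(space)
    return all((A <= B) == pointwise_le(indicator(A), indicator(B), space)
               for A in events for B in events)

print(check_equivalence(S))  # True
```

Exhaustive checking is only possible because the sample space is finite; the proof above is what covers the general case.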

[Figure: A sequence of increasing events and their union]
[Figure: A sequence of decreasing events and their intersection]

If a sequence of events is either increasing or decreasing, we can define the limit of the sequence in a way that turns out to be quite natural.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events.

  1. If the sequence is increasing, we define \( \lim_{n \to \infty} A_n = \bigcup_{n=1}^\infty A_n \).
  2. If the sequence is decreasing, we define \( \lim_{n \to \infty} A_n = \bigcap_{n=1}^\infty A_n \).

Once again, the terminology is clarified by the corresponding indicator variables.

Suppose again that \( (A_1, A_2, \ldots) \) is a sequence of events, and let \(I_n = \bs{1}_{A_n}\) denote the indicator variable of \(A_n\) for \(n \in \N_+\).

  1. If the sequence of events is increasing, then \( \lim_{n \to \infty} I_n \) is the indicator variable of \( \bigcup_{n = 1}^\infty A_n \)
  2. If the sequence of events is decreasing, then \( \lim_{n \to \infty} I_n \) is the indicator variable of \( \bigcap_{n = 1}^\infty A_n \)
Proof:
  1. If \( s \in \bigcup_{n=1}^\infty A_n\) then \( s \in A_k \) for some \( k \in \N_+ \). Since the events are increasing, \( s \in A_n \) for every \( n \ge k \). In this case, \( I_n(s) = 1 \) for every \( n \ge k \) and hence \( \lim_{n \to \infty} I_n(s) = 1 \). On the other hand, if \( s \notin \bigcup_{n=1}^\infty A_n \) then \( s \notin A_n \) for every \( n \in \N_+ \). In this case, \( I_n(s) = 0\) for every \( n \in \N_+ \) and hence \( \lim_{n \to \infty} I_n(s) = 0 \).
  2. If \( s \in \bigcap_{n=1}^\infty A_n \) then \( s \in A_n \) for each \( n \in \N_+ \). In this case, \( I_n(s) = 1 \) for each \( n \in \N_+ \) and hence \( \lim_{n \to \infty} I_n(s) = 1 \). If \( s \notin \bigcap_{n=1}^\infty A_n\) then \( s \notin A_k \) for some \( k \in \N_+ \). Since the events are decreasing, \( s \notin A_n \) for all \( n \ge k \). In this case, \( I_n(s) = 0 \) for \( n \ge k \) and hence \( \lim_{n \to \infty} I_n(s) = 0 \).
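A finite computation can never exhibit a limit, but it can show the pattern used in the proof: at a fixed outcome \( s \), the indicator sequence of an increasing sequence of events is eventually constant at 1 when \( s \) is in the union, and the indicator sequence of a decreasing sequence is eventually constant at 0 when \( s \) is not in the intersection. The events below, on the sample space of nonnegative integers, are hypothetical examples chosen for this sketch.

```python
# Hypothetical events on the sample space of nonnegative integers,
# described by membership predicates in (n, s).
def in_increasing(n, s):
    """A_n = {s : s < n}, an increasing sequence whose union contains every s."""
    return s < n

def in_decreasing(n, s):
    """D_n = {s : s >= n}, a decreasing sequence with empty intersection."""
    return s >= n

def indicator_path(member, s, horizon):
    """The values I_1(s), ..., I_horizon(s) of the indicator sequence at outcome s."""
    return [1 if member(n, s) else 0 for n in range(1, horizon + 1)]

s = 7
inc = indicator_path(in_increasing, s, 20)
dec = indicator_path(in_decreasing, s, 20)
# The increasing path switches to 1 at n = 8 and stays there (s is in the union);
# the decreasing path switches to 0 at n = 8 and stays there (s is not in the intersection).
print(inc[-1], dec[-1])  # 1 0
```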

An arbitrary union of events can always be written as a union of increasing events, and an arbitrary intersection of events can always be written as an intersection of decreasing events:

Suppose that \((A_1, A_2, \ldots)\) is a sequence of events. Then

  1. \(\bigcup_{i = 1}^ n A_i\) is increasing in \(n \in \N_+\) and \(\bigcup_{i = 1}^\infty A_i = \lim_{n \to \infty} \bigcup_{i = 1}^n A_i\).
  2. \(\bigcap_{i=1}^n A_i\) is decreasing in \(n \in \N_+\) and \(\bigcap_{i=1}^\infty A_i = \lim_{n \to \infty} \bigcap_{i=1}^n A_i\).
Proof:
  1. Trivially \( \bigcup_{i=1}^n A_i \subseteq \bigcup_{i=1}^{n+1} A_i \). The second statement simply means that \( \bigcup_{n=1}^\infty \bigcup_{i = 1}^n A_i = \bigcup_{i=1}^\infty A_i\).
  2. Trivially \( \bigcap_{i=1}^{n+1} A_i \subseteq \bigcap_{i=1}^n A_i \). The second statement simply means that \( \bigcap_{n=1}^\infty \bigcap_{i=1}^n A_i = \bigcap_{i=1}^\infty A_i \).

There is a more interesting and useful way to generate increasing and decreasing sequences from an arbitrary sequence of events, using the tail segment of the sequence rather than the initial segment.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events. Then

  1. \(\bigcup_{i=n}^\infty A_i\) is decreasing in \(n \in \N_+\).
  2. \(\bigcap_{i=n}^\infty A_i\) is increasing in \(n \in \N_+\).
Proof:
  1. Clearly \(\bigcup_{i=n+1}^\infty A_i \subseteq \bigcup_{i=n}^\infty A_i\)
  2. Clearly \(\bigcap_{i=n}^\infty A_i \subseteq \bigcap_{i=n+1}^\infty A_i\)
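The monotonicity of the initial-segment and tail-segment constructions can be checked on a finite list of events. This is a sketch under a stated assumption: the sequence is truncated to \( N = 5 \) hypothetical events, so the tail sets are \( \bigcup_{i=n}^N A_i \) and \( \bigcap_{i=n}^N A_i \) rather than the infinite tails. The monotonicity argument is the same for the truncated tails.

```python
from functools import reduce

def initial_unions(events):
    """[A_1, A_1 | A_2, ..., A_1 | ... | A_N], increasing in n."""
    out, acc = [], set()
    for A in events:
        acc = acc | A
        out.append(acc)
    return out

def initial_intersections(events):
    """[A_1, A_1 & A_2, ..., A_1 & ... & A_N], decreasing in n."""
    out, acc = [], set(events[0])
    for A in events:
        acc = acc & A
        out.append(acc)
    return out

def tail_unions(events):
    """[A_n | ... | A_N for n = 1..N], decreasing in n (finite truncation)."""
    return [reduce(set.union, events[n:], set()) for n in range(len(events))]

def tail_intersections(events):
    """[A_n & ... & A_N for n = 1..N], increasing in n (finite truncation)."""
    return [reduce(set.intersection, events[n + 1:], set(events[n]))
            for n in range(len(events))]

def is_increasing(seq):
    """True if each set is a subset of the next."""
    return all(a <= b for a, b in zip(seq, seq[1:]))

def is_decreasing(seq):
    """True if each set contains the next."""
    return all(b <= a for a, b in zip(seq, seq[1:]))

# Hypothetical sequence of five events on the outcomes {1, 2, 3, 4}.
events = [{1, 2}, {2, 3}, {3}, {1, 3}, {3, 4}]
print(tail_unions(events))
print(tail_intersections(events))
```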

Since the new sequences defined in the previous result are decreasing and increasing, respectively, we can take their limits. These are the limit superior and limit inferior, respectively, of the original sequence.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events. Define

  1. \( \limsup_{n \to \infty} A_n = \lim_{n \to \infty} \bigcup_{i=n}^\infty A_i = \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i \). This is the event that occurs if and only if \( A_n \) occurs for infinitely many values of \( n \).
  2. \( \liminf_{n \to \infty} A_n = \lim_{n \to \infty} \bigcap_{i=n}^\infty A_i = \bigcup_{n=1}^\infty \bigcap_{i=n}^\infty A_i \). This is the event that occurs if and only if \( A_n \) occurs for all but finitely many values of \( n \).
Proof:
  1. From the definition, the event \( \limsup_{n \to \infty} A_n \) occurs if and only if for each \( n \in \N_+ \) there exists \( i \ge n \) such that \( A_i \) occurs.
  2. From the definition, the event \( \liminf_{n \to \infty} A_n \) occurs if and only if there exists \( n \in \N_+ \) such that \( A_i \) occurs for every \( i \ge n \).
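For a periodic sequence, say \( A_{n+p} = A_n \) for every \( n \), each tail \( (A_n, A_{n+1}, \ldots) \) contains a full period. Hence \( \bigcup_{i=n}^\infty A_i \) is the union of one period and \( \bigcap_{i=n}^\infty A_i \) is the intersection of one period, for every \( n \), and the limit superior and limit inferior can be computed exactly. The sketch below assumes the hypothetical example \( A_n = \{0, n \bmod 3\} \).

```python
from functools import reduce

def limsup_periodic(period):
    """limsup of a periodic sequence: the outcomes in infinitely many A_n."""
    return reduce(set.union, period, set())

def liminf_periodic(period):
    """liminf of a periodic sequence: the outcomes in all but finitely many A_n."""
    return reduce(set.intersection, period[1:], set(period[0]))

# Hypothetical example: A_n = {0, n mod 3} for n = 1, 2, 3, ..., with period 3.
period = [{0, n % 3} for n in (1, 2, 3)]
print(limsup_periodic(period))  # {0, 1, 2}
print(liminf_periodic(period))  # {0}
```

Outcome 0 is in every event, outcomes 1 and 2 are each in every third event, and no other outcome is ever in an event, which matches the two results.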

Once again, the terminology and notation are clarified by the corresponding indicator variables.

Let \(I_n\) denote the indicator variable of \(A_n\) for \(n \in \N_+\). Then

  1. \(\limsup_{n \to \infty} I_n \) is the indicator variable of \(\limsup_{n \to \infty} A_n\).
  2. \(\liminf_{n \to \infty} I_n \) is the indicator variable of \(\liminf_{n \to \infty} A_n\).
Proof:
  1. By the result above for decreasing events, \( \lim_{n \to \infty} \bs{1}\left(\bigcup_{i=n}^\infty A_i\right) \) is the indicator variable of \( \limsup_{n \to \infty} A_n \). But \(\bs{1}\left(\bigcup_{i=n}^\infty A_i\right) = \max\{I_i: i \ge n\}\) and hence \( \lim_{n \to \infty} \bs{1}\left(\bigcup_{i=n}^\infty A_i\right) = \limsup_{n \to \infty} I_n \).
  2. By the result above for increasing events, \( \lim_{n \to \infty} \bs{1}\left(\bigcap_{i=n}^\infty A_i\right) \) is the indicator variable of \( \liminf_{n \to \infty} A_n \). But \(\bs{1}\left(\bigcap_{i=n}^\infty A_i\right) = \min\{I_i: i \ge n\}\) and hence \( \lim_{n \to \infty} \bs{1}\left(\bigcap_{i=n}^\infty A_i\right) = \liminf_{n \to \infty} I_n \).
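The identities \( \bs{1}\left(\bigcup_{i=n}^\infty A_i\right) = \max\{I_i: i \ge n\} \) and \( \bs{1}\left(\bigcap_{i=n}^\infty A_i\right) = \min\{I_i: i \ge n\} \) can be evaluated directly when the sequence is periodic, since then one full period of the tail determines the max and min. The sketch below assumes the hypothetical periodic example \( A_n = \{0, n \bmod 3\} \) on the outcomes \( \{0, 1, 2, 3\} \).

```python
# Hypothetical example: A_n = {0, n mod 3} on the outcomes {0, 1, 2, 3}.
PERIOD = 3

def indicator(n, s):
    """I_n(s), the indicator of A_n evaluated at outcome s."""
    return 1 if s in {0, n % PERIOD} else 0

def tail_max(s, n):
    """max{I_i(s) : i >= n}; for a periodic sequence one full period suffices."""
    return max(indicator(i, s) for i in range(n, n + PERIOD))

def tail_min(s, n):
    """min{I_i(s) : i >= n}; again one full period suffices."""
    return min(indicator(i, s) for i in range(n, n + PERIOD))

outcomes = range(4)
# Here the tail max and tail min do not depend on n, so their limits
# equal their values at any n; n = 100 is an arbitrary choice.
limsup_ind = {s: tail_max(s, 100) for s in outcomes}
liminf_ind = {s: tail_min(s, 100) for s in outcomes}
print(limsup_ind)  # {0: 1, 1: 1, 2: 1, 3: 0}
print(liminf_ind)  # {0: 1, 1: 0, 2: 0, 3: 0}
```

The outcomes where the limit superior of the indicators equals 1 are exactly \( \{0, 1, 2\} = \limsup_{n \to \infty} A_n \), and those where the limit inferior equals 1 are exactly \( \{0\} = \liminf_{n \to \infty} A_n \).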

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events. Then \(\liminf_{n \to \infty} A_n \subseteq \limsup_{n \to \infty} A_n\).

Proof:

If \( A_n \) occurs for all but finitely many \( n \in \N_+ \) then certainly \( A_n \) occurs for infinitely many \( n \in \N_+ \).

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events. Then

  1. \(\left( \limsup_{n \to \infty} A_n \right)^c = \liminf_{n \to \infty} A_n^c\)
  2. \(\left( \liminf_{n \to \infty} A_n \right)^c = \limsup_{n \to \infty} A_n^c\).
Proof:

These results follow from DeMorgan's laws.
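Both complement identities can be checked on a small periodic example, where (as for any periodic sequence) the limit superior is the union over one period and the limit inferior is the intersection over one period. The sample space \( \{0, 1, 2, 3\} \) and the events \( A_n = \{0, n \bmod 3\} \) are hypothetical choices for this sketch.

```python
from functools import reduce

S = {0, 1, 2, 3}                          # hypothetical finite sample space
period = [{0, n % 3} for n in (1, 2, 3)]  # A_n = {0, n mod 3}, period 3
complements = [S - A for A in period]     # A_n^c, also periodic with period 3

def limsup_periodic(p):
    """Union over one period: the outcomes lying in infinitely many events."""
    return reduce(set.union, p, set())

def liminf_periodic(p):
    """Intersection over one period: the outcomes lying in all but finitely many events."""
    return reduce(set.intersection, p[1:], set(p[0]))

print(S - limsup_periodic(period), liminf_periodic(complements))  # {3} {3}
print(S - liminf_periodic(period), limsup_periodic(complements))  # {1, 2, 3} {1, 2, 3}
```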

The Continuity Theorems

Generally speaking, a function is continuous if it preserves limits. Thus, the following results are the continuity theorems of probability. Part (a) is the continuity theorem for increasing events and part (b) the continuity theorem for decreasing events.

Suppose that \((A_1, A_2, \ldots)\) is a sequence of events.

  1. If the sequence is increasing then \(\lim_{n \to \infty} \P(A_n) = \P\left( \lim_{n \to \infty} A_n \right) = \P\left(\bigcup_{n=1}^\infty A_n\right)\)
  2. If the sequence is decreasing then \(\lim_{n \to \infty} \P(A_n) = \P\left( \lim_{n \to \infty} A_n \right) = \P\left(\bigcap_{n=1}^\infty A_n\right)\)
Proof:
  1. Let \(B_1 = A_1\) and let \(B_i = A_i \setminus A_{i-1}\) for \(i \in \{2, 3, \ldots\}\). Note that the collection of events \(\{B_1, B_2, \ldots \}\) is pairwise disjoint and has the same union as \(\{A_1, A_2, \ldots \}\). From the additivity axiom of probability and the definition of infinite series, \[ \P\left(\bigcup_{i=1}^\infty A_i\right) = \P\left(\bigcup_{i=1}^\infty B_i\right) = \sum_{i = 1}^\infty \P(B_i) = \lim_{n \to \infty} \sum_{i = 1}^n \P(B_i) \] But \( \P(B_1) = \P(A_1) \) and \( \P(B_i) = \P(A_i) - \P(A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Therefore \( \sum_{i=1}^n \P(B_i) = \P(A_n) \) and hence we have \( \P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n \to \infty} \P(A_n) \).
    [Figure: The construction in the continuity theorem for increasing events]
  2. The sequence of complements \(\left(A_1^c, A_2^c, \ldots\right)\) is increasing. Hence using part (a), DeMorgan's law, and the complement rule we have \[ \P\left(\bigcap_{i=1}^\infty A_i \right) = 1 - \P\left(\bigcup_{i=1}^\infty A_i^c\right) = 1 - \lim_{n \to \infty} \P(A_n^c) = \lim_{n \to \infty} \left[1 - \P\left(A_n^c\right)\right] = \lim_{n \to \infty} \P(A_n) \]
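The telescoping step in part (a) can be checked with exact rational arithmetic. In the sketch below, \( \P \) is taken to be length on \( [0, 1] \) and \( A_n = [0, 1 - 1/n] \); both are hypothetical choices for illustration. The disjoint pieces are \( B_1 = A_1 = \{0\} \) and \( B_i = (1 - 1/(i-1), 1 - 1/i] \) for \( i \ge 2 \), and the code confirms that \( \sum_{i=1}^n \P(B_i) = \P(A_n) = 1 - 1/n \), which converges to \( 1 = \P\left([0, 1)\right) = \P\left(\bigcup_{n=1}^\infty A_n\right) \).

```python
from fractions import Fraction

# Hypothetical example: P is length on [0, 1] and A_n = [0, 1 - 1/n],
# an increasing sequence whose union is [0, 1).
def prob_A(n):
    """P(A_n), the length of [0, 1 - 1/n]."""
    return 1 - Fraction(1, n)

def prob_B(i):
    """P(B_i) for the disjoint pieces B_1 = A_1 and B_i = A_i minus A_{i-1}."""
    return prob_A(1) if i == 1 else prob_A(i) - prob_A(i - 1)

def partial_sum_B(n):
    """sum_{i=1}^n P(B_i), which telescopes to P(A_n)."""
    return sum((prob_B(i) for i in range(1, n + 1)), Fraction(0))

for n in (10, 100, 1000):
    print(n, partial_sum_B(n), float(prob_A(n)))
```

Code can only confirm the finite identities; the passage to the limit is exactly what the continuity theorem supplies.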

The continuity theorems can be applied to the increasing and decreasing sequences that we constructed earlier from an arbitrary sequence of events.

Suppose that \((A_1, A_2, \ldots)\) is a sequence of events.

  1. \(\P\left( \bigcup_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \P\left( \bigcup_{i = 1}^n A_i \right)\)
  2. \(\P\left( \bigcap_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \P\left( \bigcap_{i = 1}^n A_i \right)\)
Proof:

These results follow immediately from the continuity theorems and the result above for unions and intersections.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events. Then

  1. \(\P\left(\limsup_{n \to \infty} A_n\right) = \lim_{n \to \infty} \P\left(\bigcup_{i=n}^\infty A_i\right)\)
  2. \(\P\left(\liminf_{n \to \infty} A_n\right) = \lim_{n \to \infty} \P\left(\bigcap_{i=n}^\infty A_i\right)\)
Proof:

These results follow directly from the result above, the definition of the limit superior and inferior, and the continuity theorems.

The next result shows that the countable additivity axiom for a probability measure is equivalent to finite additivity and the continuity property for increasing events.

Temporarily, suppose that \( \P \) is only finitely additive, but satisfies the continuity property for increasing events above. Then \( \P \) is countably additive.

Proof:

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of pairwise disjoint events. Since we are assuming that \( \P \) is finitely additive we have

\[ \P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n \P(A_i) \]

If we let \( n \to \infty \), the left side converges to \( \P\left(\bigcup_{i=1}^\infty A_i\right) \) by the continuity assumption and the theorem above for unions, while the right side converges to \( \sum_{i=1}^\infty \P(A_i) \) by the definition of an infinite series.

There are a few mathematicians who reject the countable additivity axiom of a probability measure in favor of the weaker finite additivity axiom. Whatever the philosophical arguments may be, life is certainly much harder without the continuity theorems.

The Borel-Cantelli Lemmas

The Borel-Cantelli Lemmas, named after Émile Borel and Francesco Cantelli, are very important tools in probability theory. The first Borel-Cantelli lemma gives a condition that is sufficient to conclude that the event that infinitely many of the events occur has probability 0.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of events. If \(\sum_{n=1}^\infty \P(A_n) \lt \infty\) then \(\P\left(\limsup_{n \to \infty} A_n\right) = 0\).

Proof:

From the continuity result above, we have \( \P\left(\limsup_{n \to \infty} A_n\right) = \lim_{n \to \infty} \P\left(\bigcup_{i = n}^\infty A_i \right) \). But from Boole's inequality, \( \P\left(\bigcup_{i = n}^\infty A_i \right) \le \sum_{i = n}^\infty \P(A_i) \). Since \( \sum_{i = 1}^\infty \P(A_i) \lt \infty \), we have \( \sum_{i = n}^\infty \P(A_i) \to 0 \) as \( n \to \infty \).
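The proof reduces everything to the tail sums \( \sum_{i=n}^\infty \P(A_i) \). As a numerical illustration, take the hypothetical summable choice \( \P(A_i) = 1 / i^2 \). Comparing with an integral gives \( \sum_{i=n}^\infty 1/i^2 \lt \int_{n-1}^\infty dx / x^2 = 1/(n-1) \) for \( n \ge 2 \), so the Boole bound on \( \P\left(\bigcup_{i=n}^\infty A_i\right) \) goes to 0. The sketch truncates each tail after 100,000 terms, which gives a lower bound on the full tail sum.

```python
import math

def p(i):
    """Hypothetical summable choice for the example: P(A_i) = 1 / i^2."""
    return 1.0 / (i * i)

def truncated_tail_sum(n, terms=100_000):
    """Sum of P(A_i) for i = n, ..., n + terms - 1; a lower bound for the full tail sum."""
    return math.fsum(p(i) for i in range(n, n + terms))

for n in (2, 10, 100, 1000):
    # Boole's inequality bounds P(A_i occurs for some i >= n) by the full tail sum,
    # and the integral comparison bounds that tail sum by 1/(n - 1).
    print(n, truncated_tail_sum(n), 1 / (n - 1))
```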

The second Borel-Cantelli Lemma gives a condition that is sufficient to conclude that infinitely many independent events occur with probability 1.

Suppose that \((A_1, A_2, \ldots)\) is a sequence of independent events. If \(\sum_{n=1}^\infty \P(A_n) = \infty\) then \(\P\left( \limsup_{n \to \infty} A_n \right) = 1\).

Proof:

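Note first that \(1 - x \le e^{-x}\) for every \(x \in \R\), and hence \( 1 - \P(A_i) \le \exp\left[-\P(A_i)\right] \) for each \( i \in \N_+ \). From the results above for limit inferior and complement, \[ \P\left[\left(\limsup_{n \to \infty} A_n\right)^c\right] = \P\left(\liminf_{n \to \infty} A_n^c\right) = \lim_{n \to \infty} \P \left(\bigcap_{i = n}^\infty A_i^c\right) \] But by independence and the inequality above, \[ \P\left(\bigcap_{i = n}^\infty A_i^c\right) = \prod_{i = n}^\infty \P\left(A_i^c\right) = \prod_{i = n}^\infty \left[1 - \P(A_i)\right] \le \prod_{i = n}^\infty \exp\left[-\P(A_i)\right] = \exp\left(-\sum_{i = n}^\infty \P(A_i) \right) = 0 \] The last equality holds because \( \sum_{i = n}^\infty \P(A_i) = \infty \) for every \( n \in \N_+ \): removing finitely many terms, each at most 1, from a divergent series of nonnegative terms leaves a divergent series. Hence \( \P\left[\left(\limsup_{n \to \infty} A_n\right)^c\right] = 0 \), that is, \( \P\left(\limsup_{n \to \infty} A_n\right) = 1 \).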
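The key quantity in the proof, \( \P\left(\bigcap_{i=n}^N A_i^c\right) = \prod_{i=n}^N \left[1 - \P(A_i)\right] \), can be computed exactly for the hypothetical choice \( \P(A_i) = 1/i \) with independent events. The product telescopes: \( \prod_{i=n}^N (1 - 1/i) = \prod_{i=n}^N \frac{i-1}{i} = \frac{n-1}{N} \), which goes to 0 as \( N \to \infty \) for each fixed \( n \), as the second lemma predicts. The sketch also evaluates the bound \( \exp\left(-\sum_{i=n}^N \P(A_i)\right) \) used in the proof.

```python
import math
from fractions import Fraction

def prob_no_event(n, N):
    """P(none of A_n, ..., A_N occurs) = prod_{i=n}^N (1 - 1/i),
    assuming independent events with P(A_i) = 1/i (a hypothetical choice)."""
    prod = Fraction(1)
    for i in range(n, N + 1):
        prod *= 1 - Fraction(1, i)
    return prod

def exp_bound(n, N):
    """The upper bound exp(-sum_{i=n}^N P(A_i)) obtained from 1 - x <= e^{-x}."""
    return math.exp(-math.fsum(1 / i for i in range(n, N + 1)))

n = 10
for N in (100, 1_000, 10_000):
    exact = prob_no_event(n, N)
    print(N, exact, float(exact), exp_bound(n, N))
```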

For independent events, both Borel-Cantelli lemmas apply, of course, and together they lead to a zero-one law.

If \( (A_1, A_2, \ldots) \) is a sequence of independent events then \( \limsup_{n \to \infty} A_n \) has probability 0 or 1:

  1. If \(\sum_{n=1}^\infty \P(A_n) \lt \infty\) then \(\P\left( \limsup_{n \to \infty} A_n \right) = 0\).
  2. If \(\sum_{n=1}^\infty \P(A_n) = \infty\) then \(\P\left( \limsup_{n \to \infty} A_n \right) = 1\).

This result is actually a special case of a more general zero-one law, known as the Kolmogorov zero-one law, and named for Andrei Kolmogorov. This law is studied in the more advanced section on measure. Also, we can use the zero-one law to derive a calculus theorem that relates infinite series and infinite products. This derivation is an example of the probabilistic method: the use of probability to obtain results, seemingly unrelated to probability, in other areas of mathematics.

Suppose that \( p_i \in (0, 1) \) for each \( i \in \N_+ \). Then \[ \prod_{i=1}^\infty p_i \gt 0 \text{ if and only if } \sum_{i=1}^\infty (1 - p_i) \lt \infty \]

Proof:

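We can easily construct a probability space with a sequence of independent events \( (A_1, A_2, \ldots) \) such that \( \P(A_i) = 1 - p_i \) for each \( i \in \N_+ \). By independence, \( \prod_{i=n}^\infty p_i = \P\left(\bigcap_{i=n}^\infty A_i^c\right) \) for each \( n \in \N_+ \). If \( \sum_{i=1}^\infty (1 - p_i) = \infty \), the proof of the second Borel-Cantelli lemma shows that this product is 0 for every \( n \), and in particular for \( n = 1 \). If \( \sum_{i=1}^\infty (1 - p_i) \lt \infty \), the first Borel-Cantelli lemma and the complement result give \( \P\left(\liminf_{n \to \infty} A_n^c\right) = 1 \), so \( \prod_{i=n}^\infty p_i \to 1 \) as \( n \to \infty \) by the continuity result above. Hence \( \prod_{i=n}^\infty p_i \gt 0 \) for some \( n \), and since \( p_1, \ldots, p_{n-1} \) are positive, \( \prod_{i=1}^\infty p_i = p_1 \cdots p_{n-1} \prod_{i=n}^\infty p_i \gt 0 \).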
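Two hypothetical sequences illustrate the dichotomy with exact arithmetic. If \( p_i = 1 - 1/(i+1)^2 \), then \( \sum_i (1 - p_i) \) converges and the partial products telescope to \( \prod_{i=1}^N p_i = \frac{N+2}{2(N+1)} \to \frac{1}{2} \gt 0 \). If \( p_i = 1 - 1/(i+1) \), then \( \sum_i (1 - p_i) \) is a tail of the harmonic series and diverges, while \( \prod_{i=1}^N p_i = \frac{1}{N+1} \to 0 \). The sketch below computes the partial products and partial sums.

```python
from fractions import Fraction

def partial_product(p, N):
    """prod_{i=1}^N p(i), computed exactly."""
    out = Fraction(1)
    for i in range(1, N + 1):
        out *= p(i)
    return out

def partial_series(p, N):
    """sum_{i=1}^N (1 - p(i)), computed exactly."""
    return sum((1 - p(i) for i in range(1, N + 1)), Fraction(0))

def summable(i):
    """Hypothetical p_i with 1 - p_i = 1/(i+1)^2, a convergent series."""
    return 1 - Fraction(1, (i + 1) ** 2)

def not_summable(i):
    """Hypothetical p_i with 1 - p_i = 1/(i+1), a divergent series."""
    return 1 - Fraction(1, i + 1)

for N in (10, 100, 1000):
    print(N,
          float(partial_product(summable, N)), float(partial_series(summable, N)),
          float(partial_product(not_summable, N)), float(partial_series(not_summable, N)))
```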

Our next result is a simple application of the second Borel-Cantelli lemma to independent replications of a basic experiment.

Suppose that \(A\) is an event in a basic random experiment with \(\P(A) \gt 0\). In the compound experiment that consists of independent replications of the basic experiment, the event that \(A\) occurs infinitely often has probability 1.

Proof:

Let \( p \) denote the probability of \( A \) in the basic experiment. In the compound experiment, we have a sequence of independent events \( (A_1, A_2, \ldots) \) with \( \P(A_n) = p \) for each \( n \in \N_+ \) (these are independent copies of \( A \)). But \( \sum_{n=1}^\infty \P(A_n) = \infty \) since \( p \gt 0 \) so the result follows from the second Borel-Cantelli lemma.
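A concrete way to see the result: for any block of \( m \) consecutive replications, wherever it starts, the probability that \( A \) occurs at least once in the block is \( 1 - (1 - p)^m \), which tends to 1 as \( m \to \infty \). The sketch below computes this and, as a separate illustration, simulates the first occurrence of \( A \) after replication \( n \). The value \( p = 0.01 \), the function names, and the seed are hypothetical choices.

```python
import random

def prob_some_occurrence(p, m):
    """P(A occurs at least once in m independent replications) = 1 - (1 - p)^m."""
    return 1 - (1 - p) ** m

def first_occurrence_after(n, p, rng):
    """Simulate replications n + 1, n + 2, ... until A occurs and return that index.
    This loop terminates with probability 1 when p > 0."""
    k = n
    while True:
        k += 1
        if rng.random() < p:
            return k

p = 0.01  # hypothetical probability of A in the basic experiment
for m in (100, 1_000, 10_000):
    print(m, prob_some_occurrence(p, m))

rng = random.Random(1)  # arbitrary fixed seed, for reproducibility
print(first_occurrence_after(10**6, p, rng))
```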

Convergence of Random Variables

Suppose that \((X_1, X_2, \ldots)\) and \(X\) are real-valued random variables for our experiment. We will discuss two ways that the sequence \(X_n\) can converge to \(X\) as \(n \to \infty\). These are fundamentally important concepts, since some of the deepest results in probability theory are limit theorems involving random variables.

We say that \(X_n \to X\) as \(n \to \infty\) with probability 1 if the event that \( X_n \to X \) as \( n \to \infty \) (in the usual calculus sense) has probability 1. That is, \[\P\{s \in S: X_n(s) \to X(s) \text{ as } n \to \infty\} = 1\]

But of course, as good probabilists, we usually suppress references to the sample space and write the definition simply as \( \P(X_n \to X \text{ as } n \to \infty) = 1 \). Part (c) of the theorem below shows that \( \{X_n \to X \text{ as } n \to \infty\} \) really is a valid event—that is, a member of the \( \sigma \)-algebra \( \mathscr{S} \). The statement that an event has probability 1 is usually the strongest affirmative statement that we can make in probability theory. Thus, convergence with probability 1 is the strongest form of convergence. The phrases almost surely and almost everywhere are sometimes used instead of the phrase with probability 1. Here is another way that a sequence of random variables can converge:

We say that \(X_n \to X\) as \(n \to \infty\) in probability if \[\P\left(\left|X_n - X\right| \gt \epsilon\right) \to 0 \text{ as } n \to \infty \text{ for each } \epsilon \gt 0\]

The phrase in probability sounds superficially like the phrase with probability 1. However, as we will see, convergence in probability is much weaker than convergence with probability 1. Indeed, convergence with probability 1 is often called strong convergence, while convergence in probability is often called weak convergence. The next sequence of results explores convergence with probability 1. We will let \(\Q_+\) denote the set of positive rational numbers; a critical point to remember is that this set is countable.

The following events are the same:

  1. \(X_n\) does not converge to \(X\) as \(n \to \infty\).
  2. For some \(\epsilon \gt 0\), \(\left|X_n - X\right| \gt \epsilon\) for infinitely many \(n \in \N_+\).
  3. For some \(\epsilon \in \Q_+\), \(\left|X_n - X\right| \gt \epsilon\) for infinitely many \(n \in \N_+\).
Proof:

The equality of events (a) and (b) is simply the definition of convergence. The equality of events (b) and (c) follows because there are arbitrarily small positive rational numbers. Note that if the event \[ \left\{\left|X_n - X\right| \gt \epsilon \text{ for infinitely many } n \in \N_+ \right\} \] occurs for a given \( \epsilon \gt 0 \), then it occurs for all smaller \( \epsilon \gt 0 \).

So as promised, the statement \( X_n \to X \) as \( n \to \infty \) is a valid event: \[ \left\{X_n \to X \text{ as } n \to \infty\right\}^c = \bigcup_{\epsilon \in \Q_+} \limsup_{n \to \infty} \left\{\left|X_n - X\right| \gt \epsilon\right\} \] Building a little at a time, note that \( \left\{\left|X_n - X\right| \gt \epsilon\right\} \) is an event for each \( \epsilon \gt 0 \) and \( n \in \N_+ \) since \( X_n \) and \( X \) are random variables. Next, the limit superior of a sequence of events is an event. Finally, a countable union of events is an event.

The following statements are equivalent:

  1. \(\P\left(X_n \to X \text{ as } n \to \infty\right) = 1\)
  2. \(\P\left(\left|X_n - X\right| \gt \epsilon \text{ for infinitely many } n \in \N_+\right) = 0 \) for every \(\epsilon \in \Q_+\).
  3. \(\P\left(\left|X_n - X\right| \gt \epsilon \text{ for infinitely many } n \in \N_+\right) = 0\) for every \(\epsilon \gt 0\).
  4. \(\P\left(\left|X_k - X\right| \gt \epsilon \text{ for some } k \ge n\right) \to 0\) as \(n \to \infty\) for every \(\epsilon \gt 0\).
Proof:

From part (c) of the previous result, \( \P(X_n \to X \text{ as } n \to \infty) = 1 \) if and only if \[ \P\left(\bigcup_{\epsilon \in \Q_+} \left\{\left|X_n - X\right| \gt \epsilon \text{ for infinitely many } n \in \N_+\right\} \right) = 0 \] But by Boole's inequality, a countable union of events has probability 0 if and only if every event in the union has probability 0. Thus, (a) is equivalent to (b). Statement (b) is clearly equivalent to (c) since there are arbitrarily small positive rational numbers. Finally, (c) is equivalent to (d) by the continuity result above.

Our next result gives one of the fundamental criteria for convergence with probability 1:

If \(\sum_{n=1}^\infty \P\left(\left|X_n - X\right| \gt \epsilon\right) \lt \infty\) for every \(\epsilon \gt 0\) then \(X_n \to X\) as \(n \to \infty\) with probability 1.

Proof:

By the first Borel-Cantelli Lemma, if \(\sum_{n=1}^\infty \P\left(\left|X_n - X\right| \gt \epsilon\right) \lt \infty\) then \(\P\left(\left|X_n - X\right| \gt \epsilon \text{ for infinitely many } n \in \N_+\right) = 0\). Hence the result follows from the previous theorem.

We can now obtain one of our main results: convergence with probability 1 implies convergence in probability.

If \(X_n \to X\) as \(n \to \infty\) with probability 1 then \(X_n \to X\) as \(n \to \infty\) in probability.

Proof:

Let \( \epsilon \gt 0 \). Then \( \P\left(\left|X_n - X\right| \gt \epsilon\right) \le \P\left(\left|X_k - X\right| \gt \epsilon \text{ for some } k \ge n\right)\). But if \( X_n \to X \) as \( n \to \infty \) with probability 1, then the expression on the right converges to 0 as \( n \to \infty \) by part (d) of the result above. Hence \( X_n \to X \) as \( n \to \infty \) in probability.

The converse fails with a passion, as the exercise below shows. However, there is a partial converse that is very useful.

If \(X_n \to X\) as \(n \to \infty\) in probability, then there exists a subsequence \((n_1, n_2, n_3, \ldots)\) of \(\N_+\) such that \(X_{n_k} \to X\) as \(k \to \infty\) with probability 1.

Proof:

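Suppose that \( X_n \to X \) as \( n \to \infty \) in probability. Then for each \(k \in \N_+\) there exists \(n_k \in \N_+\) such that \(\P\left( \left| X_{n_k} - X \right| \gt 1 / k \right) \lt 1 / k^2\). We can make the choices so that \(n_k \lt n_{k+1}\) for each \(k\). Now fix \( \epsilon \gt 0 \). For \( k \ge 1 / \epsilon \) we have \( 1 / k \le \epsilon \), so \( \P\left(\left|X_{n_k} - X\right| \gt \epsilon \right) \le \P\left(\left|X_{n_k} - X\right| \gt 1 / k \right) \lt 1 / k^2 \). Since all but finitely many terms are bounded by the terms of the convergent series \( \sum_{k=1}^\infty 1 / k^2 \), it follows that \(\sum_{k=1}^\infty \P\left(\left|X_{n_k} - X\right| \gt \epsilon \right) \lt \infty\). By the criterion above, \(X_{n_k} \to X\) as \(k \to \infty\) with probability 1.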
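The choice of \( n_k \) in the proof is constructive once the tail probabilities are known. The sketch below picks, for each \( k \), the smallest \( n_k \gt n_{k-1} \) with \( \P\left(\left|X_{n_k} - X\right| \gt 1/k\right) \lt 1/k^2 \). It is applied to the independent indicators of the exercise below, with \( X = 0 \) and \( \P(X_n = 1) = 1/n \); the helper names are hypothetical. For \( k \ge 2 \) the condition reads \( 1/n \lt 1/k^2 \), so the greedy choice is \( n_k = k^2 + 1 \).

```python
from fractions import Fraction

def choose_subsequence(tail_prob, K):
    """Greedy choice from the proof: for k = 1..K, pick the smallest n_k > n_{k-1}
    with tail_prob(n_k, 1/k) < 1/k^2. Assumes such an n always exists, which is
    what convergence in probability guarantees."""
    ns, n = [], 0
    for k in range(1, K + 1):
        n += 1
        while tail_prob(n, Fraction(1, k)) >= Fraction(1, k * k):
            n += 1
        ns.append(n)
    return ns

def tail_prob(n, eps):
    """P(|X_n - 0| > eps) for indicators with P(X_n = 1) = 1/n.
    Since X_n takes only the values 0 and 1, this is 1/n for eps < 1 and 0 otherwise."""
    return Fraction(1, n) if eps < 1 else Fraction(0)

print(choose_subsequence(tail_prob, 6))  # [1, 5, 10, 17, 26, 37]
```

Along this subsequence \( \sum_k \P(X_{n_k} = 1) = 1 + \sum_{k \ge 2} 1/(k^2 + 1) \lt \infty \), so the first Borel-Cantelli lemma gives \( X_{n_k} \to 0 \) with probability 1, even though the full sequence does not converge.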

Note that the proof works because \(1 / k \to 0\) as \(k \to \infty\) and \(\sum_{k=1}^\infty 1 / k^2 \lt \infty\). Any two sequences with these properties would work just as well.

There are two other modes of convergence that we will discuss later:

Examples and Applications

Suppose that we have an infinite sequence of coins labeled \(1, 2, \ldots\). Moreover, coin \(n\) has probability of heads \(1 / n^a\) for each \(n \in \N_+\), where \(a \gt 0\) is a parameter. We toss each coin in sequence one time. In terms of \(a\), find the probability of the following events:

  1. infinitely many heads occur
  2. infinitely many tails occur
Answer:

Let \(H_n\) be the event that toss \(n\) results in heads, and \(T_n\) the event that toss \(n\) results in tails.

  1. If \(a \in (0, 1]\), then \(\P\left(\limsup_{n \to \infty} H_n\right) = 1\) and \(\P\left(\limsup_{n \to \infty} T_n\right) = 1\).
  2. If \(a \in (1, \infty)\), then \(\P\left(\limsup_{n \to \infty} H_n\right) = 0\) and \(\P\left(\limsup_{n \to \infty} T_n\right) = 1\).
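The answer turns on whether \( \sum_{n=1}^\infty 1/n^a \) converges, which by the \( p \)-series test happens exactly when \( a \gt 1 \); the tails series \( \sum_{n=1}^\infty (1 - 1/n^a) \) diverges for every \( a \gt 0 \) since its terms tend to 1. Partial sums can only suggest this behavior, not prove it, but the sketch below shows the contrast for the hypothetical values \( a = 0.5, 1, 2 \). Each partial sum is also the expected number of heads among the first \( N \) coins.

```python
import math

def partial_sum(a, N):
    """sum_{n=1}^N n^(-a), the expected number of heads among the first N coins."""
    return math.fsum(n ** -a for n in range(1, N + 1))

for a in (0.5, 1.0, 2.0):
    print(a, [round(partial_sum(a, N), 3) for N in (10**2, 10**4, 10**5)])
```

For \( a = 0.5 \) the partial sums grow like \( 2\sqrt{N} \), for \( a = 1 \) like \( \ln N \), and for \( a = 2 \) they stay below \( \pi^2 / 6 \).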

The following exercise gives a simple example of a sequence of random variables that converge in probability but not with probability 1.

Suppose again that we have a sequence of coins labeled \(1, 2, \ldots\), and that coin \(n\) lands heads up with probability \(\frac{1}{n}\) for each \(n\). We toss the coins in order to produce a sequence \((X_1, X_2, \ldots)\) of independent indicator random variables with \[\P(X_n = 1) = \frac{1}{n}, \; \P(X_n = 0) = 1 - \frac{1}{n}; \quad n \in \N_+\]

  1. \(\P(X_n = 0 \text{ for infinitely many } n) = 1\), so that infinitely many tails occur with probability 1.
  2. \(\P(X_n = 1 \text{ for infinitely many } n) = 1\), so that infinitely many heads occur with probability 1.
  3. \(\P(X_n \text{ does not converge as } n \to \infty) = 1\).
  4. \(X_n \to 0\) as \(n \to \infty\) in probability.
Proof:
  1. This follows from the second Borel-Cantelli lemma, since \( \sum_{n = 1}^\infty \P(X_n = 0) = \infty \).
  2. This also follows from the second Borel-Cantelli lemma, since \( \sum_{n = 1}^\infty \P(X_n = 1) = \infty \).
  3. This follows from parts (a) and (b).
  4. Suppose \( 0 \lt \epsilon \lt 1 \). Then \( \P\left(\left|X_n - 0\right| \gt \epsilon\right) = \P(X_n = 1) = \frac{1}{n} \to 0 \) as \( n \to \infty \).
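A simulated sample path can make the exercise concrete, although a single finite path cannot demonstrate a probability 1 statement. In the sketch below, heads become rarer as \( n \) grows, consistent with \( X_n \to 0 \) in probability, yet they keep appearing: the expected number of heads among the first \( N \) tosses is \( \sum_{n=1}^N 1/n \approx \ln N \), which grows without bound. The seed and horizon are arbitrary choices for illustration.

```python
import random

def simulate(N, rng):
    """One realization of X_1, ..., X_N: independent with P(X_n = 1) = 1/n."""
    return [1 if rng.random() < 1 / n else 0 for n in range(1, N + 1)]

rng = random.Random(2024)  # arbitrary fixed seed, for reproducibility
xs = simulate(10**5, rng)
heads_at = [n for n, x in enumerate(xs, start=1) if x == 1]
# Note X_1 = 1 always, since P(X_1 = 1) = 1.
print(len(heads_at), heads_at[-5:])
```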