\(\newcommand{\var}{\text{var}}\)
\(\newcommand{\sd}{\text{sd}}\)
\(\newcommand{\cov}{\text{cov}}\)
\(\newcommand{\cor}{\text{cor}}\)
\(\renewcommand{\P}{\mathbb{P}}\)
\(\newcommand{\E}{\mathbb{E}}\)
\(\newcommand{\R}{\mathbb{R}}\)
\(\newcommand{\N}{\mathbb{N}}\)
\(\newcommand{\bs}{\boldsymbol}\)

Two of the most important modes of convergence in probability theory are convergence with probability 1 and convergence in mean. As we have noted several times, neither mode of convergence implies the other. However, if we impose an additional condition on the sequence of variables, convergence with probability 1 will imply convergence in mean. The purpose of this brief but advanced section is to explore the additional condition that is needed. This section is particularly important for the theory of martingales.

As usual, our starting point is a random experiment modeled by a probability space \( (\Omega, \mathscr{F}, \P) \). So \( \Omega \) is the sample space, \( \mathscr{F} \) is the \( \sigma \)-algebra of events, and \( \P \) is the probability measure. Recall from the previous section that for \( k \in [1, \infty) \), \( \mathscr{L}_k \) is the vector space of real-valued random variables \( X \) with \( \E(|X|^k) \lt \infty \), endowed with the norm \( \|X\|_k = \left[\E(|X|^k)\right]^{1/k} \). In particular, \( X \in \mathscr{L}_1 \) simply means that \( \E(|X|) \lt \infty \) so that \( \E(X) \) exists as a real number. From the section on expected value as an integral, recall the following notation, assuming of course that the expected value makes sense: \[ \E(X; A) = \E(X \bs{1}_A) = \int_A X \, d\P \]

The following result is motivation for the main definition in this section.

If \( X \) is a real-valued random variable then \( \E(|X|) \lt \infty \) if and only if \( \E(|X|; |X| \ge x) \to 0 \) as \( x \to \infty \).

Note that \( |X| \bs{1}(|X| \le x) \) is nonnegative, increasing in \( x \in [0, \infty) \) and \( |X| \bs{1}(|X| \le x) \to |X| \) as \( x \to \infty \). From the monotone convergence theorem, \( \E(|X|; |X| \le x) \to \E(|X|) \) as \( x \to \infty \). On the other hand, \[ \E(|X|) = \E(|X|; |X| \le x) + \E(|X|; |X| \gt x) \] If \( \E(|X|) \lt \infty \) then taking limits in the displayed equation shows that \( \E(|X|; |X| \gt x) \to 0 \) as \( x \to \infty \). Conversely, \( \E(|X|; |X| \le x) \le x \). So if \( \E(|X|) = \infty \) then \( \E(|X|; |X| \gt x) = \infty \) for every \( x \in [0, \infty) \), and in particular does not converge to 0.
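As a concrete illustration (our own choice of distribution, not one used in the text), suppose that \( X \) has the exponential distribution with rate 1. Integration by parts gives \( \E(X; X \gt x) = (x + 1) e^{-x} \), which decreases to 0 as \( x \to \infty \). The following Python sketch simply tabulates this closed form; the function name `tail_mean_exponential` is hypothetical.

```python
import math

def tail_mean_exponential(x: float) -> float:
    """E(X; X > x) for X exponential with rate 1 (an assumed example).

    Closed form: the integral of t * exp(-t) over (x, infinity),
    which equals (x + 1) * exp(-x).
    """
    return (x + 1.0) * math.exp(-x)

# Tabulate the tail mean for increasing truncation levels x.
for x in [0, 1, 5, 10, 20]:
    print(f"x = {x:>2}: E(X; X > x) = {tail_mean_exponential(x):.3e}")
```

At \( x = 0 \) the tail mean is the full mean \( \E(X) = 1 \), and the values shrink rapidly after that, as the result predicts for an integrable variable.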

Suppose now that \( X_i \) is a real-valued random variable for each \( i \) in a nonempty index set \( I \) (not necessarily countable). The critical definition for this section is to require the convergence in the previous theorem to hold *uniformly* for the collection of random variables \( \bs X = \{X_i: i \in I\} \).

The collection \( \bs X = \{X_i: i \in I\} \) is uniformly integrable if for each \( \epsilon \gt 0 \) there exists \( x \gt 0 \) such that for all \( i \in I \), \[ \E(|X_i|; |X_i| \gt x) \lt \epsilon \] Equivalently \( \E(|X_i|; |X_i| \gt x) \to 0 \) as \( x \to \infty \) uniformly in \( i \in I \).

Our next discussion centers on conditions that ensure that the collection \( \bs X = \{X_i: i \in I\} \) of real-valued random variables is uniformly integrable. Here is an equivalent characterization:

The collection \( \bs X = \{X_i: i \in I\} \) is uniformly integrable if and only if the following conditions hold:

- \( \{\E(|X_i|): i \in I\} \) is bounded.
- For each \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon \) for all \( i \in I \).

Suppose that \( \bs X \) is uniformly integrable. With \( \epsilon = 1 \) there exists \( x \gt 0 \) such that \( \E(|X_i|; |X_i| \gt x) \lt 1 \) for all \( i \in I \). Hence \[ \E(|X_i|) = \E(|X_i|; |X_i| \le x) + \E(|X_i|; |X_i| \gt x) \le x + 1, \quad i \in I \] so (a) holds. For (b), let \( \epsilon \gt 0 \). There exists \( x \gt 0 \) such that \( \E(|X_i|; |X_i| \gt x) \lt \epsilon / 2 \) for all \( i \in I \). Let \( \delta = \epsilon / (2 x) \). If \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \[ \E(|X_i|; A) = \E(|X_i|; A \cap \{|X_i| \le x\}) + \E(|X_i|; A \cap \{|X_i| \gt x\}) \le x \P(A) + \E(|X_i|; |X_i| \gt x) \lt \epsilon / 2 + \epsilon / 2 = \epsilon\] Conversely, suppose that (a) and (b) hold. By (a), there exists \( c \gt 0 \) such that \( \E(|X_i|) \le c \) for all \( i \in I \). Let \( \epsilon \gt 0 \). By (b) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon \) for all \( i \in I \). Next, by Markov's inequality, \[ \P(|X_i| \gt x) \le \frac{\E(|X_i|)}{x} \le \frac{c}{x}, \quad i \in I \] Pick \( x \gt 0 \) such that \( c / x \lt \delta \), so that \(\P(|X_i| \gt x) \lt \delta\) for each \( i \in I \). Then for each \( j \in I \), \( \E(|X_i|; |X_j| \gt x) \lt \epsilon \) for all \( i \in I \) and so in particular, \( \E(|X_i|; |X_i| \gt x) \lt \epsilon \) for all \( i \in I \). Hence \( \bs X \) is uniformly integrable.

Condition (a) means that \( \bs X \) is bounded (in norm) as a subset of the vector space \( \mathscr{L}_1 \). Trivially, a *finite* collection of integrable random variables is uniformly integrable.

Suppose that \( I \) is finite and that \( \E(|X_i|) \lt \infty \) for each \( i \in I \). Then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.

A subset of a uniformly integrable set of variables is also uniformly integrable.

If \( \{X_i: i \in I\} \) is uniformly integrable and \( J \) is a nonempty subset of \( I \), then \( \{X_j: j \in J\} \) is uniformly integrable.

If the random variables in the collection are dominated in absolute value by a random variable with finite mean, then the collection is uniformly integrable.

Suppose that \( Y \) is a nonnegative random variable with \( \E(Y) \lt \infty \) and that \( |X_i| \le Y \) for each \( i \in I \). Then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.

Clearly \( \E(|X_i|; |X_i| \gt x) \le \E(Y; Y \gt x) \) for \( x \in [0, \infty) \) and for all \( i \in I \). The right side is independent of \( i \in I \), and by our first result above, converges to 0 as \( x \to \infty \). Hence \( \bs X \) is uniformly integrable.

The following result is more general, but essentially the same proof works.

Suppose that \( \bs Y = \{Y_j: j \in J\} \) is uniformly integrable, and \( \bs X = \{X_i: i \in I\} \) is a set of real-valued variables with the property that for each \( i \in I \) there exists \( j \in J \) such that \( |X_i| \le |Y_j| \). Then \( \bs X \) is uniformly integrable.

As a simple corollary, if the variables are bounded in absolute value then the collection is uniformly integrable.

If there exists \( c \gt 0 \) such that \( |X_i| \le c \) for all \( i \in I \) then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.

Just having \( \E(|X_i|) \) bounded in \( i \in I \) (condition (a) in the theorem above) is not sufficient for \( \bs X = \{X_i: i \in I\} \) to be uniformly integrable; a counterexample is given below. However, if \( \E(|X_i|^k) \) is bounded in \( i \in I \) for some \( k \gt 1 \), then \( \bs X \) *is* uniformly integrable. This condition means that \( \bs X \) is bounded (in norm) as a subset of the vector space \( \mathscr{L}_k \).

If \( \left\{\E(|X_i|^k): i \in I\right\} \) is bounded for some \( k \gt 1 \), then \( \{X_i: i \in I\} \) is uniformly integrable.

Suppose that for some \( k \gt 1 \) and \( c \gt 0 \), \( \E(|X_i|^k) \le c \) for all \( i \in I \). Then \( k - 1 \gt 0 \) and so \( t \mapsto t^{k-1} \) is increasing on \( (0, \infty) \). So if \( |X_i| \gt x \) for \( x \gt 0 \) then \[ |X_i|^k = |X_i| |X_i|^{k-1} \ge |X_i| x^{k-1} \] Hence \( |X_i| \le |X_i|^k / x^{k-1} \) on the event \( |X_i| \gt x \). Therefore \[ \E(|X_i|; |X_i| \gt x) \le \E\left(\frac{|X_i|^k}{x^{k-1}}; |X_i| \gt x\right) \le \frac{\E(|X_i|^k)}{x^{k-1}} \le \frac{c}{x^{k-1}} \] The last expression is independent of \( i \in I \) and converges to 0 as \( x \to \infty \). Hence \( \bs X \) is uniformly integrable.
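To see the bound from this proof at work, consider a hypothetical family that is not taken from the text: \( Y_n = \sqrt{n} \, \bs{1}(U \le 1/n) \) for \( n \in \N_+ \), where \( U \) has the standard uniform distribution. Then \( \E(Y_n^2) = n \cdot (1/n) = 1 \) for every \( n \), so the hypothesis holds with \( k = 2 \) and \( c = 1 \), and the proof gives \( \E(Y_n; Y_n \gt x) \le 1 / x \). The Python sketch below compares the exact tail means with this bound; the function names are illustrative only.

```python
import math

def tail_mean(n: int, x: float) -> float:
    """E(Y_n; Y_n > x) for Y_n = sqrt(n) * 1(U <= 1/n), U standard uniform.

    Y_n equals sqrt(n) with probability 1/n and 0 otherwise, so the tail
    mean is sqrt(n) / n when sqrt(n) > x and 0 when sqrt(n) <= x.
    """
    value = math.sqrt(n)
    return value / n if value > x else 0.0

def moment_bound(x: float, c: float = 1.0, k: float = 2.0) -> float:
    """The bound c / x**(k - 1) from the proof; here c = 1 and k = 2."""
    return c / x ** (k - 1)

# Compare the worst tail mean over a finite range of n with the bound.
for x in [1, 2, 5, 10]:
    worst = max(tail_mean(n, x) for n in range(1, 10_001))
    print(f"x = {x:>2}: max tail mean (n <= 10000) = {worst:.4f}, "
          f"bound = {moment_bound(x):.4f}")
```

The maximum over \( n \le 10000 \) is only a finite stand-in for the supremum over all \( n \), but for this family the largest tail mean at level \( x \) occurs at the smallest \( n \) with \( \sqrt{n} \gt x \), so the finite range is enough for the values of \( x \) shown.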

Uniform integrability is closed under the operations of addition and scalar multiplication.

Suppose that \( \bs X = \{X_i: i \in I\} \) and \( \bs Y = \{Y_i: i \in I\} \) are uniformly integrable and that \( c \in \R \). Then each of the following collections is also uniformly integrable.

- \( \bs X + \bs Y = \{X_i + Y_i: i \in I\} \)
- \( c \bs X = \{c X_i: i \in I\} \)

We use the characterization above. The proofs use standard techniques, so try them yourself.

- There exist \( a, \, b \in (0, \infty) \) such that \( \E(|X_i|) \le a \) and \( \E(|Y_i|) \le b \) for all \( i \in I \). Hence \[ \E(|X_i + Y_i|) \le \E(|X_i| + |Y_i|) \le \E(|X_i|) + \E(|Y_i|) \le a + b, \quad i \in I \] Next let \( \epsilon \gt 0 \). There exists \( \delta_1 \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta_1 \) then \( \E(|X_i|; A) \lt \epsilon / 2 \) for all \( i \in I \), and similarly, there exists \( \delta_2 \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta_2 \) then \( \E(|Y_i|; A) \lt \epsilon / 2 \) for all \( i \in I \). Hence if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta_1 \wedge \delta_2 \) then \[ \E(|X_i + Y_i|; A) \le \E(|X_i| + |Y_i|; A) = \E(|X_i|; A) + \E(|Y_i|; A) \lt \epsilon / 2 + \epsilon / 2 = \epsilon, \quad i \in I \]
- There exists \( a \in (0, \infty) \) such that \( \E(|X_i|) \le a \) for all \( i \in I \). Hence \[ \E(|c X_i|) = |c| \E(|X_i|) \le |c| a, \quad i \in I \] The second condition is trivial if \( c = 0 \), so suppose \( c \ne 0 \). For \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon / |c| \) for all \( i \in I \). Hence \( \E(|c X_i|; A) = |c| \E(|X_i|; A) \lt \epsilon \).

The following corollary is trivial, but will be needed in our discussion of convergence below.

Suppose that \( \{X_i: i \in I\} \) is uniformly integrable and that \( X \) is a real-valued random variable with \( \E(|X|) \lt \infty \). Then \( \{X_i - X: i \in I\} \) is uniformly integrable.

Let \( Y_i = X \) for each \( i \in I \). Then \( \{Y_i: i \in I\} \) is uniformly integrable, so the result follows from the previous theorem.

We now come to the main results, and the reason for the definition of uniform integrability in the first place. To set up the notation, suppose that \( X_n \) is a real-valued random variable for \( n \in \N_+ \) and that \( X \) is a real-valued random variable. We know that if \( X_n \to X \) as \( n \to \infty \) in mean then \( X_n \to X \) as \( n \to \infty \) in probability. The converse is also true if and only if the sequence is uniformly integrable. Here is the first half:

If \( X_n \to X \) as \( n \to \infty \) in mean, then \( \{X_n: n \in \N_+\} \) is uniformly integrable.

The hypothesis means that \( X_n \to X \) as \( n \to \infty \) in the vector space \( \mathscr{L}_1 \). That is, \( \E(|X_n|) \lt \infty \) for \( n \in \N_+ \), \( \E(|X|) \lt \infty \), and \( \E(|X_n - X|) \to 0 \) as \( n \to \infty \). From the last section, we know that this implies that \( \E(|X_n|) \to \E(|X|) \) as \( n \to \infty \), so \( \E(|X_n|) \) is bounded in \( n \in \N_+ \). Let \( \epsilon \gt 0 \). Then there exists \( N \in \N_+ \) such that if \( n \gt N \) then \( \E(|X_n - X|) \lt \epsilon/2 \). Since all of our variables are in \( \mathscr{L}_1 \), for each \( n \in \N_+ \) there exists \( \delta_n \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta_n \) then \( \E(|X_n - X|; A) \lt \epsilon / 2 \). Similarly, there exists \( \delta_0 \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta_0 \) then \( \E(|X|; A) \lt \epsilon / 2 \). Let \( \delta = \min\{\delta_n: n \in \{0, 1, \ldots, N\}\} \) so \( \delta \gt 0 \). If \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \[\E(|X_n|; A) = \E(|X_n - X + X|; A) \le \E(|X_n - X|; A) + \E(|X|; A), \quad n \in \N_+\] If \( n \le N \) then \( \E(|X_n - X|; A) \lt \epsilon / 2 \) since \( \delta \le \delta_n \). If \( n \gt N \) then \( \E(|X_n - X|; A) \le \E(|X_n - X|) \lt \epsilon / 2 \). For all \( n \), \( \E(|X|; A) \lt \epsilon / 2 \) since \( \delta \le \delta_0 \). So for all \( n \in \N_+ \), \( \E(|X_n|; A) \lt \epsilon \) and hence \( \{X_n: n \in \N_+\} \) is uniformly integrable.

Here is the more important half, known as the uniform integrability theorem:

If \( \{X_n: n \in \N_+\} \) is uniformly integrable and \( X_n \to X \) as \( n \to \infty \) in probability, then \( X_n \to X \) as \( n \to \infty \) in mean.

Since \( X_n \to X \) as \( n \to \infty \) in probability, we know that there exists a subsequence \( \left(X_{n_k}: k \in \N_+\right) \) of \( (X_n: n \in \N_+) \) such that \( X_{n_k} \to X \) as \( k \to \infty \) with probability 1. By the uniform integrability, \( \E(|X_n|) \) is bounded in \( n \in \N_+ \). Hence by Fatou's lemma \[ \E(|X|) = \E\left(\liminf_{k \to \infty} \left|X_{n_k}\right|\right) \le \liminf_{k \to \infty} \E\left(\left|X_{n_k}\right|\right) \le \limsup_{k \to \infty} \E\left(\left|X_{n_k}\right|\right) \lt \infty \] Let \( Y_n = X_n - X \) for \( n \in \N_+ \). From our result above, we know that \( \{Y_n: n \in \N_+\} \) is uniformly integrable, and we also know that \( Y_n \) converges to 0 as \( n \to \infty \) in probability. Hence we need to show that \( Y_n \to 0 \) as \( n \to \infty \) in mean. Let \( \epsilon \gt 0 \). By uniform integrability, there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|Y_n|; A) \lt \epsilon / 2 \) for all \( n \in \N_+ \). Since \( Y_n \to 0 \) as \( n \to \infty \) in probability, there exists \( N \in \N_+ \) such that if \( n \gt N \) then \( \P(|Y_n| \gt \epsilon / 2) \lt \delta \). Hence if \( n \gt N \) then \[ \E(|Y_n|) = \E(|Y_n|; |Y_n| \le \epsilon / 2) + \E(|Y_n|; |Y_n| \gt \epsilon / 2) \lt \epsilon / 2 + \epsilon / 2 = \epsilon \] Hence \( Y_n \to 0 \) as \( n \to \infty \) in mean.

As a corollary, recall that if \( X_n \to X \) as \( n \to \infty \) with probability 1, then \( X_n \to X \) as \( n \to \infty \) in probability. Hence if \( X_n \to X \) as \( n \to \infty \) with probability 1 and \( \bs X = \{X_n: n \in \N_+\} \) is uniformly integrable, then \( X_n \to X \) as \( n \to \infty \) in mean.
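As a quick numerical sanity check of the theorem, consider an assumed sequence that does not appear in the text: \( X_n = U^n \) for \( n \in \N_+ \), where \( U \) has the standard uniform distribution. Since \( |X_n| \le 1 \), the sequence is uniformly integrable by the result on bounded collections, and \( X_n \to 0 \) as \( n \to \infty \) with probability 1 because \( 0 \lt U \lt 1 \). The theorem then predicts \( \E(|X_n|) \to 0 \). The Python sketch below tabulates the closed forms \( \E(U^n) = 1 / (n + 1) \) and \( \P(U^n \gt \epsilon) = 1 - \epsilon^{1/n} \); the function names are hypothetical.

```python
from fractions import Fraction

def mean_abs(n: int) -> Fraction:
    """E(|X_n - 0|) = E(U**n) = 1 / (n + 1) for U standard uniform."""
    return Fraction(1, n + 1)

def prob_exceeds(n: int, eps: float) -> float:
    """P(|X_n| > eps) = P(U > eps**(1/n)) = 1 - eps**(1/n), for 0 < eps < 1."""
    return 1.0 - eps ** (1.0 / n)

# Both the mean distance to the limit and the exceedance probability shrink.
for n in [1, 10, 100, 1000]:
    print(f"n = {n:>4}: E|X_n| = {float(mean_abs(n)):.5f}, "
          f"P(|X_n| > 0.01) = {prob_exceeds(n, 0.01):.5f}")
```

Both columns decrease toward 0, which is consistent with convergence in probability and convergence in mean holding together for a uniformly integrable sequence.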

Our first example shows that bounded \( \mathscr{L}_1 \) norm is not sufficient for uniform integrability.

Suppose that \( U \) is uniformly distributed on the interval \( (0, 1) \) (so \( U \) has the standard uniform distribution). For \( n \in \N_+ \), let \( X_n = n \bs{1}(U \le 1 / n) \). Then

- \( \E(|X_n|) = 1 \) for all \( n \in \N_+ \)
- \( \E(|X_n|; |X_n| \gt x) = 1 \) for \( x \gt 0 \), \( n \in \N_+ \) with \( n \gt x \)

First note that \( |X_n| = X_n \) since \( X_n \ge 0 \).

- By definition, \( \E(X_n) = n \P(U \le 1 / n) = n / n = 1 \) for \( n \in \N_+ \).
- If \( n \gt x \gt 0 \) then \( X_n \gt x \) if and only if \( X_n = n \) if and only if \( U \le 1/n \). Hence \( \E(X_n; X_n \gt x) = n \P(U \le 1/n) = 1 \) as before.

By part (b), \( \E(|X_n|; |X_n| \gt x) \) does not converge to 0 as \( x \to \infty \) uniformly in \( n \in \N_+ \), so \( \bs X = \{X_n: n \in \N_+\} \) is not uniformly integrable.
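The failure of uniformity in this example can be checked with exact arithmetic. The Python sketch below computes \( \E(X_n; X_n \gt x) \) for \( X_n = n \bs{1}(U \le 1/n) \) and takes the maximum over a finite range of \( n \), as a stand-in for the supremum over \( n \in \N_+ \); the function names and the cutoff `n_max` are our own illustrative choices.

```python
from fractions import Fraction

def tail_mean(n: int, x: Fraction) -> Fraction:
    """E(X_n; X_n > x) for X_n = n * 1(U <= 1/n), U standard uniform.

    X_n equals n with probability 1/n and 0 otherwise, so the tail mean
    is n * (1/n) = 1 when n > x and 0 when n <= x.
    """
    return n * Fraction(1, n) if n > x else Fraction(0)

def sup_tail_mean(x: Fraction, n_max: int) -> Fraction:
    """Largest tail mean over n = 1, ..., n_max (finite stand-in for the sup)."""
    return max(tail_mean(n, x) for n in range(1, n_max + 1))

# However large x is, some n > x keeps the tail mean equal to 1.
for x in [1, 10, 100, 1000]:
    print(f"x = {x:>4}: max over n <= 2000 = {sup_tail_mean(Fraction(x), 2000)}")
```

For each fixed \( n \) the tail mean drops to 0 once \( x \ge n \), so every individual \( X_n \) is integrable; it is only the supremum over \( n \) that refuses to shrink.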

The next example gives an important application to conditional expected value. Recall that if \( X \) is a real-valued random variable with \( \E(|X|) \lt \infty \) and \( \mathscr{G} \) is a sub \( \sigma \)-algebra of \( \mathscr{F} \) then \( \E(X \mid \mathscr{G}) \) is the expected value of \( X \) given the information in \( \mathscr{G} \), and is the \( \mathscr{G} \)-measurable random variable closest to \( X \) in a sense. Indeed if \( X \in \mathscr{L}_2(\mathscr{F}) \) then \( \E(X \mid \mathscr{G}) \) is the projection of \( X \) onto \( \mathscr{L}_2(\mathscr{G}) \). The collection of all conditional expected values of \( X \) is uniformly integrable:

Suppose that \( X \) is a real-valued random variable with \( \E(|X|) \lt \infty \). Then \( \{\E(X \mid \mathscr{G}): \mathscr{G} \text{ is a sub }\sigma\text{-algebra of } \mathscr{F}\}\) is uniformly integrable.

We use the characterization above. Let \( \mathscr{G} \) be a sub \( \sigma \)-algebra of \( \mathscr{F} \). Recall that \( \left|\E(X \mid \mathscr{G})\right| \le \E(|X| \mid \mathscr{G})\) and hence \[ \E[|\E(X \mid \mathscr{G})|] \le \E[\E(|X| \mid \mathscr{G})] = \E(|X|) \] So property (a) holds. Next let \( \epsilon \gt 0 \). Since \( \E(|X|) \lt \infty \), there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X|; A) \lt \epsilon \). Suppose that \( A \in \mathscr{G} \) with \( \P(A) \lt \delta \). Then \(|\E(X \mid \mathscr{G})| \bs{1}_A \le \E(|X| \mid \mathscr{G}) \bs{1}_A\) so \[ \E[|\E(X \mid \mathscr{G})|; A] \le \E[\E(|X| \mid \mathscr{G}); A] = \E[\E(|X| \bs{1}_A \mid \mathscr{G})] = \E(|X|; A) \lt \epsilon \] Note that the first equality in the displayed equation holds since \( A \in \mathscr{G} \). So condition (b) holds for events in \( \mathscr{G} \). This restricted version suffices: in the proof of the characterization, (b) is needed only for the event \( \{|X_i| \gt x\} \) defined by the variable \( X_i \) itself, and here \( \{|\E(X \mid \mathscr{G})| \gt x\} \in \mathscr{G} \).

Note that the collection of sub \( \sigma \)-algebras of \( \mathscr{F} \), and so also the collection of conditional expected values above, might well be uncountable. The conditional expected values range from \( \E(X) \), when \( \mathscr{G} = \{\Omega, \emptyset\} \) to \( X \) itself, when \( \mathscr{G} = \mathscr{F} \).
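On a finite sample space, every sub \( \sigma \)-algebra is generated by a partition, so the whole family of conditional expected values can be enumerated. The Python sketch below does this for a hypothetical four-point sample space with made-up probabilities and values of \( X \) (none of these numbers come from the text). Such a family is finite and so trivially uniformly integrable; the sketch is meant only to illustrate the bound \( \E[|\E(X \mid \mathscr{G})|] \le \E(|X|) \) from the proof and the two extreme cases.

```python
from fractions import Fraction

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def conditional_expectation(x, p, partition):
    """E(X | G) as a list over outcomes, with G generated by `partition`."""
    result = [Fraction(0)] * len(x)
    for block in partition:
        block_prob = sum(p[w] for w in block)
        block_mean = sum(x[w] * p[w] for w in block) / block_prob
        for w in block:
            result[w] = block_mean  # constant on each block
    return result

def mean_abs(values, p):
    """E(|V|) for a variable given by its values on each outcome."""
    return sum(abs(v) * pw for v, pw in zip(values, p))

# Hypothetical data: four outcomes with positive probabilities summing to 1.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
x = [Fraction(-3), Fraction(1), Fraction(4), Fraction(-2)]

partitions = list(set_partitions([0, 1, 2, 3]))
norms = [mean_abs(conditional_expectation(x, p, part), p) for part in partitions]
print(f"number of sub sigma-algebras: {len(partitions)}")
print(f"E|X| = {mean_abs(x, p)}, largest E|E(X | G)| = {max(norms)}")
```

With the trivial partition the conditional expected value is the constant \( \E(X) \), and with the partition into singletons it is \( X \) itself, matching the range described above.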