\( \renewcommand{\P}{\mathbb{P}} \) \( \newcommand{\E}{\mathbb{E}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\Q}{\mathbb{Q}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\bs}{\boldsymbol} \) \( \newcommand{\range}{\text{range}} \)

10. Properties of the Integral

Basic Theory

Again our starting point is a measure space \( (S, \mathscr{S}, \mu) \). That is, \( S \) is a set, \( \mathscr{S} \) is a \( \sigma \)-algebra of subsets of \( S \), and \( \mu \) is a positive measure on \( \mathscr{S} \).

Definition

In the last section we defined the integral of certain measurable functions \( f: S \to \R \) with respect to the measure \( \mu \). Recall that the integral, denoted \( \int_S f \, d\mu \), may exist as a number in \( \R \) (in which case \( f \) is integrable), or may exist as \( \infty \) or \( -\infty \), or may fail to exist. Here is a review of how the definition is built up in stages:

Definition of the integral

  1. If \( f \) is a nonnegative simple function, so that \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \) where \( I \) is a finite index set, \( a_i \in [0, \infty) \) for \( i \in I \), and \( \{A_i: i \in I\} \) is a measurable partition of \( S \), then \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \]
  2. If \( f: S \to [0, \infty) \) is measurable, then \[ \int_S f \, d\mu = \sup\left\{\int_S g \, d\mu: g \text{ is simple and } 0 \le g \le f\right\} \]
  3. If \( f: S \to \R \) is measurable, then \[ \int_S f \, d\mu = \int_S f^+ \, d\mu - \int_S f^- \, d\mu \] as long as the right side is not of the form \( \infty - \infty \), and where \( f^+ \) and \( f^- \) denote the positive and negative parts of \( f \).
  4. If \( f:S \to \R \) is measurable and \( A \in \mathscr{S} \), then the integral of \( f \) over \( A \) is defined by \[ \int_A f \, d\mu = \int_S \bs{1}_A f \, d\mu \] assuming that the integral on the right exists.
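When \( S \) is finite, every subset is measurable, and \( \mu \) is given by point masses \( \mu\{x\} \), every function on \( S \) is simple, so stage 2 is not needed and the remaining stages reduce to weighted sums. The Python sketch below assumes exactly that setting. The dictionary layout and the helper names `integral_nonneg`, `integral`, and `integral_over` are illustrative choices, masses may be `math.inf`, the convention \( 0 \cdot \infty = 0 \) is used, and `None` signals that the integral does not exist.

```python
import math

def integral_nonneg(f, mu):
    """Stage 1: integral of a nonnegative f against point masses mu.

    mu maps each point x of a finite set S to mu({x}) in [0, inf]. On a finite
    set every function is simple, so the integral is the sum of f(x) * mu({x}),
    using the convention 0 * inf = 0.
    """
    total = 0.0
    for x, mass in mu.items():
        value = f(x)
        if value < 0:
            raise ValueError("f must be nonnegative")
        if value == 0 or mass == 0:
            continue  # a zero term, even when the other factor is inf
        total += value * mass
    return total

def integral(f, mu):
    """Stage 3: integral of f^+ minus integral of f^-, or None for inf - inf."""
    positive = integral_nonneg(lambda x: max(f(x), 0.0), mu)
    negative = integral_nonneg(lambda x: max(-f(x), 0.0), mu)
    if math.isinf(positive) and math.isinf(negative):
        return None  # the integral does not exist
    return positive - negative

def integral_over(A, f, mu):
    """Stage 4: the integral of f over A is the integral of 1_A f."""
    return integral(lambda x: f(x) if x in A else 0.0, mu)

# Illustrative usage: three points, one of them with infinite mass.
mu = {"a": 1.0, "b": 2.0, "c": math.inf}
f = {"a": 3.0, "b": -1.0, "c": 0.0}.get
print(integral(f, mu))              # 3*1 - 1*2 + 0*inf = 1.0
print(integral(lambda x: 1.0, mu))  # inf: the integral exists as infinity
```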

Consider a statement on the elements of \(S \), for example an equation or an inequality with \( x \in S \) as a free variable. (Technically such a statement is a predicate on \( S \).) For \( A \in \mathscr{S} \), we say that the statement holds on \( A \) if it is true for every \( x \in A \). We say that the statement holds almost everywhere on \( A \) (with respect to \( \mu \)) if there exists \( B \in \mathscr{S} \) with \( B \subseteq A \) such that the statement holds on \( B \) and \( \mu(A \setminus B) = 0 \).

Basic Properties

A few properties of the integral that were essential to the motivation of the definition were given in the last section. In this section, we extend some of those properties and we study a number of new ones. As a review, here is what we know so far.

Properties of the integral

  1. If \( f, \, g: S \to \R \) are measurable functions whose integrals exist, then \( \int_S (f + g) \, d\mu = \int_S f \, d\mu + \int_S g \, d\mu \) as long as the right side is not of the form \( \infty - \infty \).
  2. If \( f: S \to \R \) is a measurable function whose integral exists and \( c \in \R \), then \( \int_S c f \, d\mu = c \int_S f \, d\mu \).
  3. If \( f: S \to \R \) is measurable and \( f \ge 0 \) on \( S \) then \( \int_S f \, d\mu \ge 0 \).
  4. If \( f, \, g: S \to \R \) are measurable functions whose integrals exist and \( f \le g \) on \( S \) then \( \int_S f \, d\mu \le \int_S g \, d\mu \).
  5. If \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \) and \( f_n \) is increasing in \( n \) on \( S \) then \(\int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n \, d\mu \).
  6. If \( f: S \to \R \) is measurable and the integral of \( f \) on \( A \cup B \) exists, where \( A, \; B \in \mathscr{S} \) are disjoint, then \( \int_{A \cup B} f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu \).

Parts (a) and (b) are the linearity properties; part (a) is the additivity property and part (b) is the scaling property. Parts (c) and (d) are the order properties; part (c) is the positive property and part (d) is the increasing property. Part (e) is a continuity property known as the monotone convergence theorem (MCT). Part (f) is the additive property for disjoint domains. Properties (a)–(e) hold with \( S \) replaced by \( A \in \mathscr{S} \).
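As a concrete check of the monotone convergence theorem, take (as an illustrative choice) \( S = \N_+ \) with all subsets measurable, \( \mu\{k\} = 2^{-k} \), and \( f(k) = k \). The truncations \( f_n = f \bs{1}_{\{1, \ldots, n\}} \) are nonnegative, increase in \( n \), and converge to \( f \), so their integrals \( \sum_{k=1}^n k 2^{-k} = 2 - (n + 2) 2^{-n} \) should increase to \( \int_S f \, d\mu = \sum_{k=1}^\infty k 2^{-k} = 2 \). The sketch uses exact rational arithmetic from Python's `fractions` module; the function names are hypothetical.

```python
from fractions import Fraction

def integral_truncated(n):
    """Integral of f_n = f * 1_{1,...,n}, where f(k) = k and mu({k}) = 2^(-k)."""
    return sum(Fraction(k, 2**k) for k in range(1, n + 1))

def closed_form(n):
    """The partial sum 2 - (n + 2) / 2^n."""
    return 2 - Fraction(n + 2, 2**n)

values = [integral_truncated(n) for n in range(1, 41)]

# The integrals increase with n (the increasing property) ...
assert all(a <= b for a, b in zip(values, values[1:]))
# ... agree with the closed form, and so approach 2 (monotone convergence).
assert all(v == closed_form(n) for n, v in enumerate(values, start=1))
```

After 40 terms the gap \( 2 - \int_S f_{40} \, d\mu \) is exactly \( 42 / 2^{40} \).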

Equality and Order

Our first new results are extensions dealing with equality and order. The integral of a function over a null set is 0:

Suppose that \( f: S \to \R \) is measurable and \( A \in \mathscr{S} \) with \( \mu(A) = 0 \). Then \( \int_A f \, d\mu = 0 \).

Proof:

The proof proceeds in stages via the definition of the integral.

  1. Suppose that \( g \) is a nonnegative simple function with \( g = 0 \) on \( A^c \). Then \( g \) has the representation \( g = \sum_{i \in I} a_i \bs{1}_{A_i} \) where \( a_i \in (0, \infty) \) and \( A_i \subseteq A \) for \( i \in I \). But \( \mu(A_i) = 0 \) for each \( i \in I\) and so \( \int_S g \, d\mu = \sum_{i \in I} a_i \mu(A_i) = 0 \).
  2. Suppose that \( f: S \to [0, \infty) \) is measurable. If \( g \) is a nonnegative simple function with \( g \le \bs{1}_A f \), then \( g = 0 \) on \( A^c \) so by (a), \( \int_S g \, d\mu = 0 \). Hence by definition, \( \int_A f \, d\mu = \int_S \bs{1}_A f \, d\mu = 0 \).
  3. Finally, suppose that \( f: S \to \R \) is measurable. Then \( \int_A f \, d\mu = \int_A f^+ \, d\mu - \int_A f^- \, d\mu \). But both integrals on the right are 0 by part (b).

Two functions that are indistinguishable from the point of view of \( \mu \) must have the same integral.

Suppose that \( f: S \to \R \) is a measurable function whose integral exists. If \( g: S \to \R \) is measurable and \( g = f \) almost everywhere on \( S \), then \( \int_S g \, d\mu = \int_S f \, d\mu \).

Proof:

Note that \( g = f \) if and only if \( g^+ = f^+ \) and \( g^- = f^- \). Let \( A = \{x \in S: g^+(x) = f^+(x)\} \). Then \( A \in \mathscr{S} \) and \( \mu(A^c) = 0 \). Hence by the additivity property over disjoint domains and the previous result, \[ \int_S g^+ \, d\mu = \int_A g^+ \, d\mu + \int_{A^c} g^+ \, d\mu = \int_A f^+ \, d\mu + 0 = \int_A f^+ \, d\mu + \int_{A^c} f^+ \, d\mu = \int_S f^+ \, d\mu \] Similarly \( \int_S g^- \, d\mu = \int_S f^- \, d\mu\). Hence the integral of \( g \) exists and \( \int_S g \, d\mu = \int_S f \, d\mu \).
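On a finite measure space with point masses the result can be checked directly: changing a function on a null set changes only terms \( f(x) \mu\{x\} \) with \( \mu\{x\} = 0 \). The sketch below uses a hypothetical three-point space in which one point has mass 0; all masses are finite, so the integral is simply \( \sum_x f(x) \mu\{x\} \).

```python
from fractions import Fraction

# Hypothetical finite measure space: the point "z" is a null set.
mu = {"x": Fraction(1), "y": Fraction(3), "z": Fraction(0)}

def integral(f):
    """Integral of f (a dict point -> value) with respect to mu; all masses are finite."""
    return sum(f[p] * w for p, w in mu.items())

f = {"x": Fraction(2), "y": Fraction(-1), "z": Fraction(5)}
g = dict(f, z=Fraction(-1000))  # g agrees with f except on the null set {"z"}

# The set where f and g differ has measure 0 ...
assert sum(mu[p] for p in mu if f[p] != g[p]) == 0
# ... so the integrals coincide.
assert integral(f) == integral(g) == -1
```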

Next we have a simple extension of the positive property.

Suppose that \( f: S \to \R \) is measurable and \( f \ge 0 \) almost everywhere on \( S \). Then

  1. \( \int_S f \, d\mu \ge 0 \)
  2. \( \int_S f \, d\mu = 0 \) if and only if \( f = 0 \) almost everywhere on \( S \).
Proof:
  1. Let \( A = \{x \in S: f(x) \ge 0\} \). Then \( A \in \mathscr{S} \) and \( \mu(A^c) = 0 \). By the additivity of the integral over disjoint sets we have \[ \int_S f \, d\mu = \int_A f \, d\mu + \int_{A^c} f \, d\mu \] But \( \int_A f \, d\mu \ge 0 \) by the positive property and \( \int_{A^c} f \, d\mu = 0 \) by the null property, so \( \int_S f \, d\mu \ge 0 \).
  2. Note first that if \( f = 0 \) almost everywhere on \( S \) then \( \int_S f \, d\mu = \int_S 0 \, d\mu = 0 \) by the result on functions that agree almost everywhere. For the converse, let \( B_n = \left\{x \in S: f(x) \ge \frac{1}{n}\right\} \) for \( n \in \N_+ \) and \( B = \{x \in S: f(x) \gt 0\} \). Then \( B_n \) is increasing in \( n \) and \( \bigcup_{n=1}^\infty B_n = B \). If \( \mu(B) \gt 0 \) then by the continuity of \( \mu \) for increasing sequences of sets, \( \mu(B_n) \gt 0 \) for some \( n \in \N_+ \). But \( f \ge \frac{1}{n} \bs{1}_{B_n} \) on \( A \), so by the increasing property, \( \int_S f \, d\mu = \int_A f \, d\mu \ge \int_A \frac{1}{n} \bs{1}_{B_n} \, d\mu = \frac{1}{n} \mu(B_n) \gt 0 \). Hence if \( \int_S f \, d\mu = 0 \) then \( \mu(B) = 0 \), and since also \( \mu(A^c) = 0 \), it follows that \( f = 0 \) almost everywhere on \( S \).

So, if \( f \ge 0 \) almost everywhere on \( S \) then \( \int_S f \, d\mu \gt 0 \) if and only if \( \mu\{x \in S: f(x) \gt 0\} \gt 0 \). The simple extension of the positive property in turn leads to a simple extension of the increasing property.

Suppose that \( f, g: S \to \R \) are measurable functions whose integrals exist, and that \( f \le g \) almost everywhere on \( S \). Then

  1. \( \int_S f \, d\mu \le \int_S g \, d\mu \)
  2. Except in the case that both integrals are \( \infty \) or both \( -\infty \), \( \int_S f \, d\mu = \int_S g \, d\mu \) if and only if \( f = g \) almost everywhere on \( S \).
Proof:
  1. Note that \( g = f + (g - f) \) and \( g - f \ge 0 \) almost everywhere on \( S \). If \( \int_S f \, d\mu = -\infty \) then trivially \( \int_S f \, d\mu \le \int_S g \, d\mu \). Otherwise, by the additive property, \[ \int_S g \, d\mu = \int_S f \, d\mu + \int_S (g - f) \, d\mu \] By the previous result, \(\int_S (g - f) \, d\mu \ge 0 \) so \( \int_S g \, d\mu \ge \int_S f \, d\mu \).
  2. Except in the case that both integrals are \( \infty \) or both are \( -\infty \) we have \[ \int_S g \, d\mu - \int_S f \, d\mu = \int_S (g - f) \, d\mu \] By assumption \( g - f \ge 0 \) almost everywhere on \( S \), and hence by the previous result, the integral on the right is 0 if and only if \( g - f = 0 \) almost everywhere on \( S \).

So if \( f \le g \) almost everywhere on \( S \) then, except in the two cases mentioned, \( \int_S f \, d\mu \lt \int_S g \, d\mu \) if and only if \( \mu\{x \in S: f(x) \lt g(x)\} \gt 0 \). The exclusion when both integrals are \( \infty \) or \( -\infty \) is important; an example is given in the computational exercises below. The next result is the absolute value inequality.

Suppose that \( f: S \to \R \) is a measurable function whose integral exists. Then \[ \left| \int_S f \, d\mu \right| \le \int_S \left|f \right| \, d\mu \] If \( f \) is integrable, then equality holds if and only if \( f \ge 0 \) almost everywhere on \( S \) or \( f \le 0 \) almost everywhere on \( S \).

Proof:

First note that \( -\left|f\right| \le f \le \left|f\right| \) on \( S \). The integrals of all three functions exist, so the increasing property and scaling properties give \[ -\int_S \left|f\right| \, d\mu \le \int_S f \, d\mu \le \int_S \left|f \right| \, d\mu \] which is equivalent to the inequality above. If \( f \) is integrable, then by the previous result, equality holds if and only if \( f = -\left|f\right| \) almost everywhere on \( S \) or \( f = \left|f\right| \) almost everywhere on \( S \). In the first case, \( f \le 0 \) almost everywhere on \( S \) and in the second case, \( f \ge 0 \) almost everywhere on \( S \).
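The following sketch, on a hypothetical three-point space with finite masses (one of them 0), computes the gap \( \int_S \left|f\right| \, d\mu - \left|\int_S f \, d\mu\right| \). The gap is positive for a function that takes both signs on sets of positive measure, and 0 for a function that is nonpositive except on the null point, in line with the equality condition above.

```python
from fractions import Fraction

# Hypothetical finite measure space; the point "c" has mass 0.
mu = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(0)}

def integral(f):
    """Integral of f (a dict point -> value) with respect to mu; all masses are finite."""
    return sum(f[p] * w for p, w in mu.items())

def abs_gap(f):
    """Return the integral of |f| minus |integral of f|, which is never negative."""
    return integral({p: abs(v) for p, v in f.items()}) - abs(integral(f))

mixed = {"a": Fraction(3), "b": Fraction(-1), "c": Fraction(0)}
# Negative except at the null point "c", so f <= 0 almost everywhere.
ae_nonpositive = {"a": Fraction(-3), "b": Fraction(-1), "c": Fraction(7)}

print(abs_gap(mixed), abs_gap(ae_nonpositive))  # 4 0
```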

Change of Variables

Suppose that \( (T, \mathscr{T}) \) is another measurable space and that \( u: S \to T \) is measurable. As we saw in our first study of positive measures, \( \nu \) defined by \[ \nu(B) = \mu\left[u^{-1}(B)\right], \quad B \in \mathscr{T} \] is a positive measure on \( (T, \mathscr{T}) \). The following result is known as the change of variables theorem.

If \( f: T \to \R \) is measurable then, assuming that the integrals exist, \[ \int_T f \, d\nu = \int_S (f \circ u) \, d\mu \]

Proof:

We will show that if either of the integrals exists then both do, and they are equal. The proof is a classical bootstrapping argument that parallels the definition of the integral.

  1. Suppose first that \( f \) is a nonnegative simple function on \( T \) with the representation \( f = \sum_{i \in I} b_i \bs{1}_{B_i} \) where \( I \) is a finite index set, \( \{B_i: i \in I\} \) is a measurable partition of \( T \), and \( b_i \in [0, \infty) \) for \( i \in I \). Recall that \( f \circ u \) is a nonnegative simple function on \( S \), with representation \( f \circ u = \sum_{i \in I} b_i \bs{1}_{u^{-1}(B_i)} \). Hence \[ \int_T f \, d\nu = \sum_{i \in I} b_i \nu(B_i) = \sum_{i \in I} b_i \mu\left[u^{-1}(B_i)\right] = \int_S (f \circ u) \, d\mu \]
  2. Next suppose that \( f: T \to [0, \infty) \) is measurable, so that \( f \circ u: S \to [0, \infty) \) is also measurable. There exists an increasing sequence \( (f_1, f_2, \ldots) \) of nonnegative simple functions on \( T \) with \( f_n \to f \) as \( n \to \infty \). Then \((f_1 \circ u, f_2 \circ u, \ldots)\) is an increasing sequence of simple functions on \( S \) with \( f_n \circ u \to f \circ u\) as \( n \to \infty \). By step (a), \( \int_T f_n \, d\nu = \int_S (f_n \circ u) \, d\mu \) for each \( n \in \N_+ \). But by the monotone convergence theorem, \( \int_T f_n \, d\nu \to \int_T f \, d\nu \) as \( n \to \infty \) and \( \int_S (f_n \circ u) \, d\mu \to \int_S (f \circ u) \, d\mu \) so we conclude that \( \int_T f \, d\nu = \int_S (f \circ u) \, d\mu \).
  3. Finally, suppose that \( f: T \to \R \) is measurable, so that \( f \circ u: S \to \R \) is also measurable. Note that \( (f \circ u)^+ = f^+ \circ u \) and \( (f \circ u)^- = f^- \circ u \). By part (b), \begin{align} \int_T f^+ \, d\nu & = \int_S (f^+ \circ u) \, d\mu = \int_S (f \circ u)^+ \, d\mu \\ \int_T f^- \, d\nu & = \int_S (f^- \circ u) \, d\mu = \int_S (f \circ u)^- \, d\mu \end{align} Assuming that at least one of the integrals in the displayed equations is finite, we have \[ \int_T f \, d\nu = \int_T f^+ \, d\nu - \int_T f^- \, d\nu = \int_S (f \circ u)^+ \, d\mu - \int_S (f \circ u)^- \, d\mu = \int_S (f \circ u) \, d\mu\]

The change of variables theorem will look more familiar if we give the variables explicitly. Thus, suppose that we want to evaluate \[ \int_S f\left[u(x)\right] \, d\mu(x) \] where again, \( u: S \to T \) and \( f: T \to \R \). One way is to use the substitution \( u = u(x) \), find the new measure \( \nu \), and then evaluate \[ \int_T f(u) \, d\nu(u) \]
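On finite spaces the image measure is easy to compute, since \( \nu\{y\} = \mu\left[u^{-1}\{y\}\right] \) is the total \( \mu \)-mass of the points that \( u \) sends to \( y \). The sketch below assumes, purely for illustration, \( \mu \) uniform on \( S = \{-2, -1, 0, 1, 2\} \), \( u(x) = x^2 \), and \( f(y) = y + 1 \), and checks that both sides of the change of variables formula equal 3. The helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu, u):
    """Image measure nu(B) = mu(u^{-1}(B)), recorded through its point masses nu({y})."""
    nu = defaultdict(Fraction)  # Fraction() == 0
    for x, mass in mu.items():
        nu[u(x)] += mass
    return dict(nu)

def integral(f, measure):
    """Integral of f with respect to a finite measure given by point masses."""
    return sum(f(p) * mass for p, mass in measure.items())

def u(x):
    return x * x

def f(y):
    return y + 1

# Illustrative choice: mu uniform on S = {-2, -1, 0, 1, 2}.
mu = {x: Fraction(1, 5) for x in range(-2, 3)}
nu = pushforward(mu, u)  # masses 1/5 at 0, 2/5 at 1, 2/5 at 4

lhs = integral(f, nu)                  # integral of f over T with respect to nu
rhs = integral(lambda x: f(u(x)), mu)  # integral of f o u over S with respect to mu
assert lhs == rhs == 3
```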

Convergence Properties

We start with a simple but important corollary of the monotone convergence theorem that extends the additivity property to a countably infinite sum of nonnegative functions.

Suppose that \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \). Then \[ \int_S \sum_{n=1}^\infty f_n \, d\mu = \sum_{n=1}^\infty \int_S f_n \, d\mu \]

Proof:

Let \( g_n = \sum_{i=1}^n f_i \) for \( n \in \N_+ \). Then \( g_n: S \to [0, \infty) \) is measurable and \( g_n \) is increasing in \( n \). Moreover, by definition, \( g_n \to \sum_{i=1}^\infty f_i \) as \( n \to \infty \). Hence by the MCT, \( \int_S g_n \, d\mu \to \int_S \sum_{i=1}^\infty f_i \, d\mu \) as \( n \to \infty \). But we know the additivity property holds for finite sums, so \(\int_S g_n \, d\mu = \sum_{i=1}^n \int_S f_i \, d\mu\) and again, by definition, this sum converges to \(\sum_{i=1}^\infty \int_S f_i \, d\mu\) as \( n \to \infty \).

A similar result below relaxes the assumption that the functions be nonnegative, but imposes a stricter integrability requirement. Our next result is the additivity of the integral over a countably infinite collection of disjoint domains.

Suppose that \( f: S \to \R \) is a measurable function whose integral exists, and that \( \{A_n: n \in \N_+\} \) is a disjoint collection of sets in \( \mathscr{S} \). Let \( A = \bigcup_{n=1}^\infty A_n \). Then \[ \int_A f \, d\mu = \sum_{n=1}^\infty \int_{A_n} f \, d\mu \]

Proof:

Suppose first that \( f \) is nonnegative. Note that \( \bs{1}_A = \sum_{n=1}^\infty \bs{1}_{A_n} \) and hence \( \bs{1}_A f = \sum_{n=1}^\infty \bs{1}_{A_n} f \). Thus from the previous theorem, \[ \int_A f \, d\mu = \int_S \bs{1}_A f \, d\mu = \int_S \sum_{n=1}^\infty \bs{1}_{A_n} f \, d\mu = \sum_{n=1}^\infty \int_S \bs{1}_{A_n} f \, d\mu = \sum_{n=1}^\infty \int_{A_n} f \, d\mu \] Suppose now that \( f: S \to \R \) is measurable and \( \int_S f \, d\mu \) exists. Note that for \( B \in \mathscr{S} \), \( \left(\bs{1}_B f\right)^+ = \bs{1}_B f^+ \) and \( \left(\bs{1}_B f\right)^- = \bs{1}_B f^- \). Hence from the previous argument, \[ \int_A f^+ \, d\mu = \sum_{n=1}^\infty \int_{A_n} f^+ \, d\mu, \quad \int_A f^- \, d\mu = \sum_{n=1}^\infty \int_{A_n} f^- \, d\mu \] Both of these are sums of nonnegative terms, and one of the sums, at least, is finite. Hence we can group the terms to get \[ \int_A f \, d\mu = \int_A f^+ \, d\mu - \int_A f^- \, d\mu = \sum_{n=1}^\infty \int_{A_n} (f^+ - f^-) \, d\mu = \sum_{n=1}^\infty \int_{A_n} f \, d\mu \]

Of course, the previous theorem applies if \( f \) is nonnegative or if \( f \) is integrable. Next we give a minor extension of the monotone convergence theorem that relaxes the assumption that the functions be nonnegative.

Suppose that \( f_n: S \to \R \) is a measurable function whose integral exists for each \( n \in \N_+ \) and that \( f_n \) is increasing in \( n \) on \( S \). If \( \int_S f_1 \, d\mu \gt -\infty \) then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n \, d\mu \]

Proof:

Let \( f(x) = \lim_{n \to \infty} f_n(x) \) for \( x \in S \), which exists in \( \R \cup \{\infty\} \) since \( f_n(x) \) is increasing in \( n \in \N_+ \). If \( \int_S f_1 \, d\mu = \infty \), then by the increasing property, \( \int_S f_n \, d\mu = \infty \) for all \( n \in \N_+ \) and \( \int_S f \, d\mu = \infty \), so the conclusion of the MCT trivially holds. Thus suppose that \( f_1 \) is integrable. Let \( g_n = f_n - f_1 \) for \( n \in \N_+ \) and let \( g = f - f_1 \). Then \( g_n \) is nonnegative and increasing in \( n \) on \( S \), and \( g_n \to g \) as \( n \to \infty \) on \( S \). By the ordinary MCT, \( \int_S g_n \, d\mu \to \int_S g \, d\mu \) as \( n \to \infty \). But since \( \int_S f_1 \, d\mu \) is finite, \( \int_S g_n \, d\mu = \int_S f_n \, d\mu - \int_S f_1 \, d\mu \) and \( \int_S g \, d\mu = \int_S f \, d\mu - \int_S f_1 \, d\mu \). Again since \( \int_S f_1 \, d\mu \) is finite, it follows that \( \int_S f_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \).

Here is the complementary result for decreasing functions.

Suppose that \( f_n: S \to \R \) is a measurable function whose integral exists for each \( n \in \N_+ \) and that \( f_n \) is decreasing in \( n \) on \( S \). If \( \int_S f_1 \, d\mu \lt \infty \) then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n \, d\mu \]

Proof:

The functions \( -f_n \) for \( n \in \N_+ \) are increasing in \( n \), and \( \int_S (-f_1) \, d\mu = -\int_S f_1 \, d\mu \gt -\infty \), so they satisfy the hypotheses of the previous version of the MCT. Hence \( \int_S \lim_{n \to \infty} (-f_n) \, d\mu = \lim_{n \to \infty} \left(-\int_S f_n \, d\mu\right) \), and by the scaling property, \( \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n \, d\mu \).

The additional assumptions on the integral of \( f_1 \) in the last two extensions of the monotone convergence theorem are necessary. An example is given in the computational exercises below.

Our next result is also a consequence of the monotone convergence theorem, and is called Fatou's lemma in honor of Pierre Fatou. Its usefulness stems from the fact that no assumptions are placed on the integrand functions, except that they be nonnegative and measurable.

Suppose that \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \). Then \[ \int_S \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_S f_n \, d\mu \]

Proof:

Let \( g_n = \inf\left\{f_k: k \in \{n, n + 1, \ldots \}\right\} \) for \( n \in \N_+ \). Then \( g_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \), \( g_n \) is increasing in \( n \), and by definition, \( \lim_{n \to \infty} g_n = \liminf_{n \to \infty} f_n \). By the MCT, \[ \int_S \liminf_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S g_n \, d\mu \] But \( g_n \le f_k \) on \( S \) for \( n \in \N_+ \) and \( k \in \{n, n + 1, \ldots\} \) so by the increasing property, \( \int_S g_n \, d\mu \le \int_S f_k \, d\mu\) for \( n \in \N_+ \) and \( k \in \{n, n + 1, \ldots\} \). Hence \( \int_S g_n \, d\mu \le \inf\left\{\int_S f_k \, d\mu: k \in \{n, n+1, \ldots\}\right\} \) for \( n \in \N_+ \) and therefore \[ \lim_{n \to \infty} \int_S g_n \, d\mu \le \liminf_{n \to \infty} \int_S f_n \, d\mu \]

Given the weakness of the hypotheses, it's hardly surprising that strict inequality can easily occur in Fatou's lemma. An example is given in the computational exercises below.
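Here is one small example of strict inequality, chosen for illustration: let \( S = \{0, 1\} \) with \( \mu\{0\} = \mu\{1\} = \frac{1}{2} \), and let \( f_n = \bs{1}_{\{n \bmod 2\}} \), so the functions alternate between \( \bs{1}_{\{1\}} \) and \( \bs{1}_{\{0\}} \). At each point the sequence \( f_n(x) \) alternates between 0 and 1, so \( \liminf_{n \to \infty} f_n = 0 \) and the left side of Fatou's lemma is 0, while \( \int_S f_n \, d\mu = \frac{1}{2} \) for every \( n \). All the sequences involved are periodic in \( n \), so each liminf is the minimum over one period, which the sketch exploits.

```python
from fractions import Fraction

# Illustrative finite probability space: S = {0, 1}, each point with mass 1/2.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def f(n, x):
    """f_n(x): the indicator of the point n mod 2."""
    return 1 if x == n % 2 else 0

def integral(h):
    """Integral of h with respect to mu (a finite weighted sum)."""
    return sum(h(x) * w for x, w in mu.items())

def liminf_periodic(term, period):
    """liminf of the sequence n -> term(n), valid when the sequence is
    periodic with the given period: it is the minimum over one period."""
    return min(term(n) for n in range(1, period + 1))

# Left side of Fatou's lemma: integral of the pointwise liminf.
lhs = integral(lambda x: liminf_periodic(lambda n: f(n, x), 2))
# Right side: liminf of the integrals.
rhs = liminf_periodic(lambda n: integral(lambda x: f(n, x)), 2)
assert lhs == 0 and rhs == Fraction(1, 2)
```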

Our next convergence result is one of the most important and is known as the dominated convergence theorem. It's sometimes also known as Lebesgue's dominated convergence theorem in honor of Henri Lebesgue, who first developed all of this stuff in the context of \( \R^n \). The dominated convergence theorem gives a basic condition under which we may interchange the limit and integration operators.

Suppose that \( f_n: S \to \R \) is measurable for \( n \in \N_+ \) and that \( \lim_{n \to \infty} f_n \) exists on \( S \). Suppose also that \( \left|f_n\right| \le g \) for \( n \in \N_+ \) where \( g: S \to [0, \infty) \) is integrable. Then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n \, d\mu \]

Proof:

First note that by the increasing property, \( \int_S \left|f_n\right| \, d\mu \le \int_S g \, d\mu \lt \infty \) and hence \( f_n \) is integrable for \( n \in \N_+ \). Let \( f = \lim_{n \to \infty} f_n \). Then \( f \) is measurable and \( \left|f\right| \le g \) on \( S \), so by the increasing property again, \( \int_S \left| f \right| \, d\mu \le \int_S g \, d\mu \lt \infty \), and hence \( f \) is integrable.

Now for \( n \in \N_+ \), let \( u_n = \inf\left\{f_k: k \in \{n, n + 1, \ldots\}\right\} \) and let \( v_n = \sup\left\{f_k: k \in \{n, n + 1, \ldots\}\right\} \). Then \( u_n \le f_n \le v_n \) for \( n \in \N_+ \), \( u_n \) is increasing in \( n \), \( v_n \) is decreasing in \( n \), and since \( f_n \to f \), also \( u_n \to f \) and \( v_n \to f \) as \( n \to \infty \). Moreover, \( u_1 \ge -g \) on \( S \), so \( \int_S u_1 \, d\mu \ge - \int_S g \, d\mu \gt -\infty \), and by the version of the MCT for increasing functions, \( \int_S u_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \). Similarly, \( v_1 \le g \) on \( S \), so \( \int_S v_1 \, d\mu \le \int_S g \, d\mu \lt \infty \), and by the version of the MCT for decreasing functions, \( \int_S v_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \). But by the increasing property, \( \int_S u_n \, d\mu \le \int_S f_n \, d\mu \le \int_S v_n \, d\mu \) for \( n \in \N_+ \) so by the squeeze theorem for limits, \( \int_S f_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \).

As you might guess, the assumption that \( \left| f_n \right| \) is uniformly bounded in \( n \) by an integrable function is critical. A counterexample when this assumption is missing is given in the computational exercises below. The dominated convergence theorem remains true if \( \lim_{n \to \infty} f_n \) exists almost everywhere on \( S \). The following corollary of the dominated convergence theorem gives a condition for the interchange of infinite sum and integral.

Suppose that \( f_i: S \to \R \) is measurable for \( i \in \N_+ \) and that \( \sum_{i=1}^\infty \left| f_i \right| \) is integrable. Then \[ \int_S \sum_{i=1}^\infty f_i \, d\mu = \sum_{i=1}^\infty \int_S f_i \, d\mu \]

Proof:

The assumption that \( g = \sum_{i=1}^\infty \left| f_i \right| \) is integrable implies that \( g \lt \infty \) almost everywhere on \( S \). In turn, this means that \( \sum_{i=1}^\infty f_i \) is absolutely convergent almost everywhere on \( S \). Let \( f(x) = \sum_{i=1}^\infty f_i(x) \) if \( g(x) \lt \infty \), and for completeness, let \( f(x) = 0 \) if \( g(x) = \infty \). Since only the integral of \( f \) appears in the theorem, it doesn't matter how we define \( f \) on the null set where \( g = \infty \). Now let \( g_n = \sum_{i=1}^n f_i \). Then \( g_n \to f \) as \( n \to \infty \) almost everywhere on \( S \) and \( \left| g_n \right| \le g \) on \( S \). Hence by the dominated convergence theorem, \( \int_S g_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \). But we know the additivity property holds for finite sums, so \( \int_S g_n \, d\mu = \sum_{i=1}^n \int_S f_i \, d\mu \), and in turn this converges to \( \sum_{i=1}^\infty \int_S f_i \, d\mu \) as \( n \to \infty \). Thus we have \( \sum_{i=1}^\infty \int_S f_i \, d\mu = \int_S f \, d\mu \).

The following corollary of the dominated convergence theorem is known as the bounded convergence theorem.

Suppose that \( f_n: S \to \R \) is measurable for \( n \in \N_+ \) and there exists \( A \in \mathscr{S} \) such that \( \mu(A) \lt \infty \), \( \lim_{n \to \infty} f_n \) exists on \( A \), and \( \left| f_n \right| \) is bounded in \( n \in \N_+ \) on \( A \). Then \[ \int_A \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_A f_n \, d\mu \]

Proof:

Suppose that \( \left|f_n\right| \) is bounded in \( n \) on \( A \) by \( c \in (0, \infty) \). The constant \( c \) is integrable on \( A \) since \( \int_A c \, d\mu = c \mu(A) \lt \infty \), and \( \left|f_n\right| \le c \) on \( A \) for \( n \in \N_+ \). Thus the result follows from the dominated convergence theorem.

Again, the bounded convergence theorem remains true if \( \lim_{n \to \infty} f_n \) exists almost everywhere on \( A \). For a finite measure space (and in particular for a probability space), the condition that \( \mu(A) \lt \infty \) automatically holds.

Product Spaces

Suppose now that \( (S, \mathscr{S}, \mu) \) and \( (T, \mathscr{T}, \nu) \) are \( \sigma \)-finite measure spaces. Please recall the basic facts about the product \( \sigma \)-algebra \( \mathscr{S} \otimes \mathscr{T} \) of subsets of \( S \times T \), and the product measure \( \mu \otimes \nu \) on \( \mathscr{S} \otimes \mathscr{T} \). The product measure space \( (S \times T, \mathscr{S} \otimes \mathscr{T}, \mu \otimes \nu) \) is the standard one that we use for product spaces. If \( f: S \times T \to \R \) is measurable, there are three integrals we might consider. First, of course, is the integral of \( f \) with respect to the product measure \( \mu \otimes \nu \) \[ \int_{S \times T} f(x, y) \, d(\mu \otimes \nu)(x, y) \] sometimes called a double integral in this context. But also we have the nested or iterated integrals where we integrate with respect to one variable at a time: \[ \int_S \left(\int_T f(x, y) \, d\nu(y)\right) \, d\mu(x), \quad \int_T \left(\int_S f(x, y) \, d\mu(x)\right) \, d\nu(y)\] How are these integrals related? Well, just as in calculus with ordinary Riemann integrals, under mild conditions the three integrals are the same. The resulting important theorem is known as Fubini's Theorem in honor of the Italian mathematician Guido Fubini.

Suppose that \( f: S \times T \to \R \) is measurable. If the double integral on the left exists, then \[ \int_{S \times T} f(x, y) \, d(\mu \otimes \nu)(x, y) = \int_S \int_T f(x, y) \, d\nu(y) \, d\mu(x) = \int_T \int_S f(x, y) \, d\mu(x) \, d\nu(y) \]

Proof:

We will show that \[ \int_{S \times T} f(x, y) \, d(\mu \otimes \nu)(x, y) = \int_S \int_T f(x, y) \, d\nu(y) \, d\mu(x) \] The proof with the other iterated integral is symmetric. The proof proceeds in stages, paralleling the definition of the integral.

Step 1. Suppose that \( f = \bs{1}_{A \times B} \) where \( A \in \mathscr{S} \) and \( B \in \mathscr{T} \). The equation holds by definition of the product measure, since the double integral is \( (\mu \otimes \nu)(A \times B) = \mu(A) \nu(B) \) and the iterated integral is \[ \int_S \int_T \bs{1}_{A \times B} (x, y) \, d\nu(y) \, d\mu(x) = \int_S \int_T \bs{1}_A(x) \bs{1}_B(y) \, d\nu(y) \, d\mu(x) = \int_S \bs{1}_A(x) \nu(B) \, d\mu(x) = \mu(A) \nu(B) \]

Step 2. Consider \( f = \bs{1}_C \) where \( C \in \mathscr{S} \otimes \mathscr{T} \). The double integral is \( (\mu \otimes \nu)(C) \), and so, as a function of \( C \in \mathscr{S} \otimes \mathscr{T} \), it defines the measure \( \mu \otimes \nu \). On the other hand, the iterated integral is \[ \int_S \int_T \bs{1}_C(x, y) \, d\nu(y) \, d\mu(x) = \int_S \int_T \bs{1}_{C_x}(y) \, d\nu(y) \, d\mu(x) = \int_S \nu(C_x) \, d\mu(x) \] where \( C_x = \{y \in T: (x, y) \in C\} \) is the cross-section of \( C \) at \( x \in S \). Recall that \( x \mapsto \nu(C_x) \) is a nonnegative, measurable function of \( x \), so \( C \mapsto \int_S \nu(C_x) \, d\mu(x) \) makes sense. Moreover, as a function of \( C \in \mathscr{S} \otimes \mathscr{T} \), this integral also forms a measure: If \( \{C^i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr{S} \otimes \mathscr{T} \), then \( \{C_x^i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr{T} \). Cross-sections preserve set operations, so if \( C = \bigcup_{i \in I} C^i \) then \( C_x = \bigcup_{i \in I} C_x^i \). By the countable additivity of the measure \( \nu \) and the interchange of sum and integral for nonnegative functions, we have \[ \int_S \nu(C_x) \, d\mu(x) = \int_S \nu\left(\bigcup_{i \in I} C_x^i \right) \, d\mu(x) = \int_S \sum_{i \in I} \nu\left(C_x^i\right) \, d\mu(x) = \sum_{i \in I} \int_S \nu\left(C_x^i\right) \, d\mu(x)\] To summarize, the double integral and the iterated integral define positive measures on \( \mathscr{S} \otimes \mathscr{T} \). By step 1, these measures agree on the measurable rectangles. The measurable rectangles form a \( \pi \)-system that generates \( \mathscr{S} \otimes \mathscr{T} \), and since \( \mu \) and \( \nu \) are \( \sigma \)-finite, \( S \times T \) is a countable union of rectangles of finite measure. Hence by the uniqueness theorem, the two measures are the same. Thus the double integral and the iterated integral agree with integrand \( f = \bs{1}_C \) for every \( C \in \mathscr{S} \otimes \mathscr{T} \).

Step 3. Suppose \( f = \sum_{i \in I} c_i \bs{1}_{C_i} \) is a nonnegative simple function on \( S \times T \). Thus, \( I \) is a finite index set, \( c_i \in [0, \infty) \) for \( i \in I \), and \( \{C_i: i \in I\} \) is a disjoint collection of sets in \( \mathscr{S} \otimes \mathscr{T} \). The double integral and the iterated integral satisfy the linearity properties, and hence by step 2, agree with integrand \( f \).

Step 4. Suppose that \( f: S \times T \to [0, \infty) \) is measurable. Then there exists a sequence of nonnegative simple functions \( g_n, \; n \in \N_+ \) such that \( g_n \) is increasing in \( n \in \N_+ \) on \( S \times T \), and \( g_n \to f \) as \( n \to \infty \) on \( S \times T \). By the monotone convergence theorem, \( \int_{S \times T} g_n \, d(\mu \otimes \nu) \to \int_{S \times T} f \, d(\mu \otimes \nu) \). But for fixed \( x \in S \), \( y \mapsto g_n(x, y) \) is increasing in \( n \) on \( T \) and has limit \( f(x, y) \) as \( n \to \infty \). By another application of the monotone convergence theorem, \( \int_T g_n(x, y) \, d\nu(y) \to \int_T f(x, y) \, d\nu(y) \) as \( n \to \infty \). But \(x \mapsto \int_T g_n(x, y) \, d\nu(y) \) is measurable and is increasing in \( n \in \N_+ \) on \( S \), so by yet another application of the monotone convergence theorem, \( \int_S \int_T g_n(x, y) \, d\nu(y) \, d\mu(x) \to \int_S \int_T f(x, y) \, d\nu(y) \, d\mu(x) \) as \( n \to \infty \). But the double integral and the iterated integral agree with integrand \( g_n \) for each \( n \in \N_+ \), so it follows that the double integral and the iterated integral agree with integrand \( f \).

Step 5. Suppose that \( f: S \times T \to \R \) is measurable. By step 4, the double integral and the iterated integral agree with integrand functions \( f^+ \) and \( f^- \). If at least one of these common values is finite, then by the additivity property, they agree with integrand function \( f = f^+ - f^- \).

Of course, the double integral exists, and so Fubini's theorem applies, if either \( f \) is nonnegative or integrable with respect to \( \mu \otimes \nu \). When \( f \) is nonnegative, the result is sometimes called Tonelli's theorem in honor of another Italian mathematician, Leonida Tonelli. On the other hand, the iterated integrals may exist, and may be different, when the double integral does not exist. Counterexamples are given in the computational exercises below.

A special case of Fubini's theorem (and indeed part of the proof) is that we can compute the measure of a set in the product space by integrating the cross-sectional measures.

If \( C \in \mathscr{S} \otimes \mathscr{T} \) then \[ (\mu \otimes \nu)(C) = \int_S \nu\left(C_x\right) \, d\mu(x) = \int_T \mu\left(C^y\right) \, d\nu(y) \] where \( C_x = \{y \in T: (x, y) \in C\} \) for \( x \in S \), and \( C^y = \{x \in S: (x, y) \in C\} \) for \( y \in T \).
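For finite spaces with all subsets measurable, the product measure puts mass \( \mu\{x\} \nu\{y\} \) on each point \( (x, y) \), and the identity above says that a finite double sum can be computed by summing over either family of cross-sections first. The sketch below checks this for hypothetical point masses and a set \( C \) that is not a measurable rectangle; it is a sanity check in the discrete case, not a substitute for the proof.

```python
from fractions import Fraction

# Hypothetical point masses on finite spaces S and T (all subsets measurable).
mu = {"s1": Fraction(1), "s2": Fraction(2)}
nu = {"t1": Fraction(1, 3), "t2": Fraction(1, 2), "t3": Fraction(1)}

def product_measure(C):
    """(mu x nu)(C): total mass mu({x}) nu({y}) over the points (x, y) of C."""
    return sum(mu[x] * nu[y] for x, y in C)

def via_x_sections(C):
    """Integral over S of x -> nu(C_x), where C_x = {y : (x, y) in C}."""
    return sum(mu[x] * sum(nu[y] for y in nu if (x, y) in C) for x in mu)

def via_y_sections(C):
    """Integral over T of y -> mu(C^y), where C^y = {x : (x, y) in C}."""
    return sum(nu[y] * sum(mu[x] for x in mu if (x, y) in C) for y in nu)

# A set that is not a measurable rectangle.
C = {("s1", "t1"), ("s1", "t3"), ("s2", "t2")}
assert product_measure(C) == via_x_sections(C) == via_y_sections(C) == Fraction(7, 3)
```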

In particular, if \( C, \; D \in \mathscr{S} \otimes \mathscr{T} \) have the property that \( \nu(C_x) = \nu(D_x) \) for all \( x \in S \), or \( \mu\left(C^y\right) = \mu\left(D^y\right) \) for all \( y \in T \) (that is, \( C \) and \( D \) have the same cross-sectional measures with respect to one of the variables), then \( (\mu \otimes \nu)(C) = (\mu \otimes \nu)(D) \). In \( \R^2 \) with area, and in \( \R^3 \) with volume (Lebesgue measure in both cases), this is known as Cavalieri's principle, named for Bonaventura Cavalieri, yet a third Italian mathematician. Clearly, Italian mathematicians cornered the market on theorems of this sort.

A simple corollary of Fubini's theorem is that the double integral of a product function over a product set is the product of the integrals. This result has important applications to independent random variables.

Suppose that \( g: S \to \R \) and \( h: T \to \R \) are measurable, and are either nonnegative or integrable with respect to \( \mu \) and \( \nu \), respectively. Then \[ \int_{S \times T} g(x) h(y) \, d(\mu \otimes \nu)(x, y) = \left(\int_S g(x) \, d\mu(x)\right) \left(\int_T h(y) \, d\nu(y)\right) \]
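On finite spaces every integral is a finite weighted sum, which makes the product formula easy to verify exactly. The sketch below assumes hypothetical measures: \( \mu \) uniform on \( S = \{1, \ldots, 6\} \), \( \nu \) with masses \( \frac14, \frac12, \frac14 \) on \( T = \{0, 1, 2\} \), and \( g(x) = x \), \( h(y) = y^2 \). The product measure assigns mass \( \mu\{x\} \nu\{y\} \) to the point \( (x, y) \). Exact rational arithmetic avoids any rounding question.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite measures, given as {point: mass}
mu = {x: Fraction(1, 6) for x in range(1, 7)}                    # on S = {1, ..., 6}
nu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # on T = {0, 1, 2}

def g(x):
    return x

def h(y):
    return y * y

def integral(func, measure):
    # Integral of func with respect to a measure on a finite set
    return sum(func(p) * m for p, m in measure.items())

# Double integral: product measure gives the point (x, y) mass mu{x} * nu{y}
double = sum(g(x) * h(y) * mu[x] * nu[y] for x, y in product(mu, nu))
separate = integral(g, mu) * integral(h, nu)
print(double, separate)
```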

Recall that if the measure space is countable, with all subsets measurable, and with counting measure, then integrals are simply sums. In this case Fubini's theorem allows us to rearrange the order of summation in a double sum.

Suppose that \( I \) and \( J \) are countable and that \( a_{i j} \in \R \) for \( i \in I \) and \( j \in J \). If the sum of the positive terms or the sum of the negative terms is finite, then \[ \sum_{(i, j) \in I \times J} a_{i j} = \sum_{i \in I} \sum_{j \in J} a_{i j} = \sum_{j \in J} \sum_{i \in I} a_{i j} \]

Often \( I = J = \N_+ \), and in this case, \( a_{i j} \) can be viewed as an infinite array, with \( i \in \N_+ \) the row number and \( j \in \N_+ \) the column number:

\[ \begin{matrix}
a_{11} & a_{12} & a_{13} & \cdots \\
a_{21} & a_{22} & a_{23} & \cdots \\
a_{31} & a_{32} & a_{33} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{matrix} \]

The significant point is that \( \N_+ \) is totally ordered. While there is no implied order of summation in the double sum \( \sum_{(i, j) \in \N_+^2} a_{i j} \), the iterated sum \( \sum_{i=1}^\infty \sum_{j=1}^\infty a_{i j} \) is obtained by summing each row in order and then adding the row sums in order, while the iterated sum \( \sum_{j=1}^\infty \sum_{i=1}^\infty a_{i j} \) is obtained by summing each column in order and then adding the column sums in order.
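When the sum of \( |a_{i j}| \) is finite, the order does not matter. The following Python sketch uses the hypothetical array \( a_{i j} = (-1)^{i + j} / (2^i 3^j) \), whose absolute values sum to \( \frac12 \), and adds the entries of the \( N \times N \) corner row by row, column by column, and along anti-diagonals. All three should be close to the exact value \( \left(\sum_{i=1}^\infty (-1/2)^i\right)\left(\sum_{j=1}^\infty (-1/3)^j\right) = \left(-\frac13\right)\left(-\frac14\right) = \frac{1}{12} \).

```python
def a(i, j):
    # a_ij = (-1)^(i+j) / (2^i 3^j): sum of |a_ij| = (sum 2^-i)(sum 3^-j) = 1/2, finite
    return (-1) ** (i + j) / (2 ** i * 3 ** j)

N = 60  # truncation; the neglected terms have total absolute value below 1e-18

rows_first = sum(sum(a(i, j) for j in range(1, N + 1)) for i in range(1, N + 1))
cols_first = sum(sum(a(i, j) for i in range(1, N + 1)) for j in range(1, N + 1))
# Another order: sweep the anti-diagonals i + j = s of the N x N corner
diagonals = sum(a(i, s - i)
                for s in range(2, 2 * N + 1)
                for i in range(max(1, s - N), min(N, s - 1) + 1))

print(rows_first, cols_first, diagonals, 1 / 12)
```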

Of course, it may be that only one of the two factor spaces is countable. The theorem above that gives conditions for the interchange of sum and integral can be viewed as an application of Fubini's theorem, where one of the measure spaces is \( (S, \mathscr{S}, \mu) \) and the other is \( \N_+ \) with counting measure.

Examples and Applications

Probability Spaces

Suppose that \( (\Omega, \mathscr{F}, \P) \) is a probability space, so that \( \Omega \) is the sample space of a random experiment, \( \mathscr{F} \) is the \( \sigma \)-algebra of events, and \( \P \) is a probability measure. Suppose also that \( (S, \mathscr{S}) \) is another measurable space, and that \( X \) is a random variable for the experiment, taking values in \( S \). Of course, this simply means that \( X \) is a measurable function from \( \Omega \) to \( S \). Recall that the probability distribution of \( X \) is the probability measure \( P_X \) on \( (S, \mathscr{S}) \) defined by \[ P_X(A) = \P(X \in A), \quad A \in \mathscr{S} \] Since \( \{X \in A\} \) is just probability notation for the inverse image of \( A \) under \( X \), \( P_X \) is simply a special case of constructing a new positive measure from a given positive measure via a change of variables.

Suppose now that \( r: S \to \R \) is measurable, so that \(r(X) \) is a real-valued random variable. The integral of \( r(X) \) with respect to \( \P \) (assuming that it exists) is known as the expected value of \( r(X) \) and is of fundamental importance. We will study expected values in detail in the next chapter. Here, we simply note different ways to write the integral. By the change of variables formula for integrals, we have \[ \int_\Omega r\left[X(\omega)\right] \, d\P(\omega) = \int_S r(x) \, dP_X(x) \] Now let \( F_Y \) denote the distribution function of \( Y = r(X) \). By another change of variables, \( Y \) has a probability distribution \( P_Y \) on \( \R \), which is also a Lebesgue-Stieltjes measure, named for Henri Lebesgue and Thomas Stieltjes.
Recall that this probability measure is characterized by \[ P_Y(a, b] = \P(a \lt Y \le b) = F_Y(b) - F_Y(a); \quad a, \, b \in \R, \; a \lt b \] With another application of our change of variables theorem, we can add to our chain of integrals: \[ \int_\Omega r\left[X(\omega)\right] \, d\P(\omega) = \int_S r(x) \, dP_X(x) = \int_\R y \, dP_Y(y) = \int_\R y \, dF_Y(y) \] Of course, the last two integrals are simply different notations for exactly the same thing. In the section on absolute continuity and density functions, we will see other ways to write the integral.
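A finite example can make this chain of integrals concrete. The sketch below assumes a hypothetical experiment, none of which comes from the text: two fair dice, \( X \) the sum of the scores, and \( r(x) = (x - 7)^2 \). It computes the expected value of \( r(X) \) three ways: over \( \Omega \) with \( \P \), over \( S \) with \( P_X \), and over \( \R \) with \( P_Y \). Exact rational arithmetic lets the three results be compared for equality.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, each with probability 1/36
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def X(w):
    # Random variable X = sum of the dice, taking values in S = {2, ..., 12}
    return w[0] + w[1]

def r(x):
    # Hypothetical measurable function r: S -> R
    return (x - 7) ** 2

# 1. Integral of r(X) over Omega with respect to P
e_omega = sum(r(X(w)) * p for w, p in P.items())

# 2. Integral of r over S with respect to the distribution P_X
P_X = defaultdict(Fraction)
for w, p in P.items():
    P_X[X(w)] += p
e_S = sum(r(x) * p for x, p in P_X.items())

# 3. Integral of y over R with respect to the distribution P_Y of Y = r(X)
P_Y = defaultdict(Fraction)
for x, p in P_X.items():
    P_Y[r(x)] += p
e_R = sum(y * p for y, p in P_Y.items())

print(e_omega, e_S, e_R)
```

Here \( Y \) is discrete, so \( dF_Y \) places mass at the jumps of \( F_Y \), which are exactly the points carrying mass in `P_Y`.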

Counterexamples

Consider the space \( (\R, \mathscr{R}, \lambda) \) where \( \mathscr{R} \) is the usual \( \sigma \)-algebra of Lebesgue measurable sets and \( \lambda \) is Lebesgue measure. Let \( f = \bs{1}_{[1, \infty)} \) and \( g = \bs{1}_{[0, \infty)} \). Show that

  1. \( f \le g \) on \( \R \)
  2. \( \lambda\{x \in \R: f(x) \lt g(x)\} = 1 \)
  3. \( \int_\R f \, d\lambda = \int_\R g \, d\lambda = \infty \)

This example shows that the strict increasing property can fail when the integrals are infinite.

Consider the space \( (\R, \mathscr{R}, \lambda) \) where \( \mathscr{R} \) is the usual \( \sigma \)-algebra of Lebesgue measurable sets and \( \lambda \) is Lebesgue measure. Let \( f_n = \bs{1}_{[n, \infty)} \) for \( n \in \N_+ \). Show that

  1. \( f_n \) is decreasing in \( n \in \N_+ \) on \( \R \).
  2. \( f_n \to 0 \) as \( n \to \infty \) on \( \R \).
  3. \( \int_\R f_n \, d\lambda = \infty \) for each \( n \in \N_+ \).

This example shows that the monotone convergence theorem for decreasing sequences can fail if the first integral \( \int_\R f_1 \, d\lambda \) is infinite. It also illustrates strict inequality in Fatou's lemma.

Consider the space \( (\R, \mathscr{R}, \lambda) \) where \( \mathscr{R} \) is the usual \( \sigma \)-algebra of Lebesgue measurable sets and \( \lambda \) is Lebesgue measure. Let \( f_n = \bs{1}_{[n, n + 1]} \) for \( n \in \N_+ \). Show that

  1. \(\lim_{n \to \infty} f_n = 0 \) on \( \R \) so \( \int_\R \lim_{n \to \infty} f_n \, d\lambda = 0 \)
  2. \( \int_\R f_n \, d\lambda = 1 \) for \( n \in \N_+ \) so \( \lim_{n \to \infty} \int_\R f_n \, d\lambda = 1\)
  3. \( \sup\{f_n: n \in \N_+\} = \bs{1}_{[1, \infty)} \) on \( \R \)

This example shows that the dominated convergence theorem can fail if \( \left|f_n\right| \) is not bounded by an integrable function. It also shows that strict inequality can hold in Fatou's lemma.

Consider the product space \( [0, 1]^2 \) with the usual Lebesgue measurable subsets and Lebesgue measure. Let \( f: [0, 1]^2 \to \R \) be defined by \[ f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2} \] Show that

  1. \( \int_{[0, 1]^2} f(x, y) \, d(x, y) \) does not exist.
  2. \( \int_0^1 \int_0^1 f(x, y) \, dx \, dy = -\frac{\pi}{4} \)
  3. \( \int_0^1 \int_0^1 f(x, y) \, dy \, dx = \frac{\pi}{4} \)

This example shows that the iterated integrals can exist and be different when the double integral does not exist.
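One route to parts 1 to 3, offered as a sketch rather than as the intended solution: the inner integrals follow from explicit antiderivatives, and part 1 follows from polar coordinates \( x = \rho \cos \theta \), \( y = \rho \sin \theta \), in which \( f = \cos(2 \theta) / \rho^2 \) and the area element is \( \rho \, d\rho \, d\theta \). \[ \frac{\partial}{\partial x} \frac{x}{x^2 + y^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2} = -f(x, y), \qquad \frac{\partial}{\partial y} \frac{y}{x^2 + y^2} = \frac{x^2 - y^2}{(x^2 + y^2)^2} = f(x, y) \] \[ \int_0^1 f(x, y) \, dx = -\frac{1}{1 + y^2}, \; y \in (0, 1]; \qquad \int_0^1 f(x, y) \, dy = \frac{1}{1 + x^2}, \; x \in (0, 1] \] Integrating these over \( (0, 1] \) gives \( -\arctan 1 = -\frac{\pi}{4} \) and \( \arctan 1 = \frac{\pi}{4} \). For part 1, the sector \( \{0 \lt \rho \le 1, \; 0 \le \theta \le \pi/4\} \) lies in \( [0, 1]^2 \) and \( f \ge 0 \) there, so \[ \int_{[0, 1]^2} f^+(x, y) \, d(x, y) \ge \int_0^{\pi/4} \int_0^1 \frac{\cos 2\theta}{\rho^2} \, \rho \, d\rho \, d\theta = \frac{1}{2} \int_0^1 \frac{d\rho}{\rho} = \infty \] The same computation on \( \pi/4 \le \theta \le \pi/2 \) shows that the integral of \( f^- \) is also infinite, so the double integral has the form \( \infty - \infty \).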

For \( i, \; j \in \N_+ \) define the sequence \( a_{i j} \) as follows: \( a_{i i} = 1 \) and \( a_{i + 1, i} = -1 \) for \( i \in \N_+ \), \( a_{i j} = 0 \) otherwise.

  1. Give \( a_{i j} \) in array form with \( i \in \N_+ \) as the row number and \( j \in \N_+ \) as the column number
  2. Show that \( \sum_{(i, j) \in \N_+^2} a_{i j} \) does not exist
  3. Show that \( \sum_{i = 1}^\infty \sum_{j = 1}^\infty a_{i j} = 1 \)
  4. Show that \( \sum_{j=1}^\infty \sum_{i=1}^\infty a_{i j} = 0 \)

This example shows that the iterated sums can exist and be different when the double sum does not exist.
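Because each row and each column of this array has at most two nonzero entries, the inner sums in parts 3 and 4 can be computed exactly. The Python sketch below (with an arbitrary cutoff `N`, not part of the exercise) prints the upper-left corner for part 1, adds the first `N` row sums and column sums, and counts the positive entries, whose total grows with `N`; the negative entries behave the same way, which is why the double sum in part 2 does not exist.

```python
def a(i, j):
    # a_ii = 1 and a_{i+1,i} = -1 for i >= 1; every other entry is 0
    if i == j:
        return 1
    if i == j + 1:
        return -1
    return 0

# Upper-left 4 x 4 corner of the array (rows i, columns j)
for i in range(1, 5):
    print([a(i, j) for j in range(1, 5)])

N = 1000
# Row i has nonzero entries only in columns i - 1 and i, so each row sum is exact
row_sums = [sum(a(i, j) for j in range(max(1, i - 1), i + 1)) for i in range(1, N + 1)]
# Column j has nonzero entries only in rows j and j + 1, so each column sum is exact
col_sums = [sum(a(i, j) for i in range(j, j + 2)) for j in range(1, N + 1)]

print(sum(row_sums), sum(col_sums))   # 1 0

# Total of the positive entries in the first N rows: one entry equal to 1 per row
positive = sum(a(i, j) for i in range(1, N + 1)
               for j in range(max(1, i - 1), i + 1) if a(i, j) > 0)
```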

Computational Exercises

Compute \( \int_D f(x, y) \, d(x,y) \) in each case below for the given \( D \subseteq \R^2 \) and \( f: D \to \R \).

  1. \( f(x, y) = e^{-2 x} e^{-3 y} \), \( D = [0, \infty) \times [0, \infty) \)
  2. \(f(x, y) = e^{-2 x} e^{-3 y} \), \( D = \{(x, y) \in \R^2: 0 \le x \le y \lt \infty\} \)

Integrals of the type in the last exercise are useful in the study of exponential distributions.
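Fubini's theorem settles both parts in closed form. The first integral factors as \( \left(\int_0^\infty e^{-2 x} \, dx\right)\left(\int_0^\infty e^{-3 y} \, dy\right) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6} \), and the second is \( \int_0^\infty e^{-2 x} \int_x^\infty e^{-3 y} \, dy \, dx = \int_0^\infty \frac{1}{3} e^{-5 x} \, dx = \frac{1}{15} \). The Python sketch below is only a numerical sanity check under stated assumptions: the domain is truncated to \( [0, 12]^2 \) (the neglected mass is of order \( e^{-24} \)), and grid cells on the diagonal \( x = y \) are counted with weight \( \frac12 \), a choice made here to keep the midpoint rule accurate for the triangular region.

```python
import math

def midpoint_integral_2d(f, weight, upper=12.0, n=400):
    """Midpoint-rule approximation of an integral over a subset of [0, upper]^2.

    weight(i, j) is the fraction of grid cell (i, j) counted as lying in the region.
    """
    h = upper / n
    mids = [(k + 0.5) * h for k in range(n)]
    total = 0.0
    for i, x in enumerate(mids):
        for j, y in enumerate(mids):
            w = weight(i, j)
            if w:
                total += w * f(x, y)
    return total * h * h

def f(x, y):
    return math.exp(-2 * x) * math.exp(-3 * y)

def quadrant(i, j):
    # Part 1: every cell of the truncated quadrant counts fully
    return 1.0

def triangle(i, j):
    # Part 2: region x <= y; cells on the diagonal x = y are split in half
    if i < j:
        return 1.0
    if i == j:
        return 0.5
    return 0.0

q = midpoint_integral_2d(f, quadrant)
t = midpoint_integral_2d(f, triangle)
print(q, 1 / 6)
print(t, 1 / 15)
```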