Existence and Uniqueness

Suppose that \( S \) is a set and \( \ms S \) a \( \sigma \)-algebra of subsets of \( S \), so that \( (S, \ms S) \) is a measurable space. In many cases, it is impossible to define a positive measure \(\mu\) on \(\ms S\) explicitly, by giving a formula for computing \(\mu(A)\) for each \(A \in \ms S\). Rather, we often know how the measure \(\mu\) should work on some class of sets \(\ms B\) that generates \( \ms S \). We would then like to know that \(\mu\) can be extended to a positive measure on \(\ms S\), and that this extension is unique. The purpose of this section is to discuss the basic results on this topic. The section on special set structures will play an important role. If you are not interested in questions of existence and uniqueness of positive measures, you can safely skip this section.

Basic Theory

Positive Measures on Algebras

Suppose first that \( \ms A \) is an algebra of subsets of \(S\). Recall that this means that \( \ms A \) is a collection of subsets that contains \(S\) and is closed under complements and finite unions (and hence also finite intersections). Here is our first definition:

A positive measure on \(\ms A\) is a function \( \mu: \ms A \to [0, \infty] \) that satisfies the following properties:

\( \mu(\emptyset) = 0 \)
If \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \ms A \) and if \( \bigcup_{i \in I} A_i \in \ms A \) then \[ \mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i) \]

Clearly the definition of a positive measure on an algebra is very similar to the definition for a \( \sigma \)-algebra. If the collection of sets in (b) is finite, then \( \bigcup_{i \in I} A_i \) must be in the algebra \( \ms A \). Thus, \( \mu \) is finitely additive. If the collection is countably infinite, then there is no guarantee that the union is in \( \ms A \). If it is, however, then \( \mu \) must be additive over this collection. Given the similarity, it is not surprising that \( \mu \) shares many of the basic properties of a positive measure on a \( \sigma \)-algebra, with proofs that are almost identical.

If \( A, \, B \in \ms A \), then \( \mu(B) = \mu(A \cap B) + \mu(B \setminus A) \).

Details:

Note that \( B = (A \cap B) \cup (B \setminus A) \), and the sets in the union are in the algebra \( \ms A \) and are disjoint.

If \( A, \, B \in \ms A \) and \( A \subseteq B \) then

\( \mu(B) = \mu(A) + \mu(B \setminus A) \)
\( \mu(A) \le \mu(B) \)

Details:

Part (a) follows from the previous result, since \( A \cap B = A \). Part (b) follows from part (a), since \( \mu(B \setminus A) \ge 0 \).

Thus \( \mu \) is increasing, relative to the subset partial order \( \subseteq \) on \( \ms A \) and the ordinary order \( \le \) on \( [0, \infty] \). The following results are simple corollaries of the previous two results. Part (a) is the difference rule, part (b) is the proper difference rule, and part (c) is the complement rule.

Suppose that \(A, \, B \in \ms A\).

If \(\mu(B) \lt \infty\) then \(\mu(B \setminus A) = \mu(B) - \mu(A \cap B)\)
If \(\mu(B) \lt \infty\) and \(A \subseteq B\) then \(\mu(B \setminus A) = \mu(B) - \mu(A)\)
If \(\mu\) is a finite measure then \(\mu(A^c) = \mu(S) - \mu(A)\).

The following result is the subadditive property for a positive measure \( \mu \) on an algebra \( \ms A \).

Suppose that \( \{A_i: i \in I \}\) is a countable collection of sets in \( \ms A \) and that \( \bigcup_{i \in I} A_i \in \ms A \). Then \[ \mu\left(\bigcup_{i \in I} A_i \right) \le \sum_{i \in I} \mu(A_i) \]

Details:

The proof is just like before. Assume that \( I = \N_+ \). Let \( B_1 = A_1 \) and \( B_i = A_i \setminus (A_1 \cup \ldots \cup A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Then \( \{B_i: i \in I\} \) is a disjoint collection of sets in \( \ms A \) with the same union as \( \{A_i: i \in I\} \). Also \( B_i \subseteq A_i \) for each \( i \) so \( \mu(B_i) \le \mu(A_i) \). Hence if the union is in \( \ms A \) then \[ \mu\left(\bigcup_{i \in I} A_i \right) = \mu\left(\bigcup_{i \in I} B_i \right) = \sum_{i \in I} \mu(B_i) \le \sum_{i \in I} \mu(A_i) \]
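
The disjointification step in this proof can be sketched in code for a finite collection. This is only an illustration under stated assumptions: the sets are finite Python sets, the measure is counting measure, and the names `disjointify` and `counting_measure` are hypothetical helpers, not part of the text.

```python
def disjointify(sets):
    """Return B_1, B_2, ... where B_i is A_i minus the union of A_1, ..., A_{i-1}."""
    seen = set()      # running union of the sets processed so far
    result = []
    for a in sets:
        result.append(set(a) - seen)
        seen |= set(a)
    return result


def counting_measure(a):
    """Counting measure (an assumption for this sketch): the number of points in the set."""
    return len(a)


A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
B = disjointify(A)

union_A = set().union(*A)
union_B = set().union(*B)

# The B_i are disjoint with the same union, so their measures add up to the
# measure of the union, which is bounded by the sum of the measures of the A_i.
measure_of_union = counting_measure(union_A)
sum_of_B = sum(counting_measure(b) for b in B)
sum_of_A = sum(counting_measure(a) for a in A)
```

Here each \( B_i \subseteq A_i \), so the sum over the \( B_i \) cannot exceed the sum over the \( A_i \), which is exactly the subadditive inequality.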

For a finite union of sets with finite measure, the inclusion-exclusion formula holds, and the proof is just like the one for a measure space.

Suppose that \(\{A_i: i \in I\}\) is a finite collection of sets in \( \ms A \) where \(\#(I) = n \in \N_+\), and that \( \mu(A_i) \lt \infty \) for \( i \in I \). Then \[\mu \left( \bigcup_{i \in I} A_i \right) = \sum_{k = 1}^n (-1)^{k - 1} \sum_{J \subseteq I, \; \#(J) = k} \mu \left( \bigcap_{j \in J} A_j \right)\]
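
As a sanity check, the inclusion-exclusion formula can be evaluated directly for a small example. The sketch below assumes a measure on a finite set defined by point masses; the `weights` dictionary and the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def inclusion_exclusion(sets, mu):
    """Right-hand side of the inclusion-exclusion formula for a finite list of sets."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        # Sum over all index subsets J of size k.
        for J in combinations(range(n), k):
            intersection = set.intersection(*(sets[j] for j in J))
            total += sign * mu(intersection)
    return total


# Hypothetical point masses defining a finite measure on {1, ..., 6}.
weights = {1: 2, 2: 1, 3: 5, 4: 3, 5: 1, 6: 4}


def mu(a):
    """Measure of a set: the sum of the point masses it contains."""
    return sum(weights[x] for x in a)


A = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
lhs = mu(set().union(*A))          # measure of the union
rhs = inclusion_exclusion(A, mu)   # alternating sum over intersections
```

Both sides agree for this example, as the formula requires. The number of terms grows like \( 2^n \), so a direct evaluation of this kind is only practical for small \( n \).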

The continuity theorems hold for a positive measure \( \mu \) on an algebra \( \ms A \), just as for a positive measure on a \( \sigma \)-algebra, assuming that the appropriate union and intersection are in the algebra. The proofs are just as before.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of sets in \( \ms A \).

If the sequence is increasing, so that \( A_n \subseteq A_{n+1} \) for each \( n \in \N_+ \), and \( \bigcup_{i = 1}^\infty A_i \in \ms A \), then \( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \).
If the sequence is decreasing, so that \( A_{n+1} \subseteq A_n \) for each \( n \in \N_+ \), and \( \mu(A_1) \lt \infty \) and \( \bigcap_{i=1}^\infty A_i \in \ms A \), then \( \mu\left(\bigcap_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \).

Details:

Note that if \( \mu(A_k) = \infty \) for some \( k \) then \( \mu(A_n) = \infty \) for \( n \ge k \) and \( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \infty \) if this union is in \( \ms A \). Thus, suppose that \( \mu(A_i) \lt \infty \) for each \( i \). Let \( B_1 = A_1 \) and \( B_i = A_i \setminus A_{i-1} \) for \( i \in \{2, 3, \ldots\} \). Then \( (B_1, B_2, \ldots) \) is a disjoint sequence in \( \ms A \) with the same union as \( (A_1, A_2, \ldots) \). Also, \( \mu(B_1) = \mu(A_1) \) and \( \mu(B_i) = \mu(A_i) - \mu(A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Hence if the union is in \( \ms A \), \[ \mu\left(\bigcup_{i=1}^\infty A_i \right) = \mu \left(\bigcup_{i=1}^\infty B_i \right) = \sum_{i=1}^\infty \mu(B_i) = \lim_{n \to \infty} \sum_{i=1}^n \mu(B_i) \] But \( \sum_{i=1}^n \mu(B_i) = \mu(A_1) + \sum_{i=2}^n [\mu(A_i) - \mu(A_{i-1})] = \mu(A_n) \).
Note that \( A_1 \setminus A_n \in \ms A \) and this sequence is increasing. Moreover, \( \bigcup_{n=1}^\infty (A_1 \setminus A_n) = \left(\bigcap_{n=1}^\infty A_n \right)^c \cap A_1 \). Hence if \( \bigcap_{n=1}^\infty A_n \in \ms A \) then \( \bigcup_{n=1}^\infty (A_1 \setminus A_n) \in \ms A \). Thus using the proper difference rule (valid since \( \mu(A_1) \lt \infty \)) and the continuity result for increasing sets, \begin{align} \mu \left(\bigcap_{i=1}^\infty A_i \right) & = \mu\left[A_1 \setminus \bigcup_{i=1}^\infty (A_1 \setminus A_i) \right] = \mu(A_1) - \mu\left[\bigcup_{i=1}^\infty (A_1 \setminus A_i)\right]\\ & = \mu(A_1) - \lim_{n \to \infty} \mu(A_1 \setminus A_n) = \mu(A_1) - \lim_{n \to \infty} [\mu(A_1) - \mu(A_n)] = \lim_{n \to \infty} \mu(A_n) \end{align}

Recall that if the sequence \( (A_1, A_2, \ldots) \) is increasing, then we define \( \lim_{n \to \infty} A_n = \bigcup_{n=1}^\infty A_n \), and if the sequence is decreasing then we define \( \lim_{n \to \infty} A_n = \bigcap_{n=1}^\infty A_n \). Thus the conclusion of both parts of the continuity theorem is \[ \mu\left(\lim_{n \to \infty} A_n\right) = \lim_{n \to \infty} \mu(A_n) \] Finite additivity and continuity for increasing sets imply countable additivity:

If \( \mu: \ms A \to [0, \infty] \) satisfies the properties below then \( \mu \) is a positive measure on \( \ms A \).

\( \mu(\emptyset) = 0 \)
\( \mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i) \) if \( \{A_i: i \in I\} \) is a finite disjoint collection of sets in \( \ms A \)
\( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \) if \( (A_1, A_2, \ldots) \) is an increasing sequence of sets in \( \ms A \) and \( \bigcup_{i=1}^\infty A_i \in \ms A \).

Details:

All that is left to prove is additivity over a countably infinite collection of sets in \( \ms A \) when the union is also in \( \ms A \). Thus suppose that \(\{A_n: n \in \N_+\} \) is a disjoint collection of sets in \( \ms A \) with \( \bigcup_{n=1}^\infty A_n \in \ms A \). Let \( B_n = \bigcup_{i=1}^n A_i \) for \( n \in \N_+ \). Then \( B_n \in \ms A \), the sequence \( (B_1, B_2, \ldots) \) is increasing, and \( \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n \). Hence using the finite additivity and the continuity property we have \[ \mu\left(\bigcup_{n = 1}^\infty A_n\right) = \mu\left(\bigcup_{n=1}^\infty B_n\right) = \lim_{n \to \infty} \mu(B_n) = \lim_{n \to \infty} \sum_{i=1}^n \mu(A_i) = \sum_{i=1}^\infty \mu(A_i) \]

Many of the basic theorems in measure theory require that the measure not be too far removed from being finite. This leads to the following definition, which is just like the one for a positive measure on a \( \sigma \)-algebra.

A measure \( \mu \) on an algebra \( \ms A \) of subsets of \( S \) is \( \sigma \)-finite if there exists a sequence of sets \( (A_1, A_2, \ldots) \) in \( \ms A \) such that \( \bigcup_{n=1}^\infty A_n = S \) and \( \mu(A_n) \lt \infty \) for each \( n \in \N_+ \). The sequence is called a \( \sigma \)-finite sequence for \( \mu \).

Suppose that \( \mu \) is a \( \sigma \)-finite measure on an algebra \( \ms A \) of subsets of \( S \).

There exists an increasing \( \sigma \)-finite sequence.
There exists a disjoint \( \sigma \)-finite sequence.

Details:

We use the same tricks that we have used before. Suppose that \( (A_1, A_2, \ldots) \) is a \( \sigma \)-finite sequence for \( \mu \).

Let \( B_n = \bigcup_{i = 1}^n A_i \). Then \( B_n \in \ms A \) for \( n \in \N_+ \) and this sequence is increasing. Moreover, \( \mu(B_n) \le \sum_{i=1}^n \mu(A_i) \lt \infty \) for \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n = S \).
Let \( C_1 = A_1 \) and let \( C_n = A_n \setminus \bigcup_{i=1}^{n-1} A_i \) for \( n \in \{2, 3, \ldots\} \). Then \( C_n \in \ms A \) for each \( n \in \N_+ \) and this sequence is disjoint. Moreover, \( C_n \subseteq A_n \) so \( \mu(C_n) \le \mu(A_n) \lt \infty \) and \( \bigcup_{n=1}^\infty C_n = \bigcup_{n=1}^\infty A_n = S \).

Extension and Uniqueness Theorems

The fundamental theorem on measures states that a positive, \( \sigma \)-finite measure \( \mu \) on an algebra \( \ms A \) can be uniquely extended to \( \sigma(\ms A) \). The extension part is sometimes referred to as the Carathéodory extension theorem, and is named for the Greek mathematician Constantin Carathéodory.

If \( \mu \) is a positive, \( \sigma \)-finite measure on an algebra \(\ms A\), then \( \mu \) can be extended to a positive measure on \( \ms S = \sigma(\ms A) \).

Details:

The proof is complicated, but here is a broad outline. First, for \( A \subseteq S \), we define a cover of \( A \) to be a countable collection \( \{A_i: i \in I\} \) of sets in \( \ms A \) such that \( A \subseteq \bigcup_{i \in I} A_i \). Next, we define a new set function \( \mu^* \), the outer measure, on all subsets of \( S \): \[ \mu^*(A) = \inf \left\{ \sum_{i \in I} \mu(A_i): \{A_i: i \in I\} \text{ is a cover of } A \right\}, \quad A \subseteq S \] Outer measure satisfies the following properties.

\( \mu^*(A) \ge 0 \) for \( A \subseteq S \), so \( \mu^* \) is nonnegative.
\( \mu^*(A) = \mu(A) \) for \( A \in \ms A \), so \( \mu^* \) extends \( \mu \).
If \( A \subseteq B \) then \( \mu^*(A) \le \mu^*(B) \), so \( \mu^* \) is increasing.
If \( A_i \subseteq S \) for each \( i \) in a countable index set \( I \) then \( \mu^*\left(\bigcup_{i \in I} A_i\right) \le \sum_{i \in I} \mu^*(A_i) \), so \( \mu^* \) is countably subadditive.

Next, \( A \subseteq S \) is said to be measurable if \[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \setminus A), \quad B \subseteq S \] Thus, \( A \) is measurable if \( \mu^* \) is additive with respect to the partition of \( B \) induced by \( \{A, A^c\} \), for every \( B \subseteq S \). We let \( \ms{M} \) denote the collection of measurable subsets of \( S \). The proof is finished by showing that \( \ms A \subseteq \ms{M} \), \( \ms{M} \) is a \( \sigma \)-algebra of subsets of \( S \), and \( \mu^* \) is a positive measure on \( \ms{M} \). It follows that \( \sigma(\ms A) = \ms S \subseteq \ms{M} \) and hence \( \mu^* \) is a measure on \( \ms S \) that extends \( \mu \).
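
The outline above can be traced by brute force on a toy example. The sketch below is an illustration under several assumptions that are not in the text: \( S \) is a five-point set, the algebra is generated by a hypothetical partition into three blocks with assigned masses, and, because the algebra is finite, the infimum over countable covers is replaced by a minimum over single algebra sets containing the given set (the union of any cover lies in the algebra and has measure no larger than the sum over the cover).

```python
from itertools import combinations

S = frozenset(range(5))

# Hypothetical partition of S with a mass for each block; note the null block {3, 4}.
blocks = {frozenset({0, 1}): 1, frozenset({2}): 2, frozenset({3, 4}): 0}

# The algebra generated by the partition: all unions of blocks.
algebra = set()
for r in range(len(blocks) + 1):
    for combo in combinations(blocks, r):
        algebra.add(frozenset().union(*combo))


def mu(a):
    """Measure of a set in the algebra: total mass of the blocks it contains."""
    return sum(mass for block, mass in blocks.items() if block <= a)


def outer(e):
    """Outer measure of an arbitrary subset, using single-set covers (finite algebra)."""
    return min(mu(a) for a in algebra if e <= a)


def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


subsets = powerset(S)

# Caratheodory criterion: E is measurable if it splits every B additively.
measurable = [e for e in subsets
              if all(outer(b) == outer(b & e) + outer(b - e) for b in subsets)]
```

In this example the collection of measurable sets has 16 members and strictly contains the 8 sets of the algebra: every subset of the null block \( \{3, 4\} \) is measurable, while \( \{0\} \), which splits a block of positive mass, is not.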

Our next goal is the basic uniqueness result, which serves as the complement to the basic extension result. But first we need another variation of the term \( \sigma \)-finite.

Suppose that \( \mu \) is a measure on a \( \sigma \)-algebra \( \ms S \) of subsets of \( S \) and \( \ms B \subseteq \ms S \). Then \( \mu \) is \( \sigma \)-finite on \( \ms B \) if there exists a countable collection \( \{B_i: i \in I\} \subseteq \ms B \) such that \( \mu(B_i) \lt \infty \) for \( i \in I \) and \( \bigcup_{i \in I} B_i = S \).

The next result is the uniqueness theorem. The proof, like others that we have seen, uses Dynkin's \( \pi \)-\( \lambda \) theorem, named for Eugene Dynkin.

Suppose that \( \ms B \) is a \( \pi \)-system and that \( \ms S = \sigma(\ms B) \). If \( \mu_1 \) and \( \mu_2 \) are positive measures on \( \ms S \) and are \( \sigma \)-finite on \( \ms B \), and if \( \mu_1(A) = \mu_2(A) \) for all \( A \in \ms B \), then \( \mu_1(A) = \mu_2(A) \) for all \( A \in \ms S \).

Details:

Suppose that \( B \in \ms B \) and that \( \mu_1(B) = \mu_2(B) \lt \infty \). Let \( \ms{L}_B = \{A \in \ms S: \mu_1(A \cap B) = \mu_2(A \cap B) \} \). Then \( S \in \ms{L}_B \) since \( \mu_1(B) = \mu_2(B) \). If \( A \in \ms{L}_B \) then \( \mu_1(A \cap B) = \mu_2(A \cap B) \) so \( \mu_1(A^c \cap B) = \mu_1(B) - \mu_1(A \cap B) = \mu_2(B) - \mu_2(A \cap B) = \mu_2(A^c \cap B) \) and hence \( A^c \in \ms{L}_B \). Finally, suppose that \( \{A_j: j \in J\} \) is a countable, disjoint collection of sets in \( \ms{L}_B \). Then \( \mu_1(A_j \cap B) = \mu_2(A_j \cap B) \) for each \( j \in J \) and hence \begin{align} \mu_1\left[ \left(\bigcup_{j \in J} A_j \right) \cap B \right] & = \mu_1 \left(\bigcup_{j \in J} (A_j \cap B) \right) = \sum_{j \in J} \mu_1(A_j \cap B) \\ & = \sum_{j \in J} \mu_2(A_j \cap B) = \mu_2\left(\bigcup_{j \in J} (A_j \cap B) \right) = \mu_2 \left[ \left(\bigcup_{j \in J} A_j \right) \cap B \right] \end{align} Therefore \( \bigcup_{j \in J} A_j \in \ms{L}_B \), and so \( \ms{L}_B \) is a \( \lambda \)-system. If \( A \in \ms B \) then \( A \cap B \in \ms B \) since \( \ms B \) is a \( \pi \)-system, so \( \mu_1(A \cap B) = \mu_2(A \cap B) \) by assumption. Thus \( \ms B \subseteq \ms{L}_B \) and therefore by the \( \pi \)-\( \lambda \) theorem, \( \ms S = \sigma(\ms B) \subseteq \ms{L}_B \).

Next, by assumption there exists \( B_i \in \ms B \) with \( \mu_1(B_i) = \mu_2(B_i) \lt \infty \) for each \( i \in \N_+ \) and \( S = \bigcup_{i=1}^\infty B_i \). If \( A \in \ms S \) then the inclusion-exclusion rule can be applied to \[ \mu_k\left[\left(\bigcup_{i=1}^n B_i\right) \cap A \right] = \mu_k\left[\bigcup_{i=1}^n (A \cap B_i) \right] \] where \( k \in \{1, 2\} \) and \( n \in \N_+ \). But the inclusion-exclusion formula only has terms of the form \( \mu_k \left[ \bigcap_{j \in J} (A \cap B_j) \right] = \mu_k \left[ A \cap \left(\bigcap_{j \in J} B_j\right) \right] \) where \( J \subseteq \{1, 2, \ldots, n\} \). But \( \bigcap_{j \in J} B_j \in \ms B \) since \( \ms B \) is a \( \pi \)-system, so by the previous paragraph, \( \mu_1 \left[ \bigcap_{j \in J} (A \cap B_j) \right] = \mu_2 \left[ \bigcap_{j \in J} (A \cap B_j) \right] \). It then follows that for each \( n \in \N_+ \) \[ \mu_1\left[\left(\bigcup_{i=1}^n B_i\right) \cap A \right] = \mu_2\left[\left(\bigcup_{i=1}^n B_i\right) \cap A \right] \] Finally, letting \( n \to \infty \) and using the continuity theorem for increasing sets gives \( \mu_1(A) = \mu_2(A) \).
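
The \( \pi \)-system hypothesis cannot simply be dropped. The following sketch shows a standard small counterexample, written out under the assumption of a four-point set with two hypothetical probability measures `p1` and `p2` given by point masses. The generating class \( \{\{1, 2\}, \{2, 3\}\} \) is not closed under intersection, yet it generates all subsets of \( \{1, 2, 3, 4\} \).

```python
from fractions import Fraction

S = {1, 2, 3, 4}
generators = [{1, 2}, {2, 3}]   # not a pi-system: {1, 2} & {2, 3} = {2} is missing

# Two hypothetical probability measures, specified by their point masses.
p1 = {x: Fraction(1, 4) for x in S}
p2 = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}


def measure(p, a):
    """Measure of a set as the sum of its point masses."""
    return sum(p[x] for x in a)


agree_on_generators = all(measure(p1, g) == measure(p2, g) for g in generators)
agree_on_intersection = measure(p1, {2}) == measure(p2, {2})
```

The two measures agree on both generating sets but differ on the intersection \( \{2\} \), so agreement on a generating class that is not a \( \pi \)-system does not force agreement on the generated \( \sigma \)-algebra.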

An algebra \( \ms A \) of subsets of \( S \) is trivially a \( \pi \)-system. Hence, if \( \mu_1 \) and \( \mu_2 \) are positive measures on \( \ms S = \sigma(\ms A) \) and are \( \sigma \)-finite on \( \ms A \), and if \( \mu_1(A) = \mu_2(A) \) for \( A \in \ms A \), then \( \mu_1(A) = \mu_2(A) \) for \( A \in \ms S \). This completes the second part of the fundamental theorem.

Of course, the results of this subsection hold for probability measures. Formally, a probability measure \( \P \) on an algebra \( \ms A \) of subsets of \( S \) is a positive measure on \( \ms A \) with the additional requirement that \( \P(S) = 1 \). Probability measures are trivially \( \sigma \)-finite, so a probability measure \( \P \) on an algebra \( \ms A \) can be uniquely extended to \( \ms S = \sigma(\ms A) \).

However, usually we start with a collection that is more primitive than an algebra. The next result combines the definition with the main theorem associated with the definition. For a proof see the section on special set structures.

Suppose that \( \ms B \) is a nonempty collection of subsets of \( S \) and let \[ \ms A = \left\{\bigcup_{i \in I} B_i: \{B_i: i \in I\} \text{ is a finite, disjoint collection of sets in } \ms B\right\} \] If the following conditions are satisfied, then \( \ms B \) is a semi-algebra of subsets of \( S \), and then \( \ms A \) is the algebra generated by \(\ms B\).

If \( B_1, \, B_2 \in \ms B \) then \( B_1 \cap B_2 \in \ms B \).
If \( B \in \ms B \) then \( B^c \in \ms A \).

Suppose now that we know how a measure \( \mu \) should work on a semi-algebra \( \ms B \) that generates an algebra \( \ms A \) and then a \( \sigma \)-algebra \( \ms S = \sigma(\ms A) = \sigma(\ms B) \). That is, we know \( \mu(B) \in [0, \infty] \) for each \( B \in \ms B \). Because of the additivity property, there is no question as to how we should extend \( \mu \) to \(\ms A\). We must have \[ \mu(A) = \sum_{i \in I} \mu(B_i)\] if \(A = \bigcup_{i \in I} B_i\) for some finite, disjoint collection \( \{B_i: i \in I\} \) of sets in \( \ms B \) (so that \( A \in \ms A \)). However, we cannot assign the values \( \mu(B) \) for \( B \in \ms B \) arbitrarily. The following extension theorem states that, subject just to some essential consistency conditions, the extension of \( \mu \) from the semi-algebra \( \ms B \) to the algebra \( \ms A \) does in fact produce a measure on \( \ms A \). The consistency conditions are that \( \mu \) be finitely additive and countably subadditive on \( \ms B \).

Suppose that \( \ms B \) is a semi-algebra of subsets of \( S \) and that \( \ms A \) is the algebra of subsets of \( S \) generated by \(\ms B\). A function \( \mu: \ms B \to [0, \infty] \) can be uniquely extended to a measure on \( \ms A \) if and only if \( \mu \) satisfies the following properties:

If \( \emptyset \in \ms B \) then \( \mu(\emptyset) = 0 \).
If \( \{B_i: i \in I\} \) is a finite, disjoint collection of sets in \( \ms B \) and \( B = \bigcup_{i \in I} B_i \in \ms B \) then \( \mu(B) = \sum_{i \in I} \mu(B_i) \).
If \( B \in \ms B \) and \( B \subseteq \bigcup_{i \in I} B_i \) where \( \{B_i: i \in I\} \) is a countable collection of sets in \( \ms B \) then \( \mu(B) \le \sum_{i \in I} \mu(B_i) \)

If the measure \( \mu \) on the algebra \( \ms A \) is \( \sigma \)-finite, then the extension theorem and the uniqueness theorem apply, so \( \mu \) can be extended uniquely to a measure on the \( \sigma \)-algebra \( \ms S = \sigma(\ms A) = \sigma(\ms B) \). This chain of extensions, starting with a semi-algebra \( \ms B \), is often how measures are constructed.

Examples and Applications

Product Spaces

Suppose that \( (S, \ms S) \) and \( (T, \ms T) \) are measurable spaces. For the Cartesian product set \( S \times T \), recall that the product \( \sigma \)-algebra is \[ \ms S \times \ms T = \sigma\{A \times B: A \in \ms S, B \in \ms T\} \] the \( \sigma \)-algebra generated by the Cartesian products of measurable sets, sometimes referred to as measurable rectangles. Although the notation is the same, please remember that \(\ms S \times \ms T\) is not the Cartesian product of \(\ms S\) and \(\ms T\).

Suppose that \( (S, \ms S, \mu) \) and \( (T, \ms T, \nu) \) are \( \sigma \)-finite measure spaces. Then there exists a unique \( \sigma \)-finite measure \( \mu \times \nu \) on \((S \times T, \ms S \times \ms T) \) such that \[ (\mu \times \nu)(A \times B) = \mu(A) \nu(B); \quad A \in \ms S, \; B \in \ms T \] The measure space \( (S \times T, \ms S \times \ms T, \mu \times \nu) \) is the product measure space associated with \( (S, \ms S, \mu) \) and \( (T, \ms T, \nu) \).

Details:

Recall that the collection \( \ms B = \{A \times B: A \in \ms S, B \in \ms T\} \) is a semi-algebra: the intersection of two product sets is another product set, and the complement of a product set is the union of two disjoint product sets. We define \( \rho: \ms B \to [0, \infty] \) by \( \rho(A \times B) = \mu(A) \nu(B) \). The consistency conditions hold, so \( \rho \) can be extended to a measure on the algebra \( \ms A \) generated by \( \ms B \). The algebra \( \ms A \) is the collection of all finite, disjoint unions of products of measurable sets. We will now show that the extended measure \( \rho \) is \( \sigma \)-finite on \( \ms A \). Since \( \mu \) is \( \sigma \)-finite, there exists an increasing sequence \( (A_1, A_2, \ldots) \) of sets in \( \ms S \) with \( \mu(A_i) \lt \infty \) and \( \bigcup_{i = 1}^\infty A_i = S \). Similarly, there exists an increasing sequence \( (B_1, B_2, \ldots) \) of sets in \( \ms T \) with \( \nu(B_j) \lt \infty \) and \( \bigcup_{j = 1}^\infty B_j = T \). Then \( \rho(A_i \times B_j) = \mu(A_i) \nu(B_j) \lt \infty \), and since the sets are increasing, \( \bigcup_{(i, j) \in \N_+ \times \N_+} A_i \times B_j = S \times T \). The extension theorem and uniqueness theorem now apply, so \( \rho \) can be extended uniquely to a measure on \( \sigma(\ms A) = \ms S \times \ms T \).

Recall that for \( C \subseteq S \times T \), the cross section of \( C \) in the first coordinate at \( x \in S \) is \( C_x = \{ y \in T: (x, y) \in C\} \). Similarly, the cross section of \( C \) in the second coordinate at \( y \in T \) is \( C^y = \{ x \in S: (x, y) \in C\} \). We know that the cross sections of a measurable set are measurable. The following result shows that the measures of the cross sections of a measurable set form measurable functions.

Suppose again that \( (S, \ms S, \mu) \) and \( (T, \ms T, \nu) \) are \( \sigma \)-finite measure spaces. If \( C \in \ms S \times \ms T \) then

\( x \mapsto \nu(C_x) \) is a measurable function from \( S \) to \( [0, \infty] \).
\( y \mapsto \mu(C^y) \) is a measurable function from \( T \) to \( [0, \infty] \).

Details:

We prove part (a), since of course the proof for part (b) is symmetric. Suppose first that the measure spaces are finite. Let \( \ms R = \{A \times B: A \in \ms S, B \in \ms T\} \) denote the set of measurable rectangles. Let \( \ms{C} = \{C \in \ms S \times \ms T: x \mapsto \nu(C_x) \text{ is measurable}\}\). If \( A \times B \in \ms R \), then \( A \times B \in \ms{C} \), since \( \nu[(A \times B)_x] = \nu(B) \bs{1}_A(x) \). Next, suppose \( C \in \ms{C} \). Then \( (C^c)_x = (C_x)^c \), so \( \nu[(C^c)_x] = \nu(T) - \nu(C_x) \) and this is a measurable function of \( x \in S \). Hence \( C^c \in \ms{C} \). Next, suppose that \( \{C_i: i \in I\} \) is a countable, disjoint collection of sets in \( \ms{C} \) and let \( C = \bigcup_{i \in I} C_i \). Then \( \{(C_i)_x: i \in I\} \) is a countable, disjoint collection of sets in \( \ms T \), and \( C_x = \bigcup_{i \in I} (C_i)_x \). Hence \( \nu(C_x) = \sum_{i \in I} \nu[(C_i)_x] \), and this is a measurable function of \( x \in S \). Hence \( C \in \ms{C} \). It follows that \( \ms{C} \) is a \( \lambda \)-system that contains \( \ms R \), which in turn is a \( \pi \)-system. Hence from Dynkin's \(\pi\)-\(\lambda \) theorem, \( \ms S \times \ms T = \sigma(\ms R) \subseteq \ms{C} \). Thus \( \ms{C} = \ms S \times \ms T \).

Consider now the general case where the measure spaces are \( \sigma \)-finite. There exists a countable, increasing sequence of sets \( C_n \in \ms S \times \ms T \) for \( n \in \N_+ \) with \( (\mu \times \nu)(C_n) \lt \infty \) for \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty C_n = S \times T \). If \( C \in \ms S \times \ms T \), then \( C \cap C_n \) is increasing in \( n \in \N_+ \), and \( C = \bigcup_{n=1}^\infty (C \cap C_n) \). Hence, for \( x \in S \), \( (C \cap C_n)_x \) is increasing in \( n \in \N_+ \) and \( C_x = \bigcup_{n=1}^\infty (C \cap C_n)_x \). Therefore \( \nu(C_x) = \lim_{n \to \infty} \nu[(C \cap C_n)_x] \). But \( x \mapsto \nu[(C \cap C_n)_x] \) is a measurable function of \( x \in S \) for each \( n \in \N_+ \) by the previous argument, so \( x \mapsto \nu(C_x) \) is a measurable function of \( x \in S \).

In our study of integration with respect to a measure, we will see that for \( C \in \ms S \times \ms T \), the product measure \( (\mu \times \nu)(C) \) can be computed by integrating \( \nu(C_x) \) over \( x \in S \) with respect to \( \mu \) or by integrating \( \mu(C^y) \) over \( y \in T \) with respect to \( \nu \). These results, generalizing the definition of the product measure in the theorem above, are special cases of Fubini's theorem, named for the Italian mathematician Guido Fubini.
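
In the discrete case, where integrals reduce to weighted sums, the cross section computation can be checked directly. The sketch below assumes two finite sets with hypothetical point masses `mu` and `nu`, and takes the product measure of a set of pairs to be the sum of the products of the point masses.

```python
from fractions import Fraction

# Hypothetical point masses on S = {"a", "b"} and T = {0, 1, 2}.
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
nu = {0: Fraction(2), 1: Fraction(1, 3), 2: Fraction(1)}


def product_measure(C):
    """Product measure of a set of pairs, summing mu{x} * nu{y} over the pairs."""
    return sum(mu[x] * nu[y] for (x, y) in C)


def section_x(C, x):
    """Cross section of C in the first coordinate at x."""
    return {y for (u, y) in C if u == x}


def section_y(C, y):
    """Cross section of C in the second coordinate at y."""
    return {x for (x, v) in C if v == y}


C = {("a", 0), ("a", 2), ("b", 1), ("b", 2)}   # not a rectangle

direct = product_measure(C)
# "Integrate" nu(C_x) over x with respect to mu, and mu(C^y) over y with respect to nu.
by_x = sum(mu[x] * sum(nu[y] for y in section_x(C, x)) for x in mu)
by_y = sum(nu[y] * sum(mu[x] for x in section_y(C, y)) for y in nu)

# On a rectangle A x B the product measure factors as mu(A) * nu(B).
rectangle = {(x, y) for x in ("a", "b") for y in (0, 1)}
rectangle_measure = product_measure(rectangle)
```

All three computations of the measure of `C` give the same value, and the measure of the rectangle equals the product of the measures of its sides.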

Except for more complicated notation, these results extend in a perfectly straightforward way to the product of a finite number of \( \sigma \)-finite measure spaces.

Suppose that \( n \in \N_+ \) and that \( (S_i, \ms S_i, \mu_i) \) is a \( \sigma \)-finite measure space for \( i \in \{1, 2, \ldots, n\} \). Let \( S = \prod_{i=1}^n S_i \) and let \( \ms S = \prod_{i=1}^n \ms S_i \) denote the corresponding product \( \sigma \)-algebra. There exists a unique \( \sigma \)-finite measure \( \mu \) on \( (S, \ms S) \) satisfying \[ \mu\left(\prod_{i=1}^n A_i\right) = \prod_{i=1}^n \mu_i(A_i), \quad A_i \in \ms S_i \text{ for } i \in \{1, 2, \ldots, n\} \] The measure space \( (S, \ms S, \mu) \) is the product measure space associated with the given measure spaces.

Euclidean Measure Spaces

The next discussion concerns our most important and essential application. Recall that the Borel \( \sigma \)-algebra on \( \R \), named for Émile Borel, is the \( \sigma \)-algebra \( \ms R \) generated by the standard Euclidean topology on \( \R \). Equivalently, \( \ms R = \sigma(\ms I) \) where \( \ms I \) is the collection of intervals of \( \R \) (of all types—bounded and unbounded, with any type of closure, and including single points and the empty set). Next recall how the length of an interval is defined. For \( a, \, b \in \R \) with \( a \le b \), each of the intervals \( (a, b) \), \( [a, b) \), \( (a, b] \), and \( [a, b] \) has length \( b - a \). For \( a \in \R \), each of the intervals \( (a, \infty) \), \( [a, \infty) \), \( (-\infty, a) \), \( (-\infty, a] \) has length \( \infty \), as does \( \R \) itself. The standard measure on \( \ms R \) generalizes the length measurement for intervals.

There exists a unique measure \( \lambda \) on \( \ms R \) such that \( \lambda(I) = \length(I) \) for \( I \in \ms I \). The measure \( \lambda \) is Lebesgue measure on \( (\R, \ms R) \).

Details:

Recall that \( \ms I \) is a semi-algebra: The intersection of two intervals is another interval, and the complement of an interval is either another interval or the union of two disjoint intervals. Define \( \lambda \) on \( \ms I \) by \( \lambda(I) = \length(I) \) for \( I \in \ms I \). Then \( \lambda \) satisfies the consistency conditions and hence \( \lambda \) can be extended to a measure on the algebra \( \ms{J} \) generated by \( \ms I \), namely the collection of finite, disjoint unions of intervals. The measure \( \lambda \) on \( \ms{J} \) is clearly \( \sigma \)-finite, since \( \R \) can be written as a countably infinite union of bounded intervals. Hence the extension theorem and the uniqueness theorem apply, so \( \lambda \) can be extended uniquely to a measure on \( \ms R = \sigma(\ms I) \).
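
The first extension step, from intervals to finite unions of intervals, is easy to carry out by hand. The sketch below computes the length of a finite union of bounded intervals by first merging overlapping intervals into disjoint ones and then adding the lengths. The function name `total_length` and the representation of an interval as an endpoint pair are assumptions for illustration; whether endpoints are included does not matter, since single points have length 0.

```python
from fractions import Fraction


def total_length(intervals):
    """Length of a finite union of bounded intervals, given as (a, b) pairs with a <= b."""
    length = Fraction(0)
    left = right = None            # the merged interval currently being built
    for a, b in sorted(intervals):
        if right is None or a > right:
            # The next interval starts a new disjoint piece; close the current one.
            if right is not None:
                length += right - left
            left, right = a, b
        else:
            # The next interval overlaps or touches the current piece; extend it.
            right = max(right, b)
    if right is not None:
        length += right - left
    return length


disjoint_case = total_length([(0, 1), (2, 4)])           # lengths simply add
overlapping_case = total_length([(0, 2), (1, 3), (5, 6)])
```

For disjoint intervals the result is the sum of the lengths, which is the value forced by additivity. For overlapping intervals the sum of the lengths is only an upper bound, in agreement with subadditivity.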

The measure is named in honor of Henri Lebesgue, of course. Since \( \lambda \) is \( \sigma \)-finite, the \( \sigma \)-algebra of Borel sets \( \ms R \) can be completed with respect to \( \lambda \), and the completion \( \ms R^* \) is the Lebesgue \( \sigma \)-algebra. Recall that completed means that if \( A \in \ms R^* \), \( \lambda(A) = 0 \) and \( B \subseteq A \), then \( B \in \ms R^* \) (and then \( \lambda(B) = 0 \)). The Lebesgue measure \( \lambda \) on \( \R \), with either the Borel \( \sigma \)-algebra \( \ms R \) or its completion, is the standard measure that is used for the real numbers. The Lebesgue \(\sigma\)-algebra is sometimes needed, instead of the Borel \(\sigma\)-algebra, for technical reasons.

For \( n \in \N_+ \), let \( \ms R^n \) denote the Borel \( \sigma \)-algebra corresponding to the standard Euclidean topology on \( \R^n \), so that \( (\R^n, \ms R^n) \) is the \( n \)-dimensional Euclidean measurable space. The \( \sigma \)-algebra \( \ms R^n \) is also the \( n \)-fold power of \( \ms R \), the Borel \( \sigma \)-algebra of \( \R \). That is, \( \ms R^n = \ms R \times \ms R \times \cdots \times \ms R \) (\( n \) times). It is also the \( \sigma \)-algebra generated by the products of intervals: \[ \ms R^n = \sigma\left\{I_1 \times I_2 \times \cdots \times I_n: I_j \in \ms I \text{ for } j \in \{1, 2, \ldots, n\}\right\} \] As above, let \( \lambda \) denote Lebesgue measure on \( (\R, \ms R) \).

For \( n \in \N_+ \), the \( n \)-fold power of \( \lambda \), denoted \( \lambda^n \), is Lebesgue measure on \( (\R^n, \ms R^n) \). In particular, \[ \lambda^n(A_1 \times A_2 \times \cdots \times A_n) = \lambda(A_1) \lambda(A_2) \cdots \lambda(A_n); \quad A_1, \, \ldots, A_n \in \ms R \] The measure space \((\R^n, \ms R^n, \lambda^n)\) is the \(n\)-dimensional Euclidean measure space.

Specializing further, if \( I_j \in \ms I \) is an interval for \( j \in \{1, 2, \ldots, n\} \) then \[ \lambda^n\left(I_1 \times I_2 \times \cdots \times I_n\right) = \length(I_1) \length(I_2) \cdots \length(I_n) \] Just as \(\lambda\) extends length to \(\ms R\), \( \lambda^2 \) extends area to \( \ms R^2 \) and \( \lambda^3 \) extends volume to \( \ms R^3 \). In general, \( \lambda^n(A) \) is sometimes referred to as the \( n \)-dimensional volume of \( A \in \ms R^n \). As in the one-dimensional case, \( \ms R^n \) can be completed with respect to \( \lambda^n \), essentially adding all subsets of sets of measure 0 to \( \ms R^n \). The completed \( \sigma \)-algebra is the \( \sigma \)-algebra of Lebesgue measurable sets. Since \( \lambda^n(U) \gt 0 \) if \( U \subseteq \R^n \) is open and nonempty, the support of \( \lambda^n \) is all of \( \R^n \). In addition, Lebesgue measure has the regularity properties that are concerned with approximating the measure of a set, from below with the measure of a compact set, and from above with the measure of an open set.

For \(n \in \N_+\), the \(n\)-dimensional Euclidean measure space \( (\R^n, \ms R^n, \lambda^n) \) is regular. That is, for \( A \in \ms R^n \),

\( \lambda^n(A) = \sup\{\lambda^n(C): C \text{ is compact and } C \subseteq A\} \), (inner regularity)
\( \lambda^n(A) = \inf\{\lambda^n(U): U \text{ is open and } A \subseteq U\} \) (outer regularity).

The following theorem describes how the measure of a set is changed under certain basic transformations. These are essential properties of Lebesgue measure. To set up the notation, suppose that \( n \in \N_+ \), \( A \subseteq \R^n \), \( x \in \R^n \), \( c \in (0, \infty) \) and that \( T \) is an \( n \times n \) matrix. Define \[ A + x = \{a + x: a \in A\}, \quad c A = \{c a: a \in A\}, \quad TA = \{T a: a \in A\} \]

Suppose that \(n \in \N_+\) and that \( A \in \ms R^n \).

If \( x \in \R^n \) then \( \lambda^n(A + x) = \lambda^n(A) \) (translation invariance)
If \( c \in (0, \infty) \) then \( \lambda^n(c A) = c^n \lambda^n(A) \) (dilation property)
If \( T \) is an \( n \times n \) matrix then \( \lambda^n(T A) = |\det(T)| \lambda^n(A) \) (the scaling property)
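
These three properties can be checked numerically in the plane for a simple set. The sketch below takes \( A \) to be the unit square, applies a translation, a dilation, and a hypothetical \( 2 \times 2 \) matrix `T` to its vertices, and computes the area of each image polygon with the shoelace formula. The helper names are assumptions for illustration only, and the check covers just this one polygon, not general Borel sets.

```python
def shoelace_area(vertices):
    """Area of a simple polygon from its vertices, listed in order around the boundary."""
    n = len(vertices)
    twice_area = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def apply_matrix(T, point):
    """Apply the 2 x 2 matrix T, given as a pair of rows, to a point in the plane."""
    (a, b), (c, d) = T
    x, y = point
    return (a * x + b * y, c * x + d * y)


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Translation by the vector (3, -2).
shifted = [(x + 3, y - 2) for (x, y) in unit_square]

# Dilation by c = 3 in dimension n = 2.
scaled = [(3 * x, 3 * y) for (x, y) in unit_square]

# Linear image under a hypothetical invertible matrix T.
T = ((2, 1), (1, 3))
image = [apply_matrix(T, p) for p in unit_square]
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]

area_shifted = shoelace_area(shifted)
area_scaled = shoelace_area(scaled)
area_image = shoelace_area(image)
```

The translated square keeps area 1, the dilated square has area \( 3^2 = 9 \), and the parallelogram \( T A \) has area \( |\det(T)| = 5 \).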

Distribution Functions and Their Measures

The construction of Lebesgue measure on \( \R \) can be generalized. Here is the definition that we will need.

A function \( F: \R \to \R \) that satisfies the following properties is a distribution function on \( \R \):

\( F \) is increasing: if \( x \le y \) then \( F(x) \le F(y) \).
\( F \) is continuous from the right: \( \lim_{t \downarrow x} F(t) = F(x) \) for all \( x \in \R \).

Since \( F \) is increasing, the limit from the left at \( x \in \R \) exists in \( \R \) and is denoted \( F(x^-) = \lim_{t \uparrow x} F(t) \). Similarly \(F(\infty) = \lim_{x \to \infty} F(x) \) exists, as a real number or \( \infty \), and \(F(-\infty) = \lim_{x \to -\infty} F(x) \) exists, as a real number or \( -\infty \).

If \( F \) is a distribution function on \( \R \), then there exists a unique \(\sigma\)-finite measure \( \mu \) on \( \ms R \) that satisfies \[ \mu(a, b] = F(b) - F(a), \quad -\infty \le a \le b \le \infty \]

Details:

Let \( \ms I \) denote the collection of subsets of \( \R \) consisting of intervals of the form \( (a, b] \) where \( a, \, b \in \R \) with \( a \le b \), and intervals of the form \((-\infty, a]\) and \( (a, \infty) \) where \( a \in \R \). Then \( \ms I \) is a semi-algebra. That is, if \( A, \, B \in \ms I \) then \( A \cap B \in \ms I \), and if \( A \in \ms I \) then \( A^c \) is the union of a finite number (actually one or two) of sets in \( \ms I \). We define \( \mu \) on \(\ms I\) by \( \mu(a, b] = F(b) - F(a) \), \( \mu(-\infty, a] = F(a) - F(-\infty) \) and \( \mu(a, \infty) = F(\infty) - F(a) \). Note that \( \ms I \) contains the empty set via intervals of the form \( (a, a] \) where \( a \in \R \), and the definition consistently gives \( \mu(\emptyset) = F(a) - F(a) = 0 \). Next, \( \mu \) is finitely additive on \( \ms I \). That is, if \( \{A_i: i \in I\} \) is a finite, disjoint collection of sets in \( \ms I \) and \( \bigcup_{i \in I} A_i \in \ms I \), then \[ \mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i) \] Next, \( \mu \) is countably subadditive on \( \ms I \). That is, if \( A \in \ms I \) and \( A \subseteq \bigcup_{i \in I} A_i \) where \( \{A_i: i \in I\} \) is a countable collection of sets in \( \ms I \) then \[ \mu(A) \le \sum_{i \in I} \mu(A_i) \] Finally, \( \mu \) is clearly \( \sigma \)-finite on \( \ms I \) since \( \mu(a, b] \lt \infty \) for \( a, \, b \in \R \) with \( a \lt b \), and \( \R \) is a countable, disjoint union of intervals of this form. Hence it follows from the extension theorem and the uniqueness theorem that \( \mu \) can be extended uniquely to a measure on \( \ms R = \sigma(\ms I) \).

For the final uniqueness part, suppose that \( \mu \) is a measure on \( \ms R \) satisfying \( \mu(a, b] = F(b) - F(a) \) for \( a, \, b \in \R \) with \( a \lt b \). Then by the continuity theorem for increasing sets, \( \mu(-\infty, a] = F(a) - F(-\infty) \) and \( \mu(a, \infty) = F(\infty) - F(a) \) for \( a \in \R \). Hence \( \mu \) is the unique measure constructed above.

The measure \( \mu \) is called the Lebesgue-Stieltjes measure associated with \( F \), named for Henri Lebesgue and Thomas Joannes Stieltjes. A very rich variety of measures on \( \R \) can be constructed in this way. In particular, when the function \( F \) takes values in \( [0, 1] \), the associated measure \( \P \) is a probability measure. Another special case of interest is the distribution function defined by \( F(x) = x \) for \( x \in \R \), in which case \( \mu(a, b] \) is the length of the interval \( (a, b] \) and therefore \( \mu = \lambda \), Lebesgue measure on \( \ms R \). But although the measure associated with a distribution function is unique, the distribution function itself is not. Note that if \( c \in \R \) then the distribution function defined by \( F(x) = x + c\) for \( x \in \R \) also generates Lebesgue measure. This example captures the general situation.

Suppose that \( F \) and \( G \) are distribution functions that generate the same measure \( \mu \) on \( \R \). Then there exists \( c \in \R \) such that \( G = F + c \).

Details:

For \( x \in \R \), note that \( F(x) - F(0) = G(x) - G(0) \). The common value is \( \mu(0, x] \) if \( x \ge 0 \) and \( -\mu(x, 0] \) if \( x \lt 0 \). Thus \( G(x) = F(x) - F(0) + G(0) \) for \( x \in \R \).
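
The defining relation \( \mu(a, b] = F(b) - F(a) \) and the effect of adding a constant can be illustrated with a short Python sketch. The distribution function below (a linear part plus a unit jump at 1), the shift by 5, and the helper name `interval_measure` are hypothetical choices made only for illustration.

```python
def interval_measure(F, a, b):
    """Measure of (a, b] under the distribution function F, for a <= b."""
    return F(b) - F(a)

def F(x):
    # Increasing and right-continuous: linear part plus a jump of size 1 at x = 1.
    return x + (1.0 if x >= 1 else 0.0)

def G(x):
    # The same function shifted by a constant c = 5.
    return F(x) + 5.0

intervals = [(-2.0, 0.5), (0.5, 1.0), (1.0, 3.0), (0.0, 0.0)]
measures_F = [interval_measure(F, a, b) for a, b in intervals]
measures_G = [interval_measure(G, a, b) for a, b in intervals]

print(measures_F)
print(measures_G)  # identical: the constant cancels in F(b) - F(a)
```

The interval \( (0.5, 1] \) picks up the jump at 1, while \( (1, 3] \) does not, which reflects the right-continuity convention.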

Having constructed a measure from a distribution function, let's now consider the complementary problem of finding a distribution function for a given measure. The proof of the previous result points the way.

Suppose that \( \mu \) is a positive measure on \( (\R, \ms R) \) with the property that \( \mu(A) \lt \infty \) if \( A \) is bounded. Then there exists a distribution function that generates \( \mu \).

Details:

Define \( F \) on \( \R \) by \[ F(x) = \begin{cases} \mu(0, x], & x \ge 0 \\ -\mu(x, 0], & x \lt 0 \end{cases} \] Then \( F: \R \to \R \) by the assumption on \( \mu \). Also \( F \) is increasing: if \( 0 \le x \le y \) then \( \mu(0, x] \le \mu(0, y] \) by the increasing property of a positive measure. Similarly, if \( x \le y \le 0 \), then \( \mu(x, 0] \ge \mu(y, 0] \), so \( -\mu(x, 0] \le -\mu(y, 0] \). Finally, if \( x \le 0 \le y \), then \(-\mu(x, 0] \le 0\) and \( \mu(0, y] \ge 0 \). Next, \( F \) is continuous from the right: Suppose that \( x_n \in \R \) for \( n \in \N_+ \) and \( x_n \downarrow x \) as \( n \to \infty \). If \( x \ge 0 \) then \( \mu(0, x_n] \downarrow \mu(0, x] \) by the continuity theorem for decreasing sets, which applies since the measures are finite. If \( x \lt 0 \) then \( \mu(x_n, 0] \uparrow \mu(x, 0] \) by the continuity theorem for increasing sets. So in both cases, \( F(x_n) \downarrow F(x) \) as \( n \to \infty \). Hence \( F \) is a distribution function, and it remains to show that it generates \( \mu \). Let \( a, \, b \in \R \) with \( a \le b \). If \( a \ge 0 \) then \( \mu(a, b] = \mu(0, b] - \mu(0, a] = F(b) - F(a) \) by the difference property of a positive measure. Similarly, if \( b \le 0 \) then \( \mu(a, b] = \mu(a, 0] - \mu(b, 0] = -F(a) + F(b) \). Finally, if \( a \le 0 \) and \( b \ge 0 \), then \( \mu(a, b] = \mu(a, 0] + \mu(0, b] = -F(a) + F(b) \).
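
The construction in the proof can be tried out on a simple discrete measure. In the Python sketch below, the dictionary `atoms` of point masses is a hypothetical example, `mu(a, b)` stands for \( \mu(a, b] \), and `F` is defined with reference point 0 exactly as in the displayed formula.

```python
atoms = {-1.5: 2.0, 0.0: 1.0, 0.5: 3.0, 2.0: 0.5}  # hypothetical point masses

def mu(a, b):
    """Measure of (a, b] for the discrete measure with the given atoms."""
    return sum(m for x, m in atoms.items() if a < x <= b)

def F(x):
    """Distribution function with reference point 0, as in the proof."""
    return mu(0.0, x) if x >= 0 else -mu(x, 0.0)

# One interval for each case in the proof, plus one straddling 0.
pairs = [(-2.0, -1.0), (-1.0, 0.0), (-0.5, 1.0), (0.5, 3.0)]
checks = [(mu(a, b), F(b) - F(a)) for a, b in pairs]

for (a, b), (direct, from_F) in zip(pairs, checks):
    print((a, b), direct, from_F)  # the two columns agree
```

Note that the atom at 0 belongs to \( (x, 0] \) for \( x \lt 0 \), so \( F \) jumps at 0 while still satisfying \( F(0) = 0 \).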

In the proof of the previous result, the use of 0 as a reference point is arbitrary, of course. Any other point in \( \R \) would do as well, and would produce a distribution function that differs from the one in the proof by a constant. If \( \mu \) has the property that \( \mu(-\infty, x] \lt \infty \) for \( x \in \R \), then it's easy to see that \( F \) defined by \( F(x) = \mu(-\infty, x] \) for \( x \in \R \) is a distribution function that generates \( \mu \), and is the unique distribution function with \( F(-\infty) = 0 \). In the case of a probability measure, this is the cumulative distribution function. The measure of any interval can be easily computed from the distribution function.

Suppose that \( F \) is a distribution function and \( \mu \) is the positive measure on \( (\R, \ms R) \) associated with \( F \). For \( a, \, b \in \R \) with \( a \lt b \),

\( \mu[a, b] = F(b) - F(a^-) \)
\( \mu\{a\} = F(a) - F(a^-) \)
\( \mu(a, b) = F(b^-) - F(a) \)
\( \mu[a, b) = F(b^-) - F(a^-) \)

Details:

All of these results follow from the continuity theorems for a positive measure. Suppose that \( (x_1, x_2, \ldots) \) is a sequence of distinct points in \( \R \).

If \( x_n \uparrow a \) as \( n \to \infty \) then \( (x_n, b] \uparrow [a, b] \) so \( \mu(x_n, b] \uparrow \mu[a, b] \) as \( n \to \infty \). But also \( \mu(x_n, b] = F(b) - F(x_n) \to F(b) - F(a^-) \) as \( n \to \infty \).
This follows from (a) by taking \( a = b \).
If \( x_n \uparrow b \) as \( n \to \infty \) then \( (a, x_n] \uparrow (a, b) \) so \( \mu(a, x_n] \uparrow \mu(a, b) \) as \( n \to \infty \). But also \( \mu(a, x_n] = F(x_n) - F(a) \to F(b^-) - F(a) \) as \( n \to \infty \).
From (a) and (b) and the difference rule, \[ \mu[a, b) = \mu[a, b] - \mu\{b\} = F(b) - F(a^-) - \left[F(b) - F(b^-)\right] = F(b^-) - F(a^-) \]
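
The four formulas can be checked on a distribution function with a single jump. The Python sketch below assumes the hypothetical example \( F(x) = x \) for \( x \lt 1 \) and \( F(x) = x + 1 \) for \( x \ge 1 \); the left limit \( F(x^-) \) is coded by hand as `F_left`, since it cannot be computed from `F` by evaluation alone.

```python
def F(x):
    # Hypothetical distribution function: F(x) = x plus a unit jump at x = 1.
    return x + (1.0 if x >= 1 else 0.0)

def F_left(x):
    # Left limit F(x^-): the jump is only counted strictly to the right of 1.
    return x + (1.0 if x > 1 else 0.0)

a, b = 0.0, 1.0
closed = F(b) - F_left(a)          # mu[a, b]
point = F(b) - F_left(b)           # mu{b}, the size of the atom at b
open_ = F_left(b) - F(a)           # mu(a, b)
half_open = F_left(b) - F_left(a)  # mu[a, b)

print(closed, point, open_, half_open)
```

Here the closed interval \( [0, 1] \) contains the atom at 1 and so has measure 2, while \( (0, 1) \) and \( [0, 1) \) exclude it and have measure 1.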

Note that \( F \) is continuous at \( x \in \R \) if and only if \( \mu\{x\} = 0 \). In particular, \( \mu \) is a continuous measure (recall that this means that \( \mu\{x\} = 0 \) for all \( x \in \R \)) if and only if \( F \) is continuous on \( \R \). On the other hand, \( F \) is discontinuous at \( x \in \R \) if and only if \( \mu\{x\} \gt 0 \), so that \( \mu \) has an atom at \( x \). So \( \mu \) is a discrete measure (recall that this means that \( \mu \) has countable support) if and only if \( F \) is a step function.

Suppose again that \( F \) is a distribution function and \( \mu \) is the positive measure on \( (\R, \ms R) \) associated with \( F \). If \( a \in \R \) then

\( \mu(a, \infty) = F(\infty) - F(a) \)
\( \mu[a, \infty) = F(\infty) - F(a^-) \)
\( \mu(-\infty, a] = F(a) - F(-\infty) \)
\( \mu(-\infty, a) = F(a^-) - F(-\infty) \)
\( \mu(\R) = F(\infty) - F(-\infty) \)

Details:

The proofs, as before, just use the continuity theorems. Suppose that \( (x_1, x_2, \ldots) \) is a sequence of distinct points in \( \R \).

If \( x_n \uparrow \infty \) as \( n \to \infty \) then \( (a, x_n] \uparrow (a, \infty) \) so \( \mu(a, x_n] \uparrow \mu(a, \infty) \) as \( n \to \infty \). But also \( \mu(a, x_n] = F(x_n) - F(a) \to F(\infty) - F(a) \) as \( n \to \infty \).
Similarly, if \( x_n \uparrow \infty \) as \( n \to \infty \) then \( [a, x_n] \uparrow [a, \infty) \) so \( \mu[a, x_n] \uparrow \mu[a, \infty) \) as \( n \to \infty \). But also \( \mu[a, x_n] = F(x_n) - F(a^-) \to F(\infty) - F(a^-) \) as \( n \to \infty \).
If \( x_n \downarrow -\infty \) as \( n \to \infty \) then \( (x_n, a] \uparrow (-\infty, a] \) so \( \mu(x_n, a] \uparrow \mu(-\infty, a] \) as \( n \to \infty \). But also \( \mu(x_n, a] = F(a) - F(x_n) \to F(a) - F(-\infty) \) as \( n \to \infty \).
Similarly, if \( x_n \downarrow -\infty \) as \( n \to \infty \) then \( (x_n, a) \uparrow (-\infty, a) \) so \( \mu(x_n, a) \uparrow \mu(-\infty, a) \) as \( n \to \infty \). But also \( \mu(x_n, a) = F(a^-) - F(x_n) \to F(a^-) - F(-\infty) \) as \( n \to \infty \).
\( \mu(\R) = \mu(-\infty, 0] + \mu(0, \infty) = \left[F(0) - F(-\infty)\right] + \left[F(\infty) - F(0)\right] = F(\infty) - F(-\infty) \).
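
For a concrete instance of these formulas, the Python sketch below uses the exponential distribution function with rate 1, a hypothetical example chosen here: \( F(x) = 1 - e^{-x} \) for \( x \ge 0 \) and \( F(x) = 0 \) for \( x \lt 0 \). The limits \( F(\infty) = 1 \) and \( F(-\infty) = 0 \) are entered by hand as constants, since they are not something the code can compute by evaluating `F`.

```python
import math

def F(x):
    # Hypothetical example: exponential distribution function with rate 1.
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

F_PLUS_INF = 1.0   # limit of F(x) as x -> infinity, computed by hand
F_MINUS_INF = 0.0  # limit of F(x) as x -> -infinity, computed by hand

def upper_tail(a):
    """mu(a, infinity) = F(infinity) - F(a)."""
    return F_PLUS_INF - F(a)

def lower_tail(a):
    """mu(-infinity, a] = F(a) - F(-infinity)."""
    return F(a) - F_MINUS_INF

total = F_PLUS_INF - F_MINUS_INF            # mu(R)
split = lower_tail(2.0) + upper_tail(2.0)   # should also equal mu(R)

print(upper_tail(1.0), math.exp(-1.0))  # tail of the exponential distribution
print(split, total)
```

Since this \( F \) is continuous, \( F(a^-) = F(a) \) for every \( a \in \R \), so the open and closed versions of each infinite interval have the same measure. Since \( F \) takes values in \( [0, 1] \) with \( F(\infty) - F(-\infty) = 1 \), the measure is a probability measure.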