\( \newcommand{\P}{\mathbb{P}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\Z}{\mathbb{Z}} \) \( \newcommand{\bs}{\boldsymbol} \)

7. Measure Spaces

In this section we discuss probability spaces, and general measure spaces, from a more advanced point of view. The sections on Measure Theory and Special Set Structures in the chapter on Foundations are essential prerequisites. On the other hand, if you are not interested in the measure-theoretic aspects of probability, you can safely skip this section.

Positive Measure

Definition

Suppose that \( S \) is a set, playing the role of a universal set for a mathematical theory. As we noted before, \( S \) usually comes with a \( \sigma \)-algebra \( \mathscr{S} \) of admissible subsets of \( S \), so that \( (S, \mathscr{S}) \) is a measurable space. In particular, this is the case for the model of a random experiment, where \( S \) is the sample space: the collection of events \( \mathscr{S} \) is required to be a \( \sigma \)-algebra. A probability measure is a special case of a more general object known as a positive measure.

A positive measure on \((S, \mathscr{S})\) is a function \(\mu: \mathscr{S} \to [0, \infty] \) that satisfies the following axioms:

  1. \( \mu(\emptyset) = 0 \)
  2. If \(\{A_i: i \in I\}\) is a countable, pairwise disjoint collection of sets in \(\mathscr{S}\) then \[\mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i)\]

Axiom (b) is called countable additivity, and is the essential property. The measure of a set that consists of a countable union of disjoint pieces is the sum of the measures of the pieces. Together, the three objects \( (S, \mathscr{S}, \mu) \) form a measure space.
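
As a concrete illustration of the two axioms, here is a minimal sketch of a weighted counting measure on a four-point set, with the power set as the \( \sigma \)-algebra. The set, the weights, and the names `weights` and `mu` are hypothetical choices made for this example only; they do not come from the text.

```python
from fractions import Fraction

# Hypothetical example: point masses on S = {a, b, c, d}.
# The sigma-algebra is assumed to be the full power set of S.
weights = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(0), "d": Fraction(5, 4)}

def mu(A):
    """Measure of a subset A of S: the sum of the weights of its points."""
    return sum((weights[x] for x in A), Fraction(0))

# Axiom (a): the empty set has measure 0.
empty_measure = mu(set())

# Axiom (b), checked for one finite disjoint collection:
# the measure of the union equals the sum of the measures.
pieces = [{"a"}, {"b", "c"}, {"d"}]
union_measure = mu(set().union(*pieces))
sum_of_measures = sum((mu(A) for A in pieces), Fraction(0))

print(empty_measure, union_measure, sum_of_measures)
```

Exact rational arithmetic is used so that the two sides of the additivity identity can be compared without rounding error.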

Figure: A union of four disjoint sets

In particular, a probability measure \(\P\) on \((S, \mathscr{S})\) is a positive measure on \((S, \mathscr{S})\) with the additional requirement that \(\P(S) = 1\). However, positive measures are important beyond the application to probability. The standard measures on the Euclidean spaces are all positive measures: the extension of length for measurable subsets of \( \R \), the extension of area for measurable subsets of \( \R^2 \), the extension of volume for measurable subsets of \( \R^3 \), and the higher-dimensional analogues. We will actually construct these measures in the next section on Existence and Uniqueness. In addition, counting measure \( \# \) is a positive measure on the subsets of a set \( S \). Even more general measures that can take positive and negative values are explored in the chapter on Distributions.

Constructions

There are several simple ways to construct new positive measures from existing ones. As usual, we start with a measurable space \( (S, \mathscr{S}) \).

Suppose that \( (R, \mathscr{R}) \) is a measurable subspace of \( (S, \mathscr{S}) \), in the sense that \( \mathscr{R} \) is a \( \sigma \)-algebra of subsets of \( R \) and \( \mathscr{R} \subseteq \mathscr{S} \) (and hence in particular \( R \in \mathscr{S} \)). If \( \mu \) is a positive measure on \( (S, \mathscr{S}) \) then \( \mu \) restricted to \( \mathscr{R} \) is a positive measure on \( (R, \mathscr{R}) \).

Proof:

Since the additivity property of \( \mu \) holds for a countable, disjoint collection of events in \( \mathscr{S} \), it trivially holds for a countable, disjoint collection of events in \( \mathscr{R} \).

In particular, the previous theorem would apply when \( R = S \) so that \( \mathscr{R} \) is a sub \( \sigma \)-algebra of \( \mathscr{S} \).

If \( \mu \) is a positive measure on \( (S, \mathscr{S}) \) and \( c \gt 0 \), then \( c \mu \) is also a positive measure on \( (S, \mathscr{S}) \).

Proof:

Clearly \( c \mu: \mathscr{S} \to [0, \infty] \).

  1. \( (c \mu)(\emptyset) = c \mu(\emptyset) = 0 \)
  2. If \( \{A_i: i \in I\} \) is a countable, disjoint collection of events in \( \mathscr{S} \) then \[ (c \mu)\left(\bigcup_{i \in I} A_i\right) = c \mu\left(\bigcup_{i \in I} A_i\right) = c \sum_{i \in I} \mu(A_i) = \sum_{i \in I} c \mu(A_i) \]

If \( \mu_i \) is a positive measure on \( (S, \mathscr{S}) \) for each \( i \) in a countable index set \( I \) then \( \sum_{i \in I} \mu_i \) is also a positive measure on \( (S, \mathscr{S}) \).

Proof:

Let \( \mu = \sum_{i \in I} \mu_i \). Clearly \( \mu: \mathscr{S} \to [0, \infty] \).

  1. \( \mu(\emptyset) = \sum_{i \in I} \mu_i(\emptyset) = 0 \)
  2. If \( \{A_j: j \in J\} \) is a countable, disjoint collection of events in \( \mathscr{S} \) then \[ \mu\left(\bigcup_{j \in J} A_j\right) = \sum_{i \in I} \mu_i \left(\bigcup_{j \in J} A_j\right) = \sum_{i \in I} \sum_{j \in J} \mu_i(A_j) = \sum_{j \in J} \sum_{i \in I} \mu_i(A_j) = \sum_{j \in J} \mu(A_j) \] The interchange of sums is permissible since the terms are nonnegative.

Combining the last two theorems, note that a positive linear combination of positive measures is a positive measure. The next method is sometimes referred to as a change of variables.
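
The remark about positive linear combinations can be checked numerically on a small example. The sketch below assumes a three-point set with counting measure and a point mass at 1; the helper `combine` and the coefficients are hypothetical and serve only to illustrate the statement.

```python
from fractions import Fraction

def counting(A):
    """Counting measure on subsets of S = {1, 2, 3}."""
    return Fraction(len(A))

def point_mass_at_1(A):
    """Point mass at 1: measure 1 if 1 is in A, and 0 otherwise."""
    return Fraction(1 if 1 in A else 0)

def combine(coeffs, measures):
    """Return the set function sum_i c_i * mu_i for positive coefficients c_i."""
    def nu(A):
        return sum((c * m(A) for c, m in zip(coeffs, measures)), Fraction(0))
    return nu

nu = combine([Fraction(2), Fraction(1, 3)], [counting, point_mass_at_1])

# Additivity of the combination on two disjoint sets.
A, B = {1}, {2, 3}
lhs = nu(A | B)
rhs = nu(A) + nu(B)
print(lhs, rhs)
```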

Suppose that \( (T, \mathscr{T}) \) is another measurable space and that \( f: S \to T \) is measurable. Then \( \nu \) defined by \[ \nu(B) = \mu\left[f^{-1}(B)\right], \quad B \in \mathscr{T} \] is a positive measure on \( (T, \mathscr{T}) \).

Proof:

Recall that inverse images preserve all set operations. In particular, \( f^{-1}(\emptyset) = \emptyset \) so \( \nu(\emptyset) = 0 \). If \( \left\{B_i: i \in I\right\} \) is a countable, disjoint collection of sets in \( \mathscr{T} \), then \( \left\{f^{-1}(B_i): i \in I\right\} \) is a countable, disjoint collection of sets in \( \mathscr{S} \), and \( f^{-1}\left(\bigcup_{i \in I} B_i\right) = \bigcup_{i \in I} f^{-1}(B_i) \). Hence \[ \nu\left(\bigcup_{i \in I} B_i\right) = \mu\left[f^{-1}\left(\bigcup_{i \in I} B_i\right)\right] = \mu\left[\bigcup_{i \in I} f^{-1}(B_i)\right] = \sum_{i \in I} \mu\left[f^{-1}(B_i)\right] = \sum_{i \in I} \nu(B_i) \]
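
The change of variables construction is easy to carry out by hand on a finite set. The following sketch assumes a weighted measure on \( S = \{-2, -1, 0, 1, 2\} \) and the map \( f(x) = x^2 \) into \( T = \{0, 1, 4\} \); the weights and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical weighted measure mu on S = {-2, -1, 0, 1, 2}.
weights = {-2: Fraction(1), -1: Fraction(1, 2), 0: Fraction(3),
           1: Fraction(1, 2), 2: Fraction(1)}

def mu(A):
    return sum((weights[x] for x in A), Fraction(0))

def f(x):
    """The map f: S -> T = {0, 1, 4}."""
    return x * x

def preimage(B):
    """Inverse image of B under f."""
    return {x for x in weights if f(x) in B}

def nu(B):
    """nu(B) = mu(f^{-1}(B)), the measure induced on T by f."""
    return mu(preimage(B))

print(nu({0}), nu({1}), nu({4}), nu({0, 1, 4}))
```

Since every subset of a finite set is measurable under the power set \( \sigma \)-algebra, measurability of \( f \) is automatic in this toy setting.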

Often the spaces that occur in probability and stochastic processes are topological spaces. Recall that a topological space \( (S, \mathscr{S}) \) consists of a set \( S \) and a topology \( \mathscr{S} \) on \( S \) (the collection of open sets). Recall also that \( \sigma(\mathscr{S}) \) is the Borel \( \sigma \)-algebra on \( S \), so every topological space naturally leads to a measurable space.

Suppose that \( (S, \mathscr{S}) \) is a topological space. A positive measure \( \mu \) on \( (S, \sigma(\mathscr{S})) \) is a Borel measure if \( \mu(C) \lt \infty \) for every compact \( C \subseteq S \).

It's easy to explicitly construct a positive measure on a \( \sigma \)-algebra generated by a countable partition. Such \( \sigma \)-algebras are important for counterexamples and to gain insight, and also because many \( \sigma \)-algebras that occur in applications can be constructed from them.

Suppose that \( \mathscr{A} = \{A_i: i \in I\} \) is a countable partition of \( S \) into nonempty sets, and that \( \mathscr{S} = \sigma(\mathscr{A}) \). For \( i \in I \), define \( \mu(A_i) \in [0, \infty) \) arbitrarily. For \( A = \bigcup_{j \in J} A_j \) where \( J \subseteq I \), define \[ \mu(A) = \sum_{j \in J} \mu(A_j) \] Then \( \mu \) is a positive measure on \( (S, \mathscr{S}) \).

Proof:

Recall that every \( A \in \mathscr{S} \) has a unique representation of the form \( A = \bigcup_{j \in J} A_j \) where \( J \subseteq I \).

  1. \( J = \emptyset \) in the representation gives \( A = \emptyset \). The sum over an empty index set is 0, so \( \mu(\emptyset) = 0 \).
  2. Suppose that \( \{B_k: k \in K\} \) is a countable, disjoint collection of events in \( \mathscr{S} \). Then for each \( k \in K \) there exists \( J_k \subseteq I \) and \( \{A^k_j: j \in J_k\} \subseteq \mathscr{A} \) such that \( B_k = \bigcup_{j \in J_k} A^k_j \). Hence \[ \mu\left(\bigcup_{k \in K} B_k\right) = \mu\left(\bigcup_{k \in K} \bigcup_{j \in J_k} A^k_j\right) = \sum_{k \in K}\sum_{j \in J_k} \mu(A^k_j) = \sum_{k \in K} \mu(B_k) \] The fact that the terms are all nonnegative means that we do not have to worry about the order of summation.
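
The construction can be sketched for a finite partition, where \( \sigma(\mathscr{A}) \) is simply the collection of all unions of blocks. The partition of \( \{0, 1, \ldots, 5\} \) and the block measures below are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical partition of S = {0, ..., 5} into three nonempty blocks,
# with an arbitrary finite measure assigned to each block.
blocks = {0: frozenset({0, 1}), 1: frozenset({2}), 2: frozenset({3, 4, 5})}
block_measure = {0: Fraction(1, 2), 1: Fraction(3), 2: Fraction(0)}

def sigma_algebra():
    """Map each union of blocks to the index set J that represents it."""
    result = {}
    indices = list(blocks)
    for r in range(len(indices) + 1):
        for J in combinations(indices, r):
            A = frozenset().union(*(blocks[j] for j in J))
            result[A] = frozenset(J)
    return result

events = sigma_algebra()

def mu(A):
    """Sum the block measures over the unique index set representing A."""
    J = events[frozenset(A)]
    return sum((block_measure[j] for j in J), Fraction(0))

print(len(events), mu({0, 1, 2}), mu(set()))
```

A set that is not a union of blocks, such as `{0}`, is not a key of `events`, which mirrors the fact that it is not in \( \sigma(\mathscr{A}) \).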

One of the most general ways to construct new measures from old ones is via the theory of integration with respect to a positive measure, which is explored in the chapter on Distributions. The construction of positive measures more or less from scratch is considered in the next section on Existence and Uniqueness.

Properties

The following results give some simple properties of a positive measure \( \mu \) on \( (S, \mathscr{S}) \). The proofs are essentially identical to the proofs of the corresponding properties of probability, except that the measure of a set may be infinite so we must be careful not to use the meaningless indeterminate form \( \infty - \infty \).

If \( A, \; B \in \mathscr{S} \), then \( \mu(B) = \mu(A \cap B) + \mu(B \setminus A) \).

Proof:

Note that \( B = (A \cap B) \cup (B \setminus A) \), and the sets in the union are disjoint.

If \( A, \; B \in \mathscr{S} \) and \( A \subseteq B \) then

  1. \( \mu(B) = \mu(A) + \mu(B \setminus A) \)
  2. \( \mu(A) \le \mu(B) \)
Proof:

Part (a) follows from the previous theorem, since \( A \cap B = A \). Part (b) follows from part (a).

Thus \( \mu \) is increasing, relative to the subset partial order \( \subseteq \) on \( \mathscr{S} \) and the ordinary order \( \le \) on \( [0, \infty] \). Note also that if \( A, \; B \in \mathscr{S} \) and \( \mu(B) \lt \infty \) then \( \mu(B \setminus A) = \mu(B) - \mu(A \cap B) \). In the special case that \( A \subseteq B \), this becomes \( \mu(B \setminus A) = \mu(B) - \mu(A) \). These properties are just like the difference rules for probability. If \( \mu(S) \lt \infty \) then \( \mu(A^c) = \mu(S) - \mu(A) \). This is the analogue of the complement rule in probability, but with \( \mu(S) \) replacing 1.

The following result is the analogue of Boole's inequality for probability. For a general positive measure \( \mu \), the result is referred to as the subadditive property.

Suppose that \( A_i \in \mathscr{S} \) for \( i \) in a countable index set \( I \). Then \[ \mu\left(\bigcup_{i \in I} A_i \right) \le \sum_{i \in I} \mu(A_i) \]

Proof:

The proof is exactly like the one for Boole's inequality. Assume that \( I = \N_+ \). Let \( B_1 = A_1 \) and \( B_i = A_i \setminus (A_1 \cup \ldots \cup A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Then \( \{B_i: i \in I\} \) is a disjoint collection of sets in \( \mathscr{S} \) with the same union as \( \{A_i: i \in I\} \). Also \( B_i \subseteq A_i \) for each \( i \) so \( \mu(B_i) \le \mu(A_i) \). Hence \[ \mu\left(\bigcup_{i \in I} A_i \right) = \mu\left(\bigcup_{i \in I} B_i \right) = \sum_{i \in I} \mu(B_i) \le \sum_{i \in I} \mu(A_i) \]
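
The disjointification step in the proof can be sketched for a finite list of sets, with counting measure standing in for \( \mu \). The function name `disjointify` and the sample sets are hypothetical.

```python
def disjointify(sets):
    """Return B_1 = A_1 and B_i = A_i minus the union of A_1, ..., A_{i-1}."""
    seen = set()
    result = []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5}]
B = disjointify(A)

# With counting measure (len), compare the three quantities in the proof.
measure_of_union = len(set().union(*A))
sum_over_B = sum(len(b) for b in B)
sum_over_A = sum(len(a) for a in A)
print(B, measure_of_union, sum_over_B, sum_over_A)
```

The same trick reappears later in the section, in the continuity theorem and in the construction of a disjoint sequence for a \( \sigma \)-finite space.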

For a union of sets with finite measure, the inclusion-exclusion formula holds, and the proof is just like the one for probability.

Suppose that \(A_i \in \mathscr{S}\) for each \(i \in I\) where \(\#(I) = n\), and that \( \mu(A_i) \lt \infty \) for \( i \in I \). Then \[\mu \left( \bigcup_{i \in I} A_i \right) = \sum_{k = 1}^n (-1)^{k - 1} \sum_{J \subseteq I, \; \#(J) = k} \mu \left( \bigcap_{j \in J} A_j \right)\]
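
The inclusion-exclusion formula can be checked directly for a small family of finite sets. This is only a sketch, assuming counting measure as the default \( \mu \); the function name and the sample sets are hypothetical.

```python
from itertools import combinations

def inclusion_exclusion(sets, mu=len):
    """Right-hand side of the inclusion-exclusion formula for a finite family."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        for J in combinations(sets, k):
            total += sign * mu(set.intersection(*J))
    return total

A = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
left_side = len(set().union(*A))
right_side = inclusion_exclusion(A)
print(left_side, right_side)
```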

The continuity theorem for increasing sets holds for a positive measure. The continuity theorem for decreasing events holds also, if the sets have finite measure. Again, the proofs are similar to the ones for a probability measure, except for considerations of infinite measure.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of sets in \( \mathscr{S} \).

  1. If the sequence is increasing then \( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \).
  2. If the sequence is decreasing and \( \mu(A_1) \lt \infty \) then \( \mu\left(\bigcap_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \).
Proof:
  1. Note that if \( \mu(A_k) = \infty \) for some \( k \) then \( \mu(A_n) = \infty \) for \( n \ge k \) and \( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \infty \). Thus, suppose that \( \mu(A_i) \lt \infty \) for each \( i \). Let \( B_1 = A_1 \) and \( B_i = A_i \setminus A_{i-1} \) for \( i \in \{2, 3, \ldots\} \). Then \( (B_1, B_2, \ldots) \) is a disjoint sequence with the same union as \( (A_1, A_2, \ldots) \). Also, \( \mu(B_1) = \mu(A_1) \) and \( \mu(B_i) = \mu(A_i) - \mu(A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Hence \[ \mu\left(\bigcup_{i=1}^\infty A_i \right) = \mu \left(\bigcup_{i=1}^\infty B_i \right) = \sum_{i=1}^\infty \mu(B_i) = \lim_{n \to \infty} \sum_{i=1}^n \mu(B_i) \] But \( \sum_{i=1}^n \mu(B_i) = \mu(A_1) + \sum_{i=2}^n [\mu(A_i) - \mu(A_{i-1})] = \mu(A_n) \).
  2. Note that \( A_1 \setminus A_n \) is increasing in \( n \). Hence using the continuity result for increasing sets, \begin{align} \mu \left(\bigcap_{i=1}^\infty A_i \right) & = \mu\left[A_1 \setminus \bigcup_{i=1}^\infty (A_1 \setminus A_i) \right] = \mu(A_1) - \mu\left[\bigcup_{i=1}^\infty (A_1 \setminus A_i)\right]\\ & = \mu(A_1) - \lim_{n \to \infty} \mu(A_1 \setminus A_n) = \mu(A_1) - \lim_{n \to \infty} \left[\mu(A_1) - \mu(A_n)\right] = \lim_{n \to \infty} \mu(A_n) \end{align}

The continuity theorem for decreasing events fails without the additional assumption of finite measure. For example, consider \( \Z \) with counting measure \( \# \). Let \( A_n = \{ z \in \Z: z \le -n\} \) for \( n \in \N_+ \). Then \( \#(A_n) = \infty \) for each \( n \) but \( \# \left(\bigcap_{i=1}^\infty A_i\right) = \#(\emptyset) = 0 \).

A nontrivial finite positive measure \( \mu \), with \( 0 \lt \mu(S) \lt \infty \), is practically just like a probability measure, and in fact can be re-scaled into a probability measure \( \P \), as was done in the section on Probability Measures:

\[ \P(A) = \frac{\mu(A)}{\mu(S)}, \quad A \in \mathscr{S} \]

If a positive measure \( \mu \) is not finite, then the following definition gives the next best thing.

The measure space \( (S, \mathscr{S}, \mu) \) is \( \sigma \)-finite if there exists a sequence \( (A_1, A_2, \ldots) \) of sets in \( \mathscr{S} \) with \( \bigcup_{i=1}^\infty A_i = S \) and \( \mu(A_i) \lt \infty \) for each \( i \in \N_+ \).

Thus, restricted to \( A_i \), \( \mu \) is a finite measure, and hence certain nice properties of finite measures can be extended to \( \sigma \)-finite measures.

Suppose that \( (S, \mathscr{S}, \mu) \) is a \( \sigma \)-finite measure space.

  1. There exists an increasing sequence satisfying the \( \sigma \)-finite definition.
  2. There exists a disjoint sequence satisfying the \( \sigma \)-finite definition.
Proof:

We use the same tricks that we have used before. Let \( A_n \in \mathscr{S}, \; n \in \N_+ \) be a sequence that satisfies the \( \sigma \)-finite definition. That is, \( \mu(A_n) \lt \infty \) for each \( n \in \N_+ \) and \( S = \bigcup_{n=1}^\infty A_n \).

  1. Let \( B_n = \bigcup_{i = 1}^n A_i \). Then \( B_n \in \mathscr{S} \) for \( n \in \N_+ \) and this sequence is increasing. Moreover, \( \mu(B_n) \le \sum_{i=1}^n \mu(A_i) \lt \infty \) for \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n = S \).
  2. Let \( C_1 = A_1 \) and let \( C_n = A_n \setminus \bigcup_{i=1}^{n-1} A_i \) for \( n \in \{2, 3, \ldots\} \). Then \( C_n \in \mathscr{S} \) for each \( n \in \N_+ \) and this sequence is disjoint. Moreover, \( C_n \subseteq A_n \) so \( \mu(C_n) \le \mu(A_n) \lt \infty \) and \( \bigcup_{n=1}^\infty C_n = \bigcup_{n=1}^\infty A_n = S \).

Topics in Probability Revisited

Definitions

We can now give a precise definition of the probability space, the mathematical model of a random experiment.

A probability space \((S, \mathscr{S}, \P)\) consists of three essential parts:

  1. A set of outcomes \(S\).
  2. A \(\sigma\)-algebra of events \(\mathscr{S}\).
  3. A probability measure \(\P\) on the sample space \( (S, \mathscr{S}) \).

Often the special notation \( (\Omega, \mathscr{F}, \P) \) is used for a probability space in the literature. The symbol \( \Omega \) for the set of outcomes is intended to remind us that these are all possible outcomes. In probability, \(\sigma\)-algebras are not just important for theoretical and foundational purposes, but are important for practical purposes as well. A \(\sigma\)-algebra can be used to specify partial information about an experiment—a concept of fundamental importance. Specifically, suppose that \(\mathscr{A}\) is a collection of events in the experiment, and that we know whether or not \(A\) occurred for each \(A \in \mathscr{A}\). Then in fact, we can determine whether or not \(A\) occurred for each \(A \in \sigma(\mathscr{A})\), the \(\sigma\)-algebra generated by \(\mathscr{A}\).

Technically, a random variable for our experiment is a measurable function from the sample space into another measurable space.

Suppose that \( (S, \mathscr{S}, \P) \) is a probability space and that \( (T, \mathscr{T}) \) is another measurable space. A random variable \( X \) with values in \( T \) is a measurable function from \( S \) into \( T \).

Measurability ensures that \(\{X \in B\}\) (the inverse image of \( B \) under \( X \)) is a valid event (that is, a member of the \(\sigma\)-algebra \(\mathscr{S}\)) for each \(B \in \mathscr{T}\). By the change of variables theorem, the mapping \(B \mapsto \P(X \in B)\) is a positive measure on \((T, \mathscr{T})\), and in fact is a probability measure, since \( \P(X \in T) = \P(S) = 1 \). The probability measure \( B \mapsto \P(X \in B) \) for \( B \in \mathscr{T} \) is the probability distribution of \( X \).
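
As an illustration of a probability distribution obtained by a change of variables, here is a minimal sketch that assumes two fair dice and takes \( X \) to be the sum of the scores. The experiment and the names `S`, `X`, and `dist` are hypothetical choices for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two fair dice, each outcome has probability 1/36.
S = list(product(range(1, 7), repeat=2))
point_probability = Fraction(1, 36)

def X(s):
    """Random variable: the sum of the two scores."""
    return s[0] + s[1]

def dist(B):
    """P(X in B), computed as the probability of the inverse image of B."""
    return sum((point_probability for s in S if X(s) in B), Fraction(0))

print(dist({7}), dist({2, 12}), dist(set(range(2, 13))))
```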

Figure: The event \( \{X \in B\} \) associated with \( B \in \mathscr{T} \)

Also, \(\{\{X \in B\}: B \in \mathscr{T}\}\) is a sub \(\sigma\)-algebra of \(\mathscr{S}\), and in fact is the \(\sigma\)-algebra generated by \(X\), denoted \(\sigma(X)\). If we observe the value of \(X\), then we know whether or not each event in \(\sigma(X)\) has occurred. More generally, suppose that \( (T_i, \mathscr{T}_i) \) is a measurable space for each \( i \) in an index set \( I \), and that \(X_i\) is a random variable taking values in \( T_i \) for each \( i \in I \). Recall that the \( \sigma \)-algebra generated by \( \{X_i: i \in I\} \) is \[ \sigma\{X_i: i \in I\} = \sigma\left\{\{X_i \in B_i\}: B_i \in \mathscr{T}_i, \; i \in I\right\} \] If we observe the value of \(X_i\) for each \(i \in I\) then we know whether or not each event in \(\sigma\{X_i: i \in I\}\) has occurred. This idea is very important in the study of stochastic processes.
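
For a finite experiment, \( \sigma(X) \) can be listed explicitly by taking inverse images. The sketch below assumes a single die and lets \( X \) record the parity of the score; the setup and names are hypothetical.

```python
from itertools import combinations

S = range(1, 7)   # outcomes of one die
T = [0, 1]        # possible values of X

def X(s):
    """Parity of the score: 1 for odd, 0 for even."""
    return s % 2

def subsets(values):
    """All subsets of a finite list of values."""
    return [set(c) for r in range(len(values) + 1) for c in combinations(values, r)]

# sigma(X): the collection of inverse images {X in B} for B a subset of T.
sigma_X = {frozenset(s for s in S if X(s) in B) for B in subsets(T)}

for event in sorted(sigma_X, key=len):
    print(sorted(event))
```

Observing the parity tells us whether the event `{1, 3, 5}` occurred, but not whether `{1}` occurred, and correspondingly `{1}` is not in `sigma_X`.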

Null and Almost Sure Events

Suppose that \( (S, \mathscr{S}, \P) \) is a probability space.

Define the following collections of events:

  1. \(\mathscr{N} = \{A \in \mathscr{S}: \P(A) = 0\} \), the collection of null events
  2. \(\mathscr{M} = \{A \in \mathscr{S}: \P(A) = 1\}\), the collection of almost sure events
  3. \( \mathscr{D} = \mathscr{N} \cup \mathscr{M} = \{A \in \mathscr{S}: \P(A) = 0 \text{ or } \P(A) = 1 \} \), the collection of essentially deterministic events

In the section on independence, we showed that \( \mathscr{D} \) is independent. It satisfies another important property as well:

\( \mathscr{D} \) is a sub \(\sigma\)-algebra of \( \mathscr{S} \).

Proof:

Trivially \( S \in \mathscr{D} \), and if \( A \in \mathscr{D} \) then \( A^c \in \mathscr{D} \). Suppose that \( A_i \in \mathscr{D} \) for \( i \in I \) where \( I \) is a countable index set. If \( \P(A_i) = 0 \) for every \( i \in I \) then \( \P\left(\bigcup_{i \in I} A_i \right) = 0 \) by Boole's inequality. On the other hand, if \( \P(A_j) = 1 \) for some \( j \in I \) then \( \P\left(\bigcup_{i \in I} A_i \right) = 1 \). In either case, \( \bigcup_{i \in I} A_i \in \mathscr{D} \).

Equivalent Events and Variables

Intuitively, equivalent events or random variables are those that are indistinguishable from a probabilistic point of view. To make this precise, recall first that the symmetric difference between events \( A \) and \( B \) is \( A \bigtriangleup B = (A \setminus B) \cup (B \setminus A) \); it is the event that occurs if and only if one of the events occurs, but not the other.

Events \(A\) and \(B\) are said to be equivalent if \( A \bigtriangleup B \in \mathscr{N} \), and we denote this by \( A \equiv B \).

Thus \(A \equiv B\) if and only if \(\P(A \bigtriangleup B) = \P(A \setminus B) + \P(B \setminus A) = 0\) if and only if \(\P(A \setminus B) = \P(B \setminus A) = 0\). As the name suggests, the relation \( \equiv \) really is an equivalence relation on \( \mathscr{S} \) and hence \( \mathscr{S} \) is partitioned into disjoint classes of mutually equivalent events.

The relation \( \equiv \) is an equivalence relation on \( \mathscr{S} \). That is, for \( A, \; B, \; C \in \mathscr{S} \),

  1. \(A \equiv A\) (the reflexive property).
  2. If \(A \equiv B\) then \(B \equiv A\) (the symmetric property).
  3. If \(A \equiv B\) and \(B \equiv C\) then \(A \equiv C\) (the transitive property).
Proof:

The reflexive and symmetric properties are trivial. For the transitive property, suppose that \( A \equiv B \) and \( B \equiv C \). Note that \( A \setminus C \subseteq (A \setminus B) \cup (B \setminus C) \), and hence \( \P(A \setminus C) = 0 \). By a symmetric argument, \( \P(C \setminus A) = 0 \).

If \( A \equiv B \) then \( A^c \equiv B^c \).

Proof:

Note that \( A^c \setminus B^c = B \setminus A \) and \( B^c \setminus A^c = A \setminus B \), so \( A^c \bigtriangleup B^c = A \bigtriangleup B \).

Equivalent events have the same probability.

If \(A \equiv B\) then \(\P(A) = \P(B)\).

Proof:

Note again that \( A = (A \cap B) \cup (A \setminus B) \). If \( A \equiv B \) then \( \P(A) = \P(A \cap B) \). By a symmetric argument, \( \P(B) = \P(A \cap B) \).

The converse trivially fails, as an example below shows. However, the null and almost sure events do form equivalence classes.

Suppose that \( A \in \mathscr{S} \).

  1. If \(A \in \mathscr{N}\) then \(A \equiv B\) if and only if \(B \in \mathscr{N}\).
  2. If \(A \in \mathscr{M}\) then \(A \equiv B\) if and only if \(B \in \mathscr{M}\).
Proof:
  1. Suppose that \( A \in \mathscr{N} \) and \( A \equiv B\). Then \( \P(B) = 0 \) by the previous result. Conversely, note that \( A \setminus B \subseteq A \) and \( B \setminus A \subseteq B \) so if \( \P(A) = \P(B) = 0 \) then \( \P(A \bigtriangleup B) = 0 \) so \( A \equiv B \).
  2. Part (b) follows from part (a) and the result above on complements.
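
A small finite example shows both equivalence and the failure of the converse. The sketch assumes a three-point probability space in which outcome 3 has probability 0; the space and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite probability space: outcome 3 is a null outcome.
prob = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}

def P(A):
    return sum((prob[s] for s in A), Fraction(0))

def equivalent(A, B):
    """A and B are equivalent when their symmetric difference is a null event."""
    return P(set(A) ^ set(B)) == 0

A, B, C = {1}, {1, 3}, {2}
print(equivalent(A, B))      # A and B differ only on the null outcome 3
print(P(A) == P(C))          # A and C have the same probability ...
print(equivalent(A, C))      # ... but their symmetric difference is {1, 2}
```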

We can extend the notion of equivalence to random variables taking values in the same space. Thus suppose that \( (T, \mathscr{T}) \) is another measurable space.

Random variables \(X\) and \(Y\) taking values in \(T\) are said to be equivalent if \( \P(X = Y) = 1 \). Again we write \( X \equiv Y \).

As with events, the relation \( \equiv \) really does define an equivalence relation on the collection of random variables that take values in \(T\). Thus, the collection of such random variables is partitioned into disjoint classes of mutually equivalent variables.

The relation \( \equiv \) is an equivalence relation on the collection of random variables that take values in \(T\). That is, for random variables \( X \), \( Y \), and \( Z \) with values in \( T \),

  1. \(X \equiv X\) (the reflexive property).
  2. If \(X \equiv Y\) then \(Y \equiv X\) (the symmetric property).
  3. If \( X \equiv Y\) and \(Y \equiv Z\) then \(X \equiv Z\) (the transitive property).
Proof:

Parts (a) and (b) are trivial. For (c) note that \( \{X = Y\} \cap \{Y = Z\} \subseteq \{X = Z\} \) and hence if \( \P(X = Y) = 1 \) and \( \P(Y = Z) = 1 \) then \( \P(X = Z) = 1 \).

Suppose that \(X\) and \(Y\) are equivalent random variables, taking values in \(T\). Then for any \(B \in \mathscr{T}\), the events \(\{X \in B\}\) and \(\{Y \in B\}\) are equivalent.

Proof:

Note that \( \{X \in B\} \bigtriangleup \{Y \in B\} \subseteq \{X \ne Y\} \).

Thus if \( X \) and \( Y \) are equivalent, then by the previous result and the result above on equal probability, \( \P(X \in B) = \P(Y \in B) \) for every \( B \in \mathscr{T} \) and hence \(X\) and \(Y\) have the same probability distribution. Again, the converse fails with a passion, as an exercise below shows.
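
One standard way to see that the converse fails is a fair coin with \( Y = 1 - X \). The sketch below is an illustration under that assumption and is not necessarily the exercise referred to in the text; all names are hypothetical.

```python
from fractions import Fraction

# Hypothetical experiment: one toss of a fair coin.
S = ["H", "T"]
prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def P(A):
    return sum((prob[s] for s in A), Fraction(0))

def X(s):
    return 1 if s == "H" else 0

def Y(s):
    return 1 - X(s)

def distribution(V):
    """Probability of each value taken by the random variable V."""
    values = {V(s) for s in S}
    return {v: P({s for s in S if V(s) == v}) for v in values}

same_distribution = distribution(X) == distribution(Y)
prob_equal = P({s for s in S if X(s) == Y(s)})
print(same_distribution, prob_equal)
```

Here \( X \) and \( Y \) have the same distribution, yet \( \P(X = Y) = 0 \), so they are as far from equivalent as possible.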

It often happens that a definition for random variables subsumes the corresponding definition for events, by considering the indicator variables of the events. So it is with equivalence.

Suppose that \(A \in \mathscr{S}\) and \(B \in \mathscr{S}\) are events. Then \(A\) and \(B\) are equivalent if and only if the indicator random variables \(\bs{1}_A\) and \(\bs{1}_B\) are equivalent.

Proof:

Note that \( \left\{\bs{1}_A \ne \bs{1}_B\right\} = A \bigtriangleup B \). Thus, \( \bs{1}_A \equiv \bs{1}_B \) if and only if \( \P\left(\bs{1}_A \ne \bs{1}_B\right) = \P(A \bigtriangleup B) = 0 \) if and only if \( A \equiv B \).

Equivalence is preserved under a deterministic transformation of the variables.

Suppose that \( (U, \mathscr{U}) \) is yet another measurable space and that \( g: T \to U \) is measurable. If \(X\) and \(Y\) are equivalent random variables, with values in \(T\), then \(g(X)\) and \(g(Y)\) are equivalent.

Proof:

Note that \( \{X = Y\} \subseteq \{g(X) = g(Y)\} \) so if \( \P(X = Y) = 1 \) then \( \P\left[g(X) = g(Y)\right] = 1 \).

Spaces of Random Variables

Suppose again that \( (S, \mathscr{S}, \P) \) is a probability space corresponding to a random experiment. Let \( \mathscr{U} \) denote the collection of all real-valued random variables for the experiment, that is, all measurable functions from \( S \) into \( \R \). From our general discussion of measure theory, it follows that with the usual definitions of addition and scalar multiplication, \( (\mathscr{U}, +, \cdot) \) is a vector space. However, in probability theory, we often do not want to distinguish between random variables that are equivalent, so it's nice to know that the vector space structure is preserved when we identify equivalent random variables. Formally, let \( [X] \) denote the equivalence class generated by a real-valued random variable \( X \in \mathscr{U} \), and let \( \mathscr{V} \) denote the collection of all such equivalence classes. In modular notation, \( \mathscr{V}\) is \(\mathscr{U} \big/ \equiv \). We define addition and scalar multiplication on \( \mathscr{V} \) by \[ [X] + [Y] = [X + Y], \; c [X] = [c X]; \quad [X], \; [Y] \in \mathscr{V}, \; c \in \R \]

\( (\mathscr{V}, +, \cdot) \) is a vector space.

Proof:

All that we have to show is that addition and scalar multiplication are well-defined. That is, we must show that the definitions do not depend on the particular representative of the equivalence class. Then the other properties that define a vector space are inherited from \( (\mathscr{U}, +, \cdot) \). Thus we must show that if \( X_1 \equiv X_2 \) and \( Y_1 \equiv Y_2 \), and if \( c \in \R \), then \( X_1 + Y_1 \equiv X_2 + Y_2 \) and \( c X_1 \equiv c X_2 \). But these results follow immediately from the result above on transformations.

Spaces of functions in a general measure space are studied in the chapter on Distributions, and spaces of random variables are studied in more detail in the chapter on Expected Value.

Completion

Suppose that \( A \in \mathscr{N} \) so that \( \P(A) = 0 \). If \( B \subseteq A \) and \( B \in \mathscr{S} \), then we know that \( \P(B) = 0 \) so \( B \in \mathscr{N} \) also. However, in general there might be subsets of \( A \) that are not in \( \mathscr{S} \). This leads naturally to the following definition.

The \( \sigma \)-algebra \( \mathscr{S} \) is complete with respect to \( \P \) if \( A \in \mathscr{N} \) and \( B \subseteq A \) imply \( B \in \mathscr{S} \) (and hence \( B \in \mathscr{N} \)).

Thus, the collection of events \( \mathscr{S} \) is complete with respect to \( \P \) if every subset of an event with probability 0 is also an event (and hence also has probability 0). Fortunately, if \( \mathscr{S} \) is not complete, it can always be completed. To do this, we first need to extend the relation \( \equiv \) that we used above on \( \mathscr{S} \) to \( \mathscr{P}(S) \) (the power set of \( S \)).

For \( A, \, B \subseteq S \), define \( A \equiv B \) if and only if there exists \( N \in \mathscr{N} \) such that \( A \bigtriangleup B \subseteq N \). The relation \( \equiv \) is an equivalence relation on \( \mathscr{P}(S) \): For \( A, \; B, \; C \subseteq S \),

  1. \( A \equiv A \) (the reflexive property).
  2. If \( A \equiv B \) then \( B \equiv A \) (the symmetric property).
  3. If \( A \equiv B \) and \( B \equiv C \) then \( A \equiv C \) (the transitive property).
Proof:
  1. Note that \( A \bigtriangleup A = \emptyset \) and \( \emptyset \in \mathscr{N} \).
  2. Suppose that \( A \bigtriangleup B \subseteq N \) where \( N \in \mathscr{N} \). Then \( B \bigtriangleup A = A \bigtriangleup B \subseteq N\).
  3. Suppose that \( A \bigtriangleup B \subseteq N_1 \) and \( B \bigtriangleup C \subseteq N_2\) where \( N_1, \; N_2 \in \mathscr{N} \). Then \( A \bigtriangleup C \subseteq (A \bigtriangleup B) \cup (B \bigtriangleup C) \subseteq N_1 \cup N_2 \), and \( N_1 \cup N_2 \in \mathscr{N} \).

Note that \( A \equiv \emptyset \) if and only if \( A \subseteq N \) for some \( N \in \mathscr{N} \). Our next step is to enlarge the \( \sigma \)-algebra \( \mathscr{S} \) by adding any set that is equivalent to a set in \( \mathscr{S} \).

Let \( \mathscr{S}_0 = \{A \subseteq S: A \equiv B \text{ for some } B \in \mathscr{S} \} \). Then \( \mathscr{S}_0 \) is a \( \sigma \)-algebra of subsets of \( S \), and in fact is the \( \sigma \)-algebra generated by \( \mathscr{S} \cup \{A \subseteq S: A \equiv \emptyset\} \).

Proof:

Note that if \( A \in \mathscr{S} \) then \( A \equiv A \) so \( A \in \mathscr{S}_0 \). In particular, \( S \in \mathscr{S}_0 \). Also, \( \emptyset \in \mathscr{S} \) so if \( A \equiv \emptyset \) then \( A \in \mathscr{S}_0 \). Suppose that \( A \in \mathscr{S}_0 \) so that \( A \equiv B \) for some \( B \in \mathscr{S} \). Then \( B^c \in \mathscr{S} \) and \( A^c \equiv B^c \) so \( A^c \in \mathscr{S}_0 \). Next suppose that \( A_i \in \mathscr{S}_0 \) for \( i \) in a countable index set \( I \). Then for each \( i \in I \) there exists \( B_i \in \mathscr{S} \) such that \( A_i \equiv B_i \). But then \( \bigcup_{i \in I} B_i \in \mathscr{S} \) and \( \bigcup_{i \in I} A_i \equiv \bigcup_{i \in I} B_i \), so \( \bigcup_{i \in I} A_i \in \mathscr{S}_0 \). Therefore \( \mathscr{S}_0 \) is a \( \sigma \)-algebra of subsets of \( S \). Finally, suppose that \( \mathscr{T} \) is a \( \sigma \)-algebra of subsets of \( S \) and that \( \mathscr{S} \cup \{A \subseteq S: A \equiv \emptyset\} \subseteq \mathscr{T} \). We need to show that \( \mathscr{S}_0 \subseteq \mathscr{T} \). Thus, suppose that \( A \in \mathscr{S}_0 \). Then there exists \( B \in \mathscr{S} \) such that \( A \equiv B \). But \( B \in \mathscr{T} \), and \( A \bigtriangleup B \equiv \emptyset \) so \( A \bigtriangleup B \in \mathscr{T} \); hence \( A \cap B = B \setminus (A \bigtriangleup B) \in \mathscr{T}\). Also \( A \setminus B \subseteq A \bigtriangleup B \), so \( A \setminus B \equiv \emptyset \) and \( A \setminus B \in \mathscr{T} \). Therefore \( A = (A \cap B) \cup (A \setminus B) \in \mathscr{T} \).

Our last step is to extend \( \P \) to a probability measure on the enlarged \( \sigma \)-algebra \( \mathscr{S}_0 \).

Suppose that \( A \in \mathscr{S}_0 \) so that \( A \equiv B \) for some \( B \in \mathscr{S} \). Define \( \P_0(A) = \P(B) \). Then

  1. \( \P_0 \) is well-defined.
  2. \( \P_0(A) = \P(A) \) for \( A \in \mathscr{S} \).
  3. \( \P_0 \) is a probability measure on \( \mathscr{S}_0 \).
Proof:
  1. Suppose that \( A \in \mathscr{S}_0 \) and that \( A \equiv B_1 \) and \( A \equiv B_2 \) where \( B_1, \; B_2 \in \mathscr{S} \). Then \(B_1 \equiv B_2 \) so \( \P(B_1) = \P(B_2) \). Thus, \( \P_0 \) is well-defined.
  2. Next, if \( A \in \mathscr{S} \) then of course \( A \equiv A \) so \( \P_0(A) = \P(A) \).
  3. From part (b), \( \P_0(S) = 1 \). Trivially \( \P_0(A) \ge 0 \) for \( A \in \mathscr{S}_0 \). Thus we just need to show the countable additivity property. Towards that end, suppose that \( \{A_i: i \in I\} \) is a countable collection of pairwise disjoint sets in \( \mathscr{S}_0 \). For each \( i \in I \) there exists \( B_i \in \mathscr{S} \) and \( N_i \in \mathscr{N} \) such that \( A_i \bigtriangleup B_i \subseteq N_i \). That is, \( A_i \equiv B_i \), so in particular \( \P_0(A_i) = \P(B_i) \). Now \( \bigcup_{i \in I} A_i \equiv \bigcup_{i \in I} B_i \), so \( \P_0\left(\bigcup_{i \in I} A_i\right) = \P\left(\bigcup_{i \in I} B_i\right) \). By Boole's inequality, \( \P\left(\bigcup_{i \in I} B_i\right) \le \sum_{i \in I} \P(B_i) = \sum_{i \in I} \P_0(A_i) \). But also, \( B_i \cap B_j \subseteq N_i \cup N_j \) and therefore \( \P(B_i \cap B_j) = 0 \) for \( i \ne j \). By Bonferroni's inequality, \( \P\left(\bigcup_{i \in I} B_i\right) \ge \sum_{i \in I} \P(B_i) = \sum_{i \in I} \P_0(A_i) \). Thus we conclude that \( \P_0\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \P_0(A_i) \).

More generally, if \( \mu \) is a \( \sigma \)-finite measure on a measurable space \( (S, \mathscr{S}) \), then \( \mathscr{S} \) can be completed with respect to \( \mu \).

Independence

As usual, suppose that \( (S, \mathscr{S}, \P) \) is a probability space. We have already studied the independence of collections of events and the independence of collections of random variables. A more complete and general treatment results if we define the independence of collections of collections of events, and most importantly, the independence of collections of \( \sigma \)-algebras. This extension actually occurred already, when we went from independence of a collection of events to independence of a collection of random variables, but we did not note it at the time. In spite of the layers of set theory, the basic idea is the same.

Suppose that \( \mathscr{A}_i \) is a collection of events for each \( i \) in an index set \( I \). Then \( \mathscr{A} = \{\mathscr{A}_i: i \in I\} \) is independent if and only if for every choice of \( A_i \in \mathscr{A}_i \) for \( i \in I \), the collection of events \(\{ A_i: i \in I\} \) is independent. That is, for every finite \(J \subseteq I \), \[ \P\left(\bigcap_{j \in J} A_j\right) = \prod_{j \in J} \P(A_j) \]

As noted above, independence of random variables, as we defined previously, is a special case of our new definition.

Suppose that \( (T_i, \mathscr{T}_i) \) is a measurable space for each \( i \) in an index set \( I \), and that \( X_i \) is a random variable taking values in a set \( T_i \) for each \( i \in I \). The independence of \( \{X_i: i \in I\} \) is equivalent to the independence of \( \{\sigma(X_i): i \in I\} \).

Independence of events is also a special case of the new definition, and thus our new definition really does subsume our old one.

Suppose that \( A_i \) is an event for each \( i \in I \). The independence of \( \{A_i: i \in I\} \) is equivalent to the independence of \( \{\mathscr{A}_i: i \in I\} \) where \( \mathscr{A}_i = \sigma\{A_i\} = \{S, \emptyset, A_i, A_i^c\} \) for each \( i \in I \).
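
The definition, together with the special case \( \sigma\{A_i\} = \{S, \emptyset, A_i, A_i^c\} \), can be checked by brute force on a small example. The following Python sketch assumes, purely for illustration, two tosses of a fair coin with the uniform probability measure; the helper names `prob`, `events_independent`, `collections_independent`, and `sigma_of` are hypothetical and are not taken from the text.

```python
from fractions import Fraction
from itertools import combinations, product

# Illustrative assumption: two tosses of a fair coin, uniform measure.
S = frozenset(product("HT", repeat=2))

def prob(event):
    """Uniform probability of an event (a subset of S)."""
    return Fraction(len(event), len(S))

def events_independent(events):
    """Check the product rule for every subfamily with at least two events."""
    for r in range(2, len(events) + 1):
        for sub in combinations(events, r):
            inter = S
            rhs = Fraction(1)
            for e in sub:
                inter = inter & e
                rhs *= prob(e)
            if prob(inter) != rhs:
                return False
    return True

def collections_independent(collections):
    """Independence of collections: test every choice of one event per collection."""
    return all(events_independent(list(choice)) for choice in product(*collections))

def sigma_of(event):
    """The sigma-algebra generated by a single event: {S, empty, A, A^c}."""
    return [S, frozenset(), event, S - event]

first_heads = frozenset(s for s in S if s[0] == "H")
second_heads = frozenset(s for s in S if s[1] == "H")

# The two tosses generate independent sigma-algebras ...
result_independent = collections_independent(
    [sigma_of(first_heads), sigma_of(second_heads)]
)
# ... but a nontrivial event is never independent of itself.
result_dependent = collections_independent(
    [sigma_of(first_heads), sigma_of(first_heads)]
)
```

Exact rational arithmetic with `Fraction` is used so that the product rule is tested as an equality rather than up to rounding error.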

For every collection of objects that we have considered (collections of events, collections of random variables, collections of collections of events), the notion of independence has the basic inheritance property.

Suppose that \( \mathscr{A} \) is a collection of collections of events.

  1. If \( \mathscr{A} \) is independent then \( \mathscr{B} \) is independent for every \( \mathscr{B} \subseteq \mathscr{A} \).
  2. If \( \mathscr{B} \) is independent for every finite \( \mathscr{B} \subseteq \mathscr{A} \) then \( \mathscr{A} \) is independent.

Our most important collections are \( \sigma \)-algebras, and so we are most interested in the independence of a collection of \( \sigma \)-algebras. The next result allows us to go from the independence of certain types of collections to the independence of the \( \sigma \)-algebras generated by these collections. To understand the result, you will need to review the definitions and theorems concerning \( \pi \)-systems and \( \lambda \)-systems. The proof uses Dynkin's \( \pi \)-\( \lambda \) theorem, named for Eugene Dynkin.

Suppose that \( \mathscr{A}_i \) is a collection of events for each \( i \) in an index set \( I \), and that \( \mathscr{A}_i \) is a \( \pi \)-system for each \( i \in I \). If \( \left\{\mathscr{A}_i: i \in I\right\} \) is independent, then \( \left\{\sigma(\mathscr{A}_i): i \in I\right\} \) is independent.

Proof:

In light of the previous result, it suffices to consider a finite set of collections. Thus, suppose that \( \{\mathscr{A}_1, \mathscr{A}_2, \ldots, \mathscr{A}_n\} \) is independent. Now, fix \( A_i \in \mathscr{A}_i \) for \( i \in \{2, 3, \ldots, n\} \) and let \( E = \bigcap_{i=2}^n A_i \). Let \( \mathscr{L} = \{B \in \mathscr{S}: \P(B \cap E) = \P(B) \P(E)\} \). Trivially \( S \in \mathscr{L} \) since \( \P(S \cap E) = \P(E) = \P(S) \P(E) \). Next suppose that \( A \in \mathscr{L} \). Then \[ \P(A^c \cap E) = \P(E) - \P(A \cap E) = \P(E) - \P(A) \P(E) = [1 - \P(A)] \P(E) = \P(A^c) \P(E) \] Thus \( A^c \in \mathscr{L} \). Finally, suppose that \( \{A_j: j \in J\} \) is a countable collection of disjoint sets in \( \mathscr{L} \). Then \[ \P\left[\left(\bigcup_{j \in J} A_j \right) \cap E \right] = \P\left[ \bigcup_{j \in J} (A_j \cap E) \right] = \sum_{j \in J} \P(A_j \cap E) = \sum_{j \in J} \P(A_j) \P(E) = \P(E) \sum_{j \in J} \P(A_j) = \P(E) \P\left(\bigcup_{j \in J} A_j \right) \] Therefore \( \bigcup_{j \in J} A_j \in \mathscr{L} \) and so \( \mathscr{L} \) is a \( \lambda \)-system. Trivially \( \mathscr{A_1} \subseteq \mathscr{L} \) by the original independence assumption, so by the \( \pi \)-\( \lambda \) theorem, \( \sigma(\mathscr{A}_1) \subseteq \mathscr{L} \). Thus, we have that for every \( A_1 \in \sigma(\mathscr{A}_1) \) and \( A_i \in \mathscr{A}_i \) for \( i \in \{2, 3, \ldots, n\} \), \[ \P\left(\bigcap_{i=1}^n A_i \right) = \prod_{i=1}^n \P(A_i) \] Thus we have shown that \( \left\{\sigma(\mathscr{A}_1), \mathscr{A}_2, \ldots, \mathscr{A}_n\right\} \) is independent. Repeating the argument \( n - 1 \) additional times, we get that \( \{\sigma(\mathscr{A}_1), \sigma(\mathscr{A}_2), \ldots, \sigma(\mathscr{A}_n)\} \) is independent.

The next result is a rigorous statement of the strong independence that is implied by the independence of a collection of events.

Suppose that \( \mathscr{A} \) is an independent collection of events, and that \( \left\{\mathscr{B}_j: j \in J\right\} \) is a partition of \( \mathscr{A} \). That is, \( \mathscr{B}_j \cap \mathscr{B}_k = \emptyset \) for \( j \ne k \) and \( \bigcup_{j \in J} \mathscr{B}_j = \mathscr{A} \). Then \( \left\{\sigma(\mathscr{B}_j): j \in J\right\} \) is independent.

Proof:

Let \( \mathscr{B}_j^* \) denote the set of all finite intersections of sets in \( \mathscr{B}_j \), for each \( j \in J \). Then clearly \( \mathscr{B}_j^* \) is a \( \pi \)-system for each \( j \), and \( \left\{\mathscr{B}_j^*: j \in J\right\} \) is independent. By the previous theorem, \( \left\{\sigma(\mathscr{B}_j^*): j \in J\right\} \) is independent. But clearly \( \sigma(\mathscr{B}_j^*) = \sigma(\mathscr{B}_j) \) for \( j \in J \).

Let's bring the result down to earth. Suppose that \( A, B, C, D \) are independent events. In our elementary discussion, you were asked to show, for example, that \( A \cup B^c \) and \( C \cap D^c \) are independent. This is a consequence of the much stronger statement that the \( \sigma \)-algebras \( \sigma\{A, B\} \) and \( \sigma\{C, D\} \) are independent.
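
This example can be verified by direct enumeration. The sketch below assumes, for illustration only, that \( A, B, C, D \) are the events that tosses 1 through 4 of a fair coin land heads. On a finite sample space, the \( \sigma \)-algebra generated by finitely many events consists of all unions of the atoms they determine, which is what the hypothetical helper `generated_sigma_algebra` computes.

```python
from fractions import Fraction
from itertools import combinations, product

# Illustrative assumption: four tosses of a fair coin (1 = heads), uniform measure.
S = frozenset(product((0, 1), repeat=4))

def prob(event):
    """Uniform probability of an event (a subset of S)."""
    return Fraction(len(event), len(S))

def generated_sigma_algebra(generators):
    """Sigma-algebra generated by finitely many events on the finite set S.

    Outcomes are grouped into atoms by their membership pattern in the
    generators; the sigma-algebra is the set of all unions of atoms.
    """
    atoms = {}
    for s in S:
        pattern = tuple(s in g for g in generators)
        atoms.setdefault(pattern, set()).add(s)
    atom_list = [frozenset(a) for a in atoms.values()]
    sigma = set()
    for r in range(len(atom_list) + 1):
        for chosen in combinations(atom_list, r):
            sigma.add(frozenset().union(*chosen))
    return sigma

# A, B, C, D: heads on toss 1, 2, 3, 4 respectively.
A, B, C, D = (frozenset(s for s in S if s[i] == 1) for i in range(4))

sigma_AB = generated_sigma_algebra([A, B])
sigma_CD = generated_sigma_algebra([C, D])

# Independence of two sigma-algebras: the product rule for every pair.
all_pairs_independent = all(
    prob(E & F) == prob(E) * prob(F) for E in sigma_AB for F in sigma_CD
)

E = A | (S - B)   # A union B-complement
F = C & (S - D)   # C intersect D-complement
```

Each generated \( \sigma \)-algebra here has 16 members, so the check covers 256 pairs, of which the pair `E`, `F` from the elementary exercise is just one.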

Exchangeability

As usual, suppose that \( (S, \mathscr{S}, \P) \) is a probability space corresponding to a random experiment. Roughly speaking, a sequence of events or a sequence of random variables for the experiment is exchangeable if the probability law that governs the sequence is unchanged when the order of the events or variables is changed. Exchangeable variables arise naturally in sampling experiments and many other settings, and are a natural generalization of a sequence of independent, identically distributed (IID) variables. Conversely, it turns out that any exchangeable sequence of variables can be constructed from an IID sequence. First we give the definition for events:

Suppose that \(\mathscr{A} = \{A_i: i \in I\}\) is a collection of events in a random experiment, where \(I\) is a countable index set. The collection is said to be exchangeable if the probability of the intersection of a finite number of the events depends only on the number of events. That is, if \(J\) and \(K\) are finite subsets of \(I\) and \(\#(J) = \#(K)\) then \[\P\left( \bigcap_{j \in J} A_j\right) = \P \left( \bigcap_{k \in K} A_k\right)\]

Exchangeability has the same basic inheritance property that we have seen before.

Suppose that \(\mathscr{A}\) is a collection of events.

  1. If \(\mathscr{A}\) is exchangeable then \(\mathscr{B}\) is exchangeable for every \(\mathscr{B} \subseteq \mathscr{A}\).
  2. Conversely, if \(\mathscr{B}\) is exchangeable for every finite \(\mathscr{B} \subseteq \mathscr{A}\) then \(\mathscr{A}\) is exchangeable.

For a collection of exchangeable events, the inclusion-exclusion law for the probability of a union is much simpler than the general version.

Suppose that \(\{A_1, A_2, \ldots, A_n\}\) is an exchangeable collection of events. For \(J \subseteq \{1, 2, \ldots, n\}\) with \(\#(J) = k\), let \(p_k = \P\left( \bigcap_{j \in J} A_j\right)\). Then \[\P\left(\bigcup_{i = 1}^n A_i\right) = \sum_{k=1}^n (-1)^{k-1} \binom{n}{k} p_k\]
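
As a sanity check of this formula, the sketch below uses the classical matching problem as an assumed example (it is not taken from the text): a uniformly random permutation of \( n = 4 \) items, with \( A_i \) the event that item \( i \) is a fixed point. The code first confirms that the intersection probabilities depend only on the number of events, then compares the simplified formula with a direct computation of the probability of the union.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

n = 4
# Illustrative assumption: all permutations of n items are equally likely.
S = list(permutations(range(n)))

def prob(event):
    """Uniform probability of an event (a collection of permutations)."""
    return Fraction(len(event), len(S))

# A[i] is the event that position i is a fixed point (a "match").
A = [frozenset(s for s in S if s[i] == i) for i in range(n)]

def intersection_prob(J):
    """Probability that every position in J is a fixed point."""
    event = set(S)
    for j in J:
        event &= A[j]
    return prob(event)

# Exchangeability: the intersection probability depends only on the size of J.
p = {}
exchangeable = True
for k in range(1, n + 1):
    values = {intersection_prob(J) for J in combinations(range(n), k)}
    exchangeable = exchangeable and len(values) == 1
    p[k] = values.pop()

# Simplified inclusion-exclusion versus the union computed directly.
formula = sum((-1) ** (k - 1) * comb(n, k) * p[k] for k in range(1, n + 1))
direct = prob(frozenset().union(*A))
```

In this example \( p_k = (n - k)! / n! \), and both computations give \( \frac{5}{8} \) for the probability of at least one match.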

The concept of exchangeability can be extended to random variables in the natural way. Suppose that \( (T, \mathscr{T}) \) is a measurable space.

Suppose that \(\mathscr{A}\) is a collection of random variables, each taking values in \(T\). The collection \(\mathscr{A}\) is said to be exchangeable if for any \(\{X_1, X_2, \ldots, X_n\} \subseteq \mathscr{A}\), the distribution of the random vector \((X_1, X_2, \ldots, X_n)\) depends only on \(n\).

Thus, the distribution of the random vector is unchanged if the coordinates are permuted. Once again, exchangeability has the same basic inheritance property as a collection of independent variables.

Suppose that \(\mathscr{A}\) is a collection of random variables, each taking values in \( T \).

  1. If \(\mathscr{A}\) is exchangeable then \(\mathscr{B}\) is exchangeable for every \(\mathscr{B} \subseteq \mathscr{A}\).
  2. Conversely, if \(\mathscr{B}\) is exchangeable for every finite \(\mathscr{B} \subseteq \mathscr{A}\) then \(\mathscr{A}\) is exchangeable.

Suppose that \( \mathscr{A} \) is a collection of random variables, each taking values in \( T \), and that \( \mathscr{A} \) is exchangeable. Then trivially the variables are identically distributed: if \( X, \; Y \in \mathscr{A} \) and \( A \in \mathscr{T} \), then \( \P(X \in A) = \P(Y \in A) \). Also, the definition of exchangeable variables subsumes the definition for events:

Suppose that \(\mathscr{A}\) is a collection of events, and let \(\mathscr{B} = \{\bs{1}_A: A \in \mathscr{A}\}\) denote the corresponding collection of indicator random variables. Then \(\mathscr{A}\) is an exchangeable collection of events if and only if \(\mathscr{B}\) is an exchangeable collection of random variables.
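
A standard source of exchangeable indicator variables is sampling without replacement. The sketch below assumes a hypothetical urn with 2 red and 3 green balls, from which 3 balls are drawn in order, and lets \( X_i \) be the indicator that draw \( i \) is red. It checks that the joint distribution of \( (X_1, X_2, X_3) \) is unchanged by every permutation of the coordinates, which also forces the distribution of any sub-vector to depend only on its length. It also shows that the variables are identically distributed but not independent.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Illustrative assumption: an urn with 2 red (1) and 3 green (0) balls;
# 3 balls are drawn in order, without replacement.
balls = [1, 1, 0, 0, 0]
draws = list(permutations(range(len(balls)), 3))   # equally likely ordered samples

def joint_distribution(order):
    """Distribution of the vector (X_i : i in order), as a dict of probabilities."""
    counts = Counter(tuple(balls[d[i]] for i in order) for d in draws)
    return {x: Fraction(c, len(draws)) for x, c in counts.items()}

base = joint_distribution((0, 1, 2))
exchangeable = all(joint_distribution(o) == base for o in permutations(range(3)))

# Identically distributed, but not independent.
p_first_red = sum(q for x, q in base.items() if x[0] == 1)
p_second_red = sum(q for x, q in base.items() if x[1] == 1)
p_both_red = sum(q for x, q in base.items() if x[0] == 1 and x[1] == 1)
```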

Tail Events and Variables

Suppose again that we have a random experiment modeled by a probability space \( (S, \mathscr{S}, \P) \).

Suppose that \((X_1, X_2, \ldots)\) is a sequence of random variables. The tail sigma algebra of the sequence is \[ \mathscr{T} = \bigcap_{n=1}^\infty \sigma\{X_n, X_{n+1}, \ldots\} \]

  1. An event \(B \in \mathscr{T}\) is a tail event for the sequence.
  2. A random variable \( Y \) that is measurable with respect to \( \mathscr{T} \) is a tail random variable for the sequence.

Informally, a tail event (random variable) is an event (random variable) that can be defined in terms of \(\{X_n, X_{n+1}, \ldots\}\) for each \(n \in \N_+\). The tail sigma algebra for a sequence of events \( (A_1, A_2, \ldots) \) is defined analogously (or simply let \(X_k = \bs{1}(A_k)\), the indicator variable of \(A_k\), for each \(k\)). For the following results, you may need to review some of the definitions in the section on Convergence.

Suppose that \((A_1, A_2, \ldots)\) is a sequence of events.

  1. If the sequence is increasing then \(\lim_{n \to \infty} A_n = \bigcup_{n=1}^\infty A_n\) is a tail event of the sequence.
  2. If the sequence is decreasing then \(\lim_{n \to \infty} A_n = \bigcap_{n=1}^\infty A_n\) is a tail event of the sequence.
Proof:
  1. If the sequence is increasing then \( \bigcup_{n=1}^\infty A_n = \bigcup_{n=k}^\infty A_n \in \sigma\{A_k, A_{k+1}, \ldots\}\) for every \( k \in \N_+ \).
  2. If the sequence is decreasing then \( \bigcap_{n=1}^\infty A_n = \bigcap_{n=k}^\infty A_n \in \sigma\{A_k, A_{k+1}, \ldots\} \) for every \( k \in \N_+ \).

Suppose again that \( (A_1, A_2, \ldots) \) is a sequence of events. Each of the following is a tail event of the sequence:

  1. \(\limsup_{n \to \infty} A_n = \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\)
  2. \(\liminf_{n \to \infty} A_n = \bigcup_{n=1}^\infty \bigcap_{i=n}^\infty A_i\)
Proof:
  1. The events \( \bigcup_{i=n}^\infty A_i \) are decreasing in \( n \) and hence \( \limsup_{n \to \infty} A_n = \lim_{n \to \infty} \bigcup_{i=n}^\infty A_i \in \mathscr{T} \) by the previous result.
  2. The events \( \bigcap_{i=n}^\infty A_i \) are increasing in \( n \) and hence \( \liminf_{n \to \infty} A_n = \lim_{n \to \infty} \bigcap_{i=n}^\infty A_i \in \mathscr{T} \) by the previous result.
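
The defining formulas can be evaluated exactly for a sequence of sets that is eventually periodic, since each tail union or intersection is then attained over finitely many terms. The sketch below is an illustration under that assumption; the helpers `make_sequence` and `limsup_liminf` are hypothetical. It also shows the tail property in action: changing finitely many initial terms leaves both limits unchanged.

```python
def make_sequence(prefix, pattern):
    """A_1, A_2, ...: the sets in `prefix`, followed by `pattern` repeated forever."""
    def A(n):  # n = 1, 2, ...
        if n <= len(prefix):
            return prefix[n - 1]
        return pattern[(n - len(prefix) - 1) % len(pattern)]
    return A

def limsup_liminf(prefix, pattern):
    """Compute limsup and liminf of the eventually periodic sequence of sets."""
    A = make_sequence(prefix, pattern)
    start, period = len(prefix) + 1, len(pattern)
    last = start + period - 1  # indices start..last cover one full period
    # For n <= start, the union (intersection) of A_i over all i >= n equals
    # the union (intersection) over i = n..last; for n > start it repeats the
    # value at n = start. So only n = 1..start needs to be considered.
    tail_unions = [
        frozenset().union(*(A(i) for i in range(n, last + 1)))
        for n in range(1, start + 1)
    ]
    tail_inters = [
        frozenset.intersection(*(A(i) for i in range(n, last + 1)))
        for n in range(1, start + 1)
    ]
    limsup = frozenset.intersection(*tail_unions)
    liminf = frozenset().union(*tail_inters)
    return limsup, liminf

pattern = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({2, 3, 4})]

# Purely periodic sequence.
limits_periodic = limsup_liminf([], pattern)

# Same sequence after three arbitrary initial terms: the limits do not change.
limits_modified = limsup_liminf(
    [frozenset({5}), frozenset(), frozenset({0, 5})], pattern
)
```

Here the limit superior is the set of points that lie in infinitely many of the sets, and the limit inferior is the set of points that lie in all but finitely many.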

Suppose that \( \bs X = (X_1, X_2, \ldots) \) is a sequence of real-valued random variables.

  1. \(\{X_n \text{ converges as } n \to \infty\}\) is a tail event for \( \bs X \).
  2. \( \liminf_{n \to \infty} X_n \) is a tail random variable for \( \bs X \).
  3. \( \limsup_{n \to \infty} X_n \) is a tail random variable for \( \bs X \).
Proof:
  1. The Cauchy criterion for convergence (named for Augustin Cauchy of course) states that \( X_n \) converges as \( n \to \infty \) if and only if for every \( \epsilon > 0 \) there exists \( N \in \N_+ \) (depending on \( \epsilon \)) such that if \(m, \; n \ge N \) then \( \left|X_n - X_m\right| \lt \epsilon \). In this criterion, we can without loss of generality take \( \epsilon \) to be rational, and for a given \( k \in \N_+ \) we can insist that \( m, \; n \ge k \). With these restrictions, the Cauchy criterion is a countable intersection of events, each of which is in \( \sigma\{X_k, X_{k+1}, \ldots\} \).
  2. Recall that \( \liminf_{n \to \infty} X_n = \lim_{n \to \infty} \inf\{X_k: k \ge n\} \).
  3. Similarly, recall that \( \limsup_{n \to \infty} X_n = \lim_{n \to \infty} \sup\{X_k: k \ge n\} \).

The random variable in part (b) may take the value \( -\infty \), and the random variable in (c) may take the value \( \infty \). From parts (b) and (c) together, note that if \( X_n \to X_\infty \) as \( n \to \infty \) on the sample space \( S \), then \( X_\infty \) is a tail random variable for \( \bs X \).

There are a number of zero-one laws in probability. These are theorems that give conditions under which an event will be essentially deterministic; that is, have probability 0 or probability 1. Interestingly, it can sometimes be difficult to determine which of these extremes is actually the case. The following result is the Kolmogorov zero-one law, named for Andrey Kolmogorov. It states that an event in the tail \(\sigma\)-algebra of an independent sequence will have probability 0 or 1.

Suppose that \( \bs X = (X_1, X_2, \ldots) \) is an independent sequence.

  1. If \(B\) is a tail event for \( \bs X \) then \(\P(B) = 0\) or \(\P(B) = 1\).
  2. If \( Y \) is a real-valued tail random variable for \( \bs X \) then \( Y \) is constant with probability 1.
Proof:
  1. By definition \( B \in \sigma\{X_{n+1}, X_{n+2}, \ldots\} \) for each \( n \in \N_+ \), and hence \(\{X_1, X_2, \ldots, X_n, \bs{1}_B\}\) is an independent set of random variables. Thus \(\{X_1, X_2, \ldots, \bs{1}_B\}\) is an independent set of random variables. But \( B \in \sigma\{X_1, X_2, \ldots\} \), so it follows that the event \(B\) is independent of itself. Therefore \(\P(B) = 0\) or \(\P(B) = 1\).
  2. The function \(y \mapsto \P(Y \le y) \) on \( \R \) is the (cumulative) distribution function of \( Y \). This function is clearly increasing. Moreover, simple applications of the continuity theorems show that it is right continuous and that \( \P(Y \le y) \to 0 \) as \( y \to -\infty \) and \( \P(Y \le y) \to 1 \) as \( y \to \infty \). (Explicit proofs are given in the section on distribution functions in the chapter on Distributions.) But since \( Y \) is a tail random variable, \( \{Y \le y\} \) is a tail event and hence \( \P(Y \le y) \in \{0, 1\} \) for each \( y \in \R \). It follows that there exists \( c \in \R \) such that \( \P(Y \le y) = 0 \) for \( y \lt c \) and \( \P(Y \le y) = 1 \) for \( y \ge c \). Hence \( \P(Y = c) = 1 \).

From the Kolmogorov zero-one law and the result above, note that if \((A_1, A_2, \ldots)\) is a sequence of independent events, then \(\limsup_{n \to \infty} A_n\) must have probability 0 or 1. The Borel-Cantelli lemmas give conditions that determine which of these is correct.

Another proof of the Kolmogorov zero-one law will be given using the martingale convergence theorem.

Examples and Exercises

As always, be sure to try the computational exercises and proofs yourself before reading the answers and proofs in the text.

Counterexamples

Equal probability certainly does not imply equivalent events.

Consider the simple experiment of tossing a fair coin. The event that the coin lands heads and the event that the coin lands tails have the same probability, but are not equivalent.

Proof:

Let \( S \) denote the sample space, and \( H \) the event of heads, so that \( H^c \) is the event of tails. Since the coin is fair, \( \P(H) = \P(H^c) = \frac{1}{2} \). But \( H \bigtriangleup H^c = S\), so \( \P(H \bigtriangleup H^c) = 1 \), so \( H \) and \( H^c \) are as far from equivalent as possible.

Similarly, identical distributions do not imply equivalent random variables.

Consider the experiment of rolling a standard, fair die. Let \( X \) denote the score and \( Y = 7 - X \). Then \( X \) and \( Y \) have the same distribution but are not equivalent.

Proof:

Since the die is fair, \( X \) is uniformly distributed on \(S = \{1, 2, 3, 4, 5, 6\} \). Also \( \P(Y = k) = \P(X = 7 - k) = \frac{1}{6} \) for \( k \in S \), so \( Y \) also has the uniform distribution on \( S \). But \( \P(X = Y) = \P\left(X = \frac{7}{2}\right) = 0 \), so \( X \) and \( Y \) are as far from equivalent as possible.

Consider the experiment of rolling two standard, fair dice and recording the sequence of scores \( (X, Y) \). Then \( X \) and \( Y \) are independent and have the same distribution, but are not equivalent.

Proof:

Since the dice are fair, \( (X, Y) \) has the uniform distribution on \( \{1, 2, 3, 4, 5, 6\}^2 \). Equivalently, \( X \) and \( Y \) are independent, and each has the uniform distribution on \( \{1, 2, 3, 4, 5, 6\} \). But \( \P(X = Y) = \frac{1}{6} \), so \( X \) and \( Y \) are not equivalent.
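
Both dice counterexamples are small enough to confirm by enumeration. The following sketch simply lists the equally likely outcomes and counts; the variable names are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# One fair die: X is the score and Y = 7 - X.
dist_X = Counter(x for x in faces)
dist_Y = Counter(7 - x for x in faces)
same_distribution = dist_X == dist_Y
p_equal_one_die = Fraction(sum(1 for x in faces if x == 7 - x), 6)

# Two fair dice: (X, Y) is uniform on {1, ..., 6}^2.
pairs = list(product(faces, repeat=2))
p_equal_two_dice = Fraction(sum(1 for x, y in pairs if x == y), len(pairs))
```

In the first case the two variables never agree, and in the second they agree with probability \( \frac{1}{6} \), so in neither case is \( \P(X = Y) = 1 \).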

Basic Properties

In the following problems, \( \mu \) is a positive measure on the measurable space \( (S, \mathscr{S}) \).

Suppose that \( \mu(S) = 20 \) and that \(A, B \in \mathscr{S}\) with \(\mu(A) = 5\), \(\mu(B) = 6 \), \(\mu(A \cap B) = 2\). Find the measure of each of the following sets:

  1. \(A \setminus B\)
  2. \(A \cup B\)
  3. \(A^c \cup B^c\)
  4. \(A^c \cap B^c\)
  5. \(A \cup B^c\)
Answer:
  1. 3
  2. 9
  3. 18
  4. 11
  5. 16
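
The answers follow from the difference rule, the inclusion-exclusion rule for two sets, and the complement rule, all of which require the measures involved to be finite. A minimal sketch of the arithmetic, with a hypothetical helper `two_set_measures`:

```python
def two_set_measures(mu_S, mu_A, mu_B, mu_AB):
    """Measures of sets built from A and B, assuming all inputs are finite.

    mu_AB denotes the measure of the intersection of A and B.
    """
    a_minus_b = mu_A - mu_AB            # A \ B
    b_minus_a = mu_B - mu_AB            # B \ A
    union = mu_A + mu_B - mu_AB         # A union B
    return {
        "A_minus_B": a_minus_b,
        "A_or_B": union,
        "notA_or_notB": mu_S - mu_AB,   # complement of (A intersect B)
        "notA_and_notB": mu_S - union,  # complement of (A union B)
        "A_or_notB": mu_S - b_minus_a,  # complement of (B \ A)
    }

answers = two_set_measures(mu_S=20, mu_A=5, mu_B=6, mu_AB=2)
```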

Suppose that \( \mu(S) = \infty \) and that \(A, \, B \in \mathscr{S}\) with \(\mu(A \setminus B) = 2\), \(\mu(B \setminus A) = 3\), and \(\mu(A \cap B) = 4\). Find the measure of each of the following sets:

  1. \(A\)
  2. \(B\)
  3. \(A \cup B\)
  4. \( A^c \cap B^c \)
  5. \( A^c \cup B^c \)
Answer:
  1. 6
  2. 7
  3. 9
  4. \(\infty\)
  5. \(\infty\)

Suppose that \( \mu(S) = 10 \) and that \(A, \, B \in \mathscr{S}\) with \(\mu(A) = 3\), \(\mu(A \cup B) = 7\), and \(\mu(A \cap B) = 2\). Find the measure of each of the following sets:

  1. \(B\)
  2. \(A \setminus B\)
  3. \(B \setminus A\)
  4. \(A^c \cup B^c\)
  5. \(A^c \cap B^c\)
Answer:
  1. 6
  2. 1
  3. 4
  4. 8
  5. 3

Suppose that \( A, \, B, \, C \in \mathscr{S} \) with \( \mu(A) = 10 \), \( \mu(B) = 12 \), \( \mu(C) = 15 \), \( \mu(A \cap B) = 3 \), \( \mu(A \cap C) = 4 \), \( \mu(B \cap C) = 5 \), and \( \mu(A \cap B \cap C) = 1 \). Find the measures of the various unions:

  1. \( A \cup B \)
  2. \( A \cup C \)
  3. \( B \cup C \)
  4. \( A \cup B \cup C \)
Answer:
  1. 19
  2. 21
  3. 22
  4. 26
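
Exercises of this type rest on the inclusion-exclusion rule for two and three sets of finite measure. As an independent check of the three-set rule, the sketch below uses counting measure on some arbitrarily chosen finite sets of integers (an illustrative assumption, unrelated to the numbers in the exercise) and compares the formula with the size of the union computed directly.

```python
def union_measure(mu, A, B, C):
    """Inclusion-exclusion for three sets under a finite measure `mu`."""
    return (
        mu(A) + mu(B) + mu(C)
        - mu(A & B) - mu(A & C) - mu(B & C)
        + mu(A & B & C)
    )

# Illustrative assumption: counting measure (len) on finite sets of integers.
A = set(range(0, 10))
B = set(range(7, 19))
C = set(range(5, 12))

by_formula = union_measure(len, A, B, C)
by_direct_count = len(A | B | C)
```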