\( \renewcommand{\P}{\mathbb{P}} \)
\( \newcommand{\E}{\mathbb{E}} \)
\( \newcommand{\R}{\mathbb{R}} \)
\( \newcommand{\Q}{\mathbb{Q}} \)
\( \newcommand{\N}{\mathbb{N}} \)
\( \newcommand{\bs}{\boldsymbol} \)
\( \newcommand{\range}{\text{range}} \)

Probability density functions have very different interpretations for discrete distributions as opposed to continuous distributions. For a discrete distribution, the probability of an event is computed by summing the density function over the outcomes in the event, while for a continuous distribution, the probability is computed by integrating the density function over the outcomes. For a mixed distribution, we have partial discrete and continuous density functions and the probability of an event is computed by summing and integrating. The various types of density functions can be unified under a general theory of integration, which is the subject of this section. This theory has enormous importance in probability, far beyond just density functions. Expected value, which we consider in the next chapter, can be interpreted as an integral with respect to a probability measure. Beyond probability, the general theory of integration is of fundamental importance in many areas of mathematics.

Our starting point is a measure space \( (S, \mathscr{S}, \mu) \). That is, \( S \) is a set, \( \mathscr{S} \) is a \( \sigma \)-algebra of subsets of \( S \), and \( \mu \) is a positive measure on \( \mathscr{S} \). As usual, the most important special cases are

- \( S \) is a Lebesgue measurable subset of \( \R^n \) for some \( n \in \N_+ \), \( \mathscr{S} \) is the \( \sigma \)-algebra of Lebesgue measurable subsets of \( S \), and \( \mu = \lambda_n \), standard \( n \)-dimensional Lebesgue measure.
- \( S \) is a countable set, \( \mathscr{S} \) is the collection of all subsets of \( S \), and \( \mu = \# \), counting measure.
- \( S \) is the sample space of a random experiment, \( \mathscr{S} \) is the \( \sigma \)-algebra of events, and \( \mu = \P \), a probability measure.

Consider a statement on the elements of \(S \), for example an equation or an inequality with \( x \in S \) as a free variable. (Technically such a statement is a predicate on \( S \).) For \( A \in \mathscr{S} \), we say that the statement holds on \( A \) if it is true for every \( x \in A \). We say that the statement holds almost everywhere on \( A \) (with respect to \( \mu \)) if there exists \( B \in \mathscr{S} \) with \( B \subseteq A \) such that the statement holds on \( B \) and \( \mu(A \setminus B) = 0 \).

Our goal is to define the integral of certain measurable functions \( f: S \to \R \), with respect to the measure \( \mu \). The integral may exist as a number in \( \R \) (in which case we say that \( f \) is integrable), or may exist as \( \infty \) or \( -\infty \), or may not exist at all. When it exists, the integral is denoted variously by \[ \int_S f \, d\mu, \; \int_S f(x) \, d\mu(x), \; \int_S f(x) \mu(dx) \] We will use the first two.

Since the set of extended real numbers \( \R^* = \R \cup \{-\infty, \infty\} \) plays an important role in the theory, we need to recall the arithmetic of \( \infty \) and \( -\infty \). Here are the conventions that are appropriate for integration:

Arithmetic on \( \R^* \)

- If \( a \in (0, \infty] \) then \( a \cdot \infty = \infty \) and \( a \cdot (-\infty) = -\infty \)
- If \( a \in [-\infty, 0) \) then \( a \cdot \infty = -\infty \) and \( a \cdot (-\infty) = \infty \)
- \(0 \cdot \infty = 0 \) and \( 0 \cdot (-\infty) = 0 \)
- If \( a \in \R \) then \( a + \infty = \infty \) and \( a + (-\infty) = -\infty \)
- \( \infty + \infty = \infty \)
- \( -\infty + (-\infty) = -\infty \)

However, \( \infty - \infty \) is not defined (because it does not make consistent sense) and we must be careful never to produce this indeterminate form. You might recall from calculus that \( 0 \cdot \infty \) is also an indeterminate form. However, for the theory of integration, the convention that \( 0 \cdot \infty = 0 \) is convenient and consistent. In terms of order of course, \(-\infty \lt a \lt \infty\) for \( a \in \R \).
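Floating-point arithmetic in most programming languages follows IEEE 754, which does not match these conventions. In Python, `0 * math.inf` evaluates to `nan` rather than 0, and `math.inf - math.inf` silently evaluates to `nan` rather than signaling an undefined operation. The following is a minimal sketch of the conventions above, assuming \( \pm\infty \) is represented by `math.inf` and `-math.inf`; the helper names `ext_mul` and `ext_add` are hypothetical.

```python
import math

def ext_mul(a: float, b: float) -> float:
    """Multiply in the extended reals with the convention 0 * (+/-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0  # integration convention; IEEE 754 would give nan here
    # In the remaining cases the IEEE signs of infinite products match the rules above.
    return a * b

def ext_add(a: float, b: float) -> float:
    """Add in the extended reals, refusing the indeterminate form inf - inf."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ArithmeticError("inf - inf is undefined")
    return a + b

print(ext_mul(0, math.inf))    # 0.0
print(ext_mul(-2, math.inf))   # -inf
print(ext_add(5, -math.inf))   # -inf
```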

As motivation for the definition, every version of integration should satisfy some basic properties. First, the integral of the indicator function of a measurable set should simply be the size of the set, as measured by \( \mu \). This gives our first definition:

If \( A \in \mathscr{S} \) then \( \int_S \bs{1}_A \, d\mu = \mu(A) \).

This definition hints at the intimate relationship between measure and integration. We will construct the integral from the measure \( \mu \) in this section, but this first property shows that if we started with the integral, we could recover the measure. This property also shows why we need \( \infty \) as a possible value of the integral, and coupled with some of the properties below, why \( -\infty \) is also needed. Here is a simple corollary of our first definition.

\( \int_S 0 \, d\mu = 0 \)

Note that \( \int_S 0 \, d\mu = \int_S \bs{1}_\emptyset \, d\mu = \mu(\emptyset) = 0 \).

We give three more essential properties that we want. First are the linearity properties in two parts—part (a) is the additive property and part (b) is the scaling property.

If \( f, \; g: S \to \R \) are measurable functions whose integrals exist, and \( c \in \R \), then

- \( \int_S (f + g) \, d\mu = \int_S f \, d\mu + \int_S g \, d\mu \) as long as the right side is not of the form \( \infty - \infty \)
- \( \int_S c f \, d\mu = c \int_S f \, d\mu \).

The steps below do not constitute a proof because questions of the existence of the integrals are ignored and because the limit interchange in the last step is not justified. Still, the argument shows the close relationship between the additive property and the scaling property.

1. If \( n \in \N_+ \), then by (a) and induction, \( \int_S n f \, d\mu = n \int_S f \, d\mu \).
2. From step (1), if \( n \in \N_+ \) then \( \int_S f \, d\mu = \int_S n \frac{1}{n} f \, d\mu = n \int_S \frac{1}{n} f \, d\mu \) so \( \int_S \frac{1}{n} f \, d\mu = \frac{1}{n} \int_S f \, d\mu \).
3. If \( m, \; n \in \N_+ \) then from steps (1) and (2) \( \int_S \frac{m}{n} f \, d\mu = m \int_S \frac{1}{n} f \, d\mu = \frac{m}{n} \int_S f \, d\mu \).
4. \( 0 = \int_S 0 \, d\mu = \int_S (f - f) \, d\mu = \int_S f \, d\mu + \int_S - f \, d\mu \) so \( \int_S -f \, d\mu = -\int_S f \, d\mu \).
5. By steps (3) and (4), \( \int_S c f \, d\mu = c \int_S f \, d\mu \) for every \( c \in \Q \) (the set of rational numbers).
6. If \( c \in \R \) there exists \( c_n \in \Q \) for \( n \in \N_+ \) with \( c_n \to c \) as \( n \to \infty \). By step (5), \( \int_S c_n f \, d\mu = c_n \int_S f \, d\mu \).
7. Taking limits in step (6) suggests \( \int_S c f \, d\mu = c \int_S f \, d\mu \).

To be more explicit, we want the additivity property (a) to hold if at least one of the integrals on the right is finite, or if both are \( \infty \) or if both are \( -\infty \). What is ruled out are the two cases where one integral is \( \infty \) and the other is \( -\infty \), and this is what is meant by the indeterminate form \( \infty - \infty \). Our next essential properties are the order properties, again in two parts—part (a) is the positive property and part (b) is the increasing property.

Suppose that \( f, \, g: S \to \R \) are measurable.

- If \( f \ge 0 \) on \( S \) then \( \int_S f \, d\mu \ge 0 \).
- If the integrals of \( f \) and \( g \) exist and \( f \le g \) on \( S \) then \( \int_S f \, d\mu \le \int_S g \, d\mu \).

Implicit in part (a) is that the integral of a nonnegative, measurable function always exists in \( [0, \infty] \). Suppose that the integrals of \( f \) and \( g \) exist and \( f \le g \) on \( S \). Then \( g - f \ge 0 \) on \( S \) and \( g = f + (g - f) \). If \( \int_S f \, d\mu = -\infty \), then trivially \( \int_S f \, d\mu \le \int_S g \, d\mu \). Otherwise, by the additivity property, \[ \int_S g \, d\mu = \int_S f \, d\mu + \int_S (g - f) \, d\mu\] But \( \int_S (g - f) \, d\mu \ge 0 \) (so in particular the right side is not \( -\infty + \infty \)), and hence \( \int_S g \, d\mu \ge \int_S f \, d\mu \).

Our last essential property is perhaps the least intuitive, but is a type of continuity property of integration, and is closely related to the continuity property of positive measure. The official name is the monotone convergence theorem.

Suppose that \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \) and that \( f_n \) is increasing in \( n \). Then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n d \mu \]

Note that since \( f_n \) is increasing in \( n \), \( \lim_{n \to \infty} f_n(x) \) exists in \( \R \cup \{\infty\} \) for each \( x \in S \). This property shows that it is sometimes convenient to allow nonnegative functions to take the value \( \infty \). Note also that by the increasing property, \( \int_S f_n \, d\mu \) is increasing in \( n \) and hence also has a limit in \( \R \cup \{\infty\} \).

To see the connection with measure, suppose that \( (A_1, A_2, \ldots) \) is an increasing sequence of sets in \( \mathscr{S} \), and let \( A = \bigcup_{i=1}^\infty A_i \). Note that \( \bs{1}_{A_n} \) is increasing in \( n \in \N_+ \) and \( \bs{1}_{A_n} \to \bs{1}_{A} \) as \( n \to \infty \). For this reason, the union \( A \) is sometimes called the limit of \( A_n \) as \( n \to \infty \). The continuity theorem of positive measure states that \( \mu(A_n) \to \mu(A) \) as \( n \to \infty \). Equivalently, \(\int_S \bs{1}_{A_n} \, d\mu \to \int_S \bs{1}_A \, d\mu\) as \( n \to \infty \), so the continuity theorem of positive measure is a special case of the monotone convergence theorem.

Armed with the properties that we want, the definition of the integral is fairly straightforward, and proceeds in stages. We give the definition successively for

- Nonnegative simple functions
- Nonnegative measurable functions
- Measurable real-valued functions

Of course, each definition should agree with the previous one on the functions that are in both collections.

A simple function on \( S \) is simply a measurable, real-valued function with finite range. Simple functions are usually expressed as linear combinations of indicator functions.

Representations of simple functions

- Suppose that \( I \) is a finite index set, \( a_i \in \R \) for each \( i \in I \), and \( \{A_i: i \in I\} \) is a collection of sets in \( \mathscr{S} \) that partition \( S \). Then \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \) is a simple function. Expressing a simple function in this form is a representation of \( f \).
- A simple function \( f \) has a unique representation as \( f = \sum_{j \in J} b_j \bs{1}_{B_j} \) where \( J \) is a finite index set, \( \{b_j: j \in J\} \) is a set of distinct real numbers, and \( \{B_j: j \in J\} \) is a collection of nonempty sets in \( \mathscr{S} \) that partition \( S \). This representation is known as the canonical representation.

- Note that \( f \) is measurable since \( A_i \in \mathscr{S} \) for each \( i \in I \). Also \( f \) has finite range since \( I \) is finite. Specifically, the range of \( f \) consists of the distinct \( a_i \) for \( i \in I \) with \( A_i \ne \emptyset \).
- Suppose that \( f \) is simple. Let \( \{b_j: j \in J\} \) denote the (distinct) values in the range of \( f \) and let \( B_j = f^{-1}\{b_j\} \) for \( j \in J \). Then \( J \) is finite, \( \{B_j: j \in J\} \) is a collection of nonempty sets in \( \mathscr{S} \) that partition \( S \), and \( f = \sum_{j \in J} b_j \bs{1}_{B_j} \). Conversely, suppose that \( f \) has a representation of this form. Then \(\{b_j: j \in J\}\) is the range of \( f \) and \( B_j = f^{-1}\{b_j\} \) so the representation is unique.
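As a minimal sketch of the proof above, assume that \( S \) is a small finite set and \( \mathscr{S} \) is the collection of all subsets, so that every function on \( S \) is measurable, and encode a representation as a list of pairs \( (a_i, A_i) \). The hypothetical function `canonical` groups the sets by value and drops empty sets, which recovers \( B_j = f^{-1}\{b_j\} \).

```python
from collections import defaultdict

def canonical(rep):
    """Collapse a representation [(a_i, A_i), ...] of a simple function, where the
    A_i partition a finite set S, into its canonical representation: a dict mapping
    each distinct value b_j to B_j = f^{-1}{b_j}."""
    groups = defaultdict(set)
    for a, A in rep:
        if A:                     # empty sets contribute nothing to the range
            groups[a] |= set(A)   # B_j is the union of the A_i with a_i = b_j
    return {b: frozenset(B) for b, B in groups.items()}

# f = 2*1_{0,1} + 5*1_{2} + 2*1_{3} + 7*1_{empty set} on S = {0, 1, 2, 3}
rep = [(2, {0, 1}), (5, {2}), (2, {3}), (7, set())]
print(canonical(rep))   # {2: frozenset({0, 1, 3}), 5: frozenset({2})}
```

The value 7 is attached only to the empty set, so it is not in the range of \( f \) and does not appear in the canonical representation.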

You might wonder why we don't just always use the canonical representation for simple functions. The problem is that even if we start with canonical representations, when we combine simple functions in various ways, the resulting representations may not be canonical. The collection of simple functions is closed under the basic arithmetic operations, and in particular, forms a vector space.

Suppose that \( f \) and \( g \) are simple functions with representations \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \) and \( g = \sum_{j \in J} b_j \bs{1}_{B_j} \), and that \( c \in \R \). Then

- \( f + g \) is simple, with representation \(f + g = \sum_{(i, j) \in I \times J} (a_i + b_j) \bs{1}_{A_i \cap B_j} \).
- \( f g \) is simple, with representation \(f g = \sum_{(i, j) \in I \times J} (a_i b_j) \bs{1}_{A_i \cap B_j} \).
- \( c f \) is simple, with representation \( c f = \sum_{i \in I} c a_i \bs{1}_{A_i} \).

Since \( f \) and \( g \) are measurable, so are \( f + g \), \( f g \), and \( c f \). Moreover, since \( f \) and \( g \) have finite range, so do \( f + g \), \( f g \), and \( c f \). For the representations in parts (a) and (b), note that \( I \times J \) is finite, \( \left\{A_i \cap B_j: (i, j) \in I \times J\right\} \) is a collection of sets in \( \mathscr{S} \) that partition \( S \), and on \( A_i \cap B_j \), \( f + g = a_i + b_j \) and \( f g = a_i b_j \).
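To see the representations in (a) and (b) concretely, here is a small Python sketch on a finite set \( S \), again encoding a representation as a list of pairs \( (a_i, A_i) \); the helper names `evaluate`, `sum_rep`, and `prod_rep` are hypothetical. The example is chosen so that both input representations are canonical.

```python
from itertools import product

def evaluate(rep, x):
    """Value at x of the simple function sum_i a_i 1_{A_i}, where the A_i partition S."""
    return sum(a for a, A in rep if x in A)

def sum_rep(f_rep, g_rep):
    """Representation of f + g indexed by I x J, using the cells A_i & B_j."""
    return [(a + b, A & B) for (a, A), (b, B) in product(f_rep, g_rep)]

def prod_rep(f_rep, g_rep):
    """Representation of f g indexed by I x J, using the cells A_i & B_j."""
    return [(a * b, A & B) for (a, A), (b, B) in product(f_rep, g_rep)]

S = set(range(6))
f_rep = [(1, {0, 1, 2}), (4, {3, 4, 5})]   # canonical representation of f
g_rep = [(-2, {0, 3}), (1, {1, 2, 4, 5})]  # canonical representation of g

# The new representations agree with f + g and f g at every point of S.
for x in S:
    assert evaluate(sum_rep(f_rep, g_rep), x) == evaluate(f_rep, x) + evaluate(g_rep, x)
    assert evaluate(prod_rep(f_rep, g_rep), x) == evaluate(f_rep, x) * evaluate(g_rep, x)

print(sum_rep(f_rep, g_rep))   # [(-1, {0}), (2, {1, 2}), (2, {3}), (5, {4, 5})]
```

The value 2 appears on both \( \{1, 2\} \) and \( \{3\} \), so the representation of \( f + g \) obtained this way is not canonical, even though the representations of \( f \) and \( g \) are.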

As we alluded to earlier, note that even if the representations of \( f \) and \( g \) are canonical, the representations for \( f + g \) and \( f g \) may not be. The next result treats composition, and will be important for the change of variables theorem in the next section.

Suppose that \( (T, \mathscr{T}) \) is another measurable space, and that \( f: S \to T \) is measurable. If \( g \) is a simple function on \( T \) with representation \( g = \sum_{i \in I} b_i \bs{1}_{B_i} \), then \( g \circ f \) is a simple function on \( S \) with representation \(g \circ f = \sum_{i \in I} b_i \bs{1}_{f^{-1}(B_i)}\).

Recall that \( g \circ f : S \to \R \) and \( \range(g \circ f) \subseteq \range(g) \) so \( g \circ f \) has finite range. \( f \) is measurable, and inverse images preserve all set operations, so \( \left\{f^{-1}(B_i): i \in I\right\} \) is a measurable partition of \( S \). Finally, if \( x \in f^{-1}(B_i) \) then \( f(x) \in B_i \) so \( g\left[f(x)\right] = b_i \).

Given the definition of the integral of an indicator function and that we want the linearity property to hold, there is no question as to how we should define the integral of a nonnegative simple function.

Suppose that \( f \) is a nonnegative simple function, with the representation \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \) where \( a_i \ge 0 \) for \( i \in I \). We define \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \]

*Consistency* refers to the fact that a simple function can have more than one representation as a linear combination of indicator functions, and hence we must show that all such representations lead to the same value for the integral. Let \(\{b_j: j \in J\}\) denote the set of distinct elements among the numbers \( a_i \) where \(i \in I\) and \(A_i \neq \emptyset \). For \( j \in J \), let \( I_j = \{i \in I: a_i = b_j, A_i \ne \emptyset\} \) and let \( B_j = \bigcup_{i \in I_j} A_i \). Thus, \( f = \sum_{j \in J} b_j \bs{1}_{B_j} \), and this is the canonical representation. Note that
\[ \sum_{i \in I} a_i \mu(A_i) = \sum_{j \in J} \sum_{i \in I_j} a_i \mu(A_i) = \sum_{j \in J} b_j \sum_{i \in I_j} \mu(A_i) = \sum_{j \in J} b_j \mu(B_j) \]
The first equality holds because the terms with \( A_i = \emptyset \) are 0, and the last holds because \( \{A_i: i \in I_j\} \) is a partition of \( B_j \) and \( \mu \) is additive. The first sum is the integral defined in terms of the general representation \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \) while the last sum is the integral defined in terms of the unique canonical representation \( f = \sum_{j \in J} b_j \bs{1}_{B_j} \). Thus, any representation of a simple function \( f \) leads to the same value for the integral.
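A quick numerical check of consistency, as a Python sketch under the assumption that \( \mu \) is a finite measure on a finite set given by point masses; the weights and the names `measure` and `integral_simple` are illustrative choices, not part of the text. Exact rational arithmetic from the standard `fractions` module avoids rounding noise.

```python
from fractions import Fraction

def measure(A, weight):
    """mu(A) for a measure on a finite set given by point masses weight[x]."""
    return sum((weight[x] for x in A), Fraction(0))

def integral_simple(rep, weight):
    """Integral of a nonnegative simple function sum_i a_i 1_{A_i}: sum_i a_i mu(A_i)."""
    return sum((a * measure(A, weight) for a, A in rep), Fraction(0))

weight = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
rep1 = [(3, {0}), (3, {1}), (1, {2, 3}), (9, set())]   # a non-canonical representation
rep2 = [(3, {0, 1}), (1, {2, 3})]                      # the canonical representation
print(integral_simple(rep1, weight), integral_simple(rep2, weight))   # 5/2 5/2
```

Both representations give \( 3 \cdot \frac{3}{4} + 1 \cdot \frac{1}{4} = \frac{5}{2} \), and the term \( 9 \mu(\emptyset) = 0 \) shows why sets with \( A_i = \emptyset \) may be included or dropped freely.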

Note that if \( f \) is a nonnegative simple function, then \( \int_S f \, d\mu \) exists in \( [0, \infty] \), so the positive property holds. We next show that the linearity properties are satisfied for nonnegative simple functions.

Suppose that \( f \) and \( g \) are nonnegative simple functions, and that \( c \in [0, \infty) \). Then

- \( \int_S (f + g) \, d\mu = \int_S f \, d\mu + \int_S g \, d\mu \)
- \( \int_S c f \, d\mu = c \int_S f \, d\mu \)

Suppose that \( f \) and \( g \) are nonnegative simple functions with the representations \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \) and \(g = \sum_{j \in J} b_j \bs{1}_{B_j} \). Thus \( a_i \ge 0 \) for \( i \in I \), \( b_j \ge 0 \) for \( j \in J \), and \(\int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \) and \( \int_S g \, d\mu = \sum_{j \in J} b_j \mu(B_j) \).

- As noted above, \( f + g \) has the representation \[ f + g = \sum_{(i, j) \in I \times J} (a_i + b_j) \bs{1}_{A_i \cap B_j} \] Note that \( \{A_i \cap B_j: j \in J\} \) is a partition of \( A_i \) for each \( i \in I \), and similarly \( \{A_i \cap B_j: i \in I\} \) is a partition of \( B_j \) for each \( j \in J \). Hence, using the additivity of \( \mu \) in the last line, \begin{align} \int_S (f + g) \, d\mu & = \sum_{(i, j) \in I \times J} (a_i + b_j) \mu(A_i \cap B_j) \\ & = \sum_{i \in I} \sum_{j \in J} a_i \mu(A_i \cap B_j) + \sum_{j \in J} \sum_{i \in I} b_j \mu(A_i \cap B_j) \\ & = \sum_{i \in I} a_i \sum_{j \in J} \mu(A_i \cap B_j) + \sum_{j \in J} b_j \sum_{i \in I} \mu(A_i \cap B_j) \\ & = \sum_{i \in I} a_i \mu(A_i) + \sum_{j \in J} b_j \mu(B_j) = \int_S f \, d\mu + \int_S g \, d\mu \end{align} Note that all the terms are nonnegative (although some may be \( \infty \)), so there are no problems with rearranging the order of the terms.
- This part is easier. For \( c \in [0, \infty) \), recall that \( c f \) has the representation \( c f = \sum_{i \in I} c a_i \bs{1}_{A_i} \) so \[ \int_S c f \, d\mu = \sum_{i \in I} c a_i \mu(A_i) = c \sum_{i \in I} a_i \mu(A_i) = c \int_S f \, d\mu \]

The increasing property holds for nonnegative simple functions.

Suppose that \( f \) and \( g \) are nonnegative simple functions and \( f \le g \) on \( S \). Then \( \int_S f \, d\mu \le \int_S g \, d\mu \).

The earlier proof based on the additive property works here. Note that \( g - f \) is a nonnegative simple function, and \( g = f + (g - f) \). By the additivity property, \( \int_S g \, d\mu = \int_S f \, d\mu + \int_S (g - f) \, d\mu \ge \int_S f \, d\mu \).

Next we give a version of the monotone convergence theorem for simple functions. It's not completely general, but will be needed for the next subsection where we do prove the general version.

Suppose that \( f \) is a nonnegative simple function and that \( (A_1, A_2, \ldots) \) is an increasing sequence of sets in \( \mathscr{S} \) with \( A = \bigcup_{n=1}^\infty A_n \). Then \[ \int_S \bs{1}_{A_n} f \, d\mu \to \int_S \bs{1}_A f \, d\mu \text{ as } n \to \infty\]

Suppose that \( f \) has the representation \( f = \sum_{i \in I} b_i \bs{1}_{B_i}\). Then \( \bs{1}_{A_n} f = \sum_{i \in I} b_i \bs{1}_{A_n} \bs{1}_{B_i} = \sum_{i \in I} b_i \bs{1}_{A_n \cap B_i} \) and similarly, \( \bs{1}_A f = \sum_{i \in I} b_i \bs{1}_{A \cap B_i} \). Adding the term \( 0 \cdot \bs{1}_{S \setminus A_n} \) (respectively \( 0 \cdot \bs{1}_{S \setminus A} \)) turns these into representations, and such terms contribute nothing to the integrals. But for each \( i \in I \), \( A_n \cap B_i \) is increasing in \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty (A_n \cap B_i) = A \cap B_i \). By the continuity theorem for positive measures, \( \mu(A_n \cap B_i) \to \mu(A \cap B_i) \) as \( n \to \infty \) for each \( i \in I \). Since \( I \) is finite, \[ \int_S \bs{1}_{A_n} f \, d\mu = \sum_{i \in I} b_i \mu(A_n \cap B_i) \to \sum_{i \in I} b_i \mu(A \cap B_i) = \int_S \bs{1}_A f \, d\mu \text{ as } n \to \infty \]

Note that \( \bs{1}_{A_n} f \) is increasing in \( n \in \N_+ \) and \( \bs{1}_{A_n} f \to \bs{1}_A f \) as \( n \to \infty \), so this really is a special case of the monotone convergence theorem.
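A numerical illustration of this result, as a Python sketch under assumptions that are not in the text: take \( S = \N \), let \( \mu \) be the measure with \( \mu\{k\} = 2^{-(k+1)} \), let \( f = \bs{1}_E + 3 \bs{1}_O \) where \( E \) and \( O \) are the even and odd integers, and let \( A_n = \{0, 1, \ldots, n - 1\} \), so that \( A_n \uparrow \N \). Since \( \mu(E) = 2/3 \) and \( \mu(O) = 1/3 \), the limit should be \( \int_S f \, d\mu = 1 \cdot \frac{2}{3} + 3 \cdot \frac{1}{3} = \frac{5}{3} \). The function names are hypothetical.

```python
from fractions import Fraction

def f(k: int) -> int:
    """Nonnegative simple function on N: 1 on even integers, 3 on odd integers."""
    return 1 if k % 2 == 0 else 3

def integral_on_initial_segment(n: int) -> Fraction:
    """Integral of 1_{A_n} f with A_n = {0, ..., n-1} and mu{k} = 2^{-(k+1)}."""
    return sum((f(k) * Fraction(1, 2 ** (k + 1)) for k in range(n)), Fraction(0))

full = Fraction(5, 3)   # integral of f over all of N
for n in (1, 2, 5, 10, 20):
    # The gap to the full integral shrinks to 0 (it is at most 3 * 2^{-n}).
    print(n, float(full - integral_on_initial_segment(n)))
```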

Next we will consider nonnegative measurable functions on \( S \). First we note that a function of this type is the limit of nonnegative simple functions.

Suppose that \( f: S \to [0, \infty) \) is measurable. Then there exists an increasing sequence \( \left(f_1, f_2, \ldots\right) \) of nonnegative simple functions with \( f_n \to f \) on \( S \) as \( n \to \infty \).

For \( n \in \N_+ \) and \( k \in \left\{1, 2, \ldots, n 2^n\right\} \), let \( I_{n,k} = \left[(k -1) \big/ 2^n, k \big/ 2^n\right) \) and \( I_n = [n, \infty) \). Note that

- \( \left\{I_{n,k}: k = 1, \ldots, n 2^n\right\} \cup \left\{I_n\right\} \) is a partition of \( [0, \infty) \) for each \( n \in \N_+ \).
- \( I_{n, k} = I_{n + 1, 2 k - 1} \cup I_{n + 1, 2 k} \) for \( k \in \{1, 2, \ldots, n 2^n\} \).
- \(I_n = \left(\bigcup_{k = n 2^{n + 1} + 1}^{(n+1)2^{n+1}} I_{n+1,k} \right) \cup I_{n+1} \) for \( n \in \N_+ \).

Note that the \( n \)th partition divides the interval \( [0, n) \) into \( n 2^n \) subintervals of length \( 1 \big/ 2^n \). Thus, (b) follows because the \( (n + 1) \)st partition divides each of the first \( n 2^n \) intervals of the \( n \)th partition in half, and (c) follows because the \( (n + 1) \)st partition divides the interval \( [n, n + 1) \) into subintervals of length \( 1 \big/ 2^{n + 1} \).

Now let \( A_{n,k} = f^{-1}\left(I_{n,k}\right) \) and \( A_n = f^{-1}\left(I_n\right) \) for \( n \in \N_+ \) and \( k \in \left\{1, 2, \ldots, n 2^n\right\} \). Since inverse images preserve all set operations, (a), (b), and (c) hold with \( A \) replacing \( I \) everywhere, and \( S \) replacing \( [0, \infty) \) in (a). Moreover, since \( f \) is measurable, \( A_n \in \mathscr{S} \) and \( A_{n, k} \in \mathscr{S} \) for each \( n \) and \( k \). Now, define \[ f_n = \sum_{k = 1}^{ n 2^n} \frac{k - 1}{2^n} \bs{1}_{A_{n, k}} + n \bs{1}_{A_n} \] Then \( f_n \) is a simple function and \( 0 \le f_n \le f \) for each \( n \in \N_+ \).

To show convergence, fix \( x \in S \). If \( n \gt f(x) \) then \( x \in A_{n,k} \) for some \( k \), so \( \left|f(x) - f_n(x)\right| \le 2^{-n} \), and hence \( f_n(x) \to f(x) \) as \( n \to \infty \).

All that remains is to show that \( f_n \) is increasing in \( n \). Let \( x \in S \) and \( n \in \N_+ \). If \( x \in A_{n,k} \) for some \( k \in \left\{1, 2, \ldots, n 2^n\right\} \), then \( f_n(x) = (k - 1) \big/ 2^n \). But by (b), either \(f_{n + 1}(x) = (2 k - 2) \big/ 2^{n + 1} \) or \( f_{n + 1}(x) = (2 k - 1) \big/ 2^{n + 1} \). If \( x \in A_n \) then \( f_n(x) = n \). But by (c), either \( f_{n+1}(x) = (k - 1) \big/ 2^{n + 1} \) for some \( k \in \left\{n 2^{n+1} + 1, \ldots, (n + 1) 2^{n+1}\right\} \) or \( f_{n+1}(x) = n + 1 \). In all cases, \( f_{n + 1}(x) \ge f_n(x) \).
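For a single point \( x \), the construction above reduces to a closed form: if \( f(x) \lt n \) then \( f_n(x) = \lfloor 2^n f(x) \rfloor \big/ 2^n \), and otherwise \( f_n(x) = n \). A minimal Python sketch of this formula follows; the function name `dyadic_approx` is hypothetical and takes the value \( f(x) \) directly rather than \( f \) and \( x \).

```python
import math

def dyadic_approx(fx: float, n: int) -> float:
    """Value f_n(x) of the n-th dyadic simple approximation, given fx = f(x) >= 0.

    If f(x) lies in I_{n,k} = [(k-1)/2^n, k/2^n) with k <= n 2^n, then
    f_n(x) = (k-1)/2^n = floor(2^n f(x)) / 2^n; if f(x) >= n, then f_n(x) = n.
    """
    if fx >= n:
        return float(n)
    return math.floor(fx * 2 ** n) / 2 ** n

for fx in (0.0, 0.3, 1.7, math.pi, 12.5):
    # Each row increases in n and approaches fx from below.
    print(fx, [dyadic_approx(fx, n) for n in range(1, 6)])
```

The cap at \( n \) is what keeps each \( f_n \) simple: without it, \( f_n \) could take infinitely many values when \( f \) is unbounded.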

The last result points the way towards the definition of the integral of a measurable function \( f: S \to [0, \infty) \) in terms of the integrals of simple functions. If \( g \) is a nonnegative simple function with \( g \le f \), then by the order property, we need \( \int_S g \, d\mu \le \int_S f \, d\mu \). On the other hand, there exists an increasing sequence of nonnegative simple functions converging to \( f \). Thus the continuity property suggests the following definition:

If \( f: S \to [0, \infty) \) is measurable, we define \[ \int_S f \, d\mu = \sup\left\{ \int_S g \, d\mu: g \text{ is simple and } 0 \le g \le f \right\} \]

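Note that \( \int_S f \, d\mu \) exists in \( [0, \infty] \) so the positive property holds. Note also that if \( f \) is simple, the new definition agrees with the old one: \( f \) itself is one of the functions in the supremum, and every simple \( g \) with \( 0 \le g \le f \) satisfies \( \int_S g \, d\mu \le \int_S f \, d\mu \) by the increasing property for nonnegative simple functions. As always, we need to establish the essential properties. First, the increasing property holds.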

If \( f, \, g: S \to [0, \infty) \) are measurable and \( f \le g \) on \( S \) then \( \int_S f \, d\mu \le \int_S g \, d\mu \).

Note that \( \{h: h \text{ is simple and } 0 \le h \le f\} \subseteq \{ h: h \text { is simple and } 0 \le h \le g\} \). Therefore \[ \int_S f \, d\mu = \sup\left\{\int_S h \, d\mu: h \text{ is simple and } 0 \le h \le f \right\} \le \sup\left\{\int_S h \, d\mu: h \text{ is simple and } 0 \le h \le g\right\} = \int_S g \, d\mu \]

We can now prove the continuity property known as the monotone convergence theorem in full generality.

Suppose that \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \) and that \( f_n \) is increasing in \( n \). Then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n d \mu \]

Let \( f = \lim_{n \to \infty} f_n \). By the order property, \( \int_S f_n \, d\mu \) is increasing in \( n \in \N_+\) and hence has a limit in \( \R^* \), which we will denote by \( c \). Note that \( f_n \le f \) on \( S \) for \( n \in \N_+ \), so by the order property again, \(\int_S f_n \, d\mu \le \int_S f \, d\mu\) for \( n \in \N_+ \). Letting \( n \to \infty \) gives \( c \le \int_S f \, d\mu \). To show that \( c \ge \int_S f \, d\mu \) we need to show that \( c \ge \int_S g \, d\mu \) for every simple function \( g \) with \( 0 \le g \le f \). Fix \( a \in (0, 1) \) and let \( A_n = \{ x \in S: f_n(x) \ge a g(x)\} \). Since \( f_n \) is increasing in \( n \), \( A_n \subseteq A_{n+1} \). Moreover, \( \bigcup_{n=1}^\infty A_n = S \): if \( g(x) = 0 \) then \( x \in A_1 \), while if \( g(x) \gt 0 \) then \( a g(x) \lt g(x) \le f(x) \), so \( f_n(x) \ge a g(x) \) for all sufficiently large \( n \) since \( f_n(x) \to f(x) \). But by definition, \( a g \le f_n \) on \( A_n \) so \[ a \int_S \bs{1}_{A_n} g \, d\mu = \int_S a \bs{1}_{A_n} g \, d\mu \le \int_S \bs{1}_{A_n} f_n \, d\mu \le \int_S f_n \, d\mu \] Letting \( n \to \infty \) in the extreme parts of the displayed inequality and using the version of the monotone convergence theorem for simple functions above, we have \( a \int_S g \, d\mu \le c \) for every \( a \in (0, 1) \). Finally, letting \( a \uparrow 1 \) gives \( \int_S g \, d\mu \le c \).

If \( f: S \to [0, \infty) \) is measurable, then by the approximation result above there exists an increasing sequence \( \left(f_1, f_2, \ldots\right) \) of nonnegative simple functions with \( f_n \to f \) as \( n \to \infty \). By the monotone convergence theorem, \( \int_S f_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \). These two facts can be used to establish other properties of the integral of a nonnegative function based on our knowledge that the properties hold for simple functions. This type of argument is known as bootstrapping. We use bootstrapping to show that the linearity properties hold:
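Combining the approximation result with the monotone convergence theorem gives a concrete way to compute integrals by partitioning the range of \( f \) rather than its domain. As a Python sketch, assume (this example is not in the text) that \( S = [0, 1] \), \( \mu = \lambda_1 \), and \( f(x) = x^2 \). Then \( A_{n,k} = \{x \in [0, 1]: (k - 1)/2^n \le x^2 \lt k/2^n\} \) is an interval whose length is easy to compute, and \( \int_S f_n \, d\lambda_1 \) should increase to \( \int_S f \, d\lambda_1 = 1/3 \). The name `lebesgue_sum` is hypothetical.

```python
import math

def lebesgue_sum(n: int) -> float:
    """Integral of the n-th dyadic approximation of f(x) = x^2 on S = [0, 1]
    with respect to Lebesgue measure: sum of (k-1)/2^n * lambda(A_{n,k})."""
    h = 2.0 ** -n
    total = 0.0
    for k in range(1, n * 2 ** n + 1):
        lo, hi = (k - 1) * h, k * h
        # A_{n,k} = {x in [0, 1]: lo <= x^2 < hi} = [sqrt(lo), sqrt(hi)) intersected with [0, 1]
        length = min(math.sqrt(hi), 1.0) - min(math.sqrt(lo), 1.0)
        total += lo * length
    # A_n = f^{-1}[n, inf) is {1} for n = 1 and empty for n > 1: measure 0 either way
    return total

for n in (1, 2, 4, 8, 12):
    print(n, lebesgue_sum(n))   # increases toward 1/3
```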

If \( f, \, g: S \to [0, \infty) \) are measurable and \( c \in [0, \infty) \), then

- \( \int_S (f + g) \, d\mu = \int_S f \, d\mu + \int_S g \, d\mu \)
- \( \int_S c f \, d\mu = c \int_S f \, d\mu \)

- Let \( \left(f_1, f_2, \ldots\right) \) and \( \left(g_1, g_2, \ldots\right) \) be increasing sequences of nonnegative simple functions with \( f_n \to f \) and \( g_n \to g \) as \( n \to \infty \). Then \( (f_1 + g_1, f_2 + g_2, \ldots) \) is also an increasing sequence of simple functions, and \( f_n + g_n \to f + g \) as \( n \to \infty \). By the monotone convergence theorem, \( \int_S f_n \, d\mu \to \int_S f \, d\mu \), \( \int_S g_n \, d\mu \to \int_S g \, d\mu \), and \( \int_S (f_n + g_n) \, d\mu \to \int_S (f + g) \, d\mu \) as \( n \to \infty \). But \( \int_S (f_n + g_n) \, d\mu = \int_S f_n \, d\mu + \int_S g_n \, d\mu \) for each \( n \in \N_+ \) so taking limits gives \( \int_S (f + g) \, d\mu = \int_S f \, d\mu + \int_S g \, d\mu \).
- Similarly, \( (c f_1, c f_2, \ldots) \) is an increasing sequence of nonnegative simple functions with \( c f_n \to c f \) as \( n \to \infty \). Again, by the monotone convergence theorem, \( \int_S f_n \, d\mu \to \int_S f \, d\mu \) and \( \int_S c f_n \, d\mu \to \int_S c f \, d\mu \) as \( n \to \infty \). But \( \int_S c f_n \, d\mu = c \int_S f_n \, d\mu \) so taking limits gives \( \int_S c f \, d\mu = c \int_S f \, d\mu \).

Our final step is to define the integral of a measurable function \( f: S \to \R \). First, recall the positive and negative parts of \( x \in \R \): \[ x^+ = \max\{x, 0\}, \; x^- = \max\{-x, 0\} \] Note that \( x^+ \ge 0 \), \( x^- \ge 0 \), \( x = x^+ - x^- \), and \( \left|x\right| = x^+ + x^- \). For a function \( f: S \to \R \), the functions \( f^+ \) and \( f^- \) are defined pointwise, and they are measurable if \( f \) is. Given that we want the integral to have the linearity properties, there is no question as to how we should define the integral of \( f \) in terms of the integrals of \( f^+ \) and \( f^- \), which, being nonnegative, are defined in the previous subsection.

If \( f: S \to \R \) is measurable, we define \[ \int_S f \, d\mu = \int_S f^+ \, d\mu - \int_S f^- \, d\mu \] assuming that at least one of the integrals on the right is finite. If both are finite, then \( f \) is said to be integrable.

Assuming that either the integral of the positive part or the integral of the negative part is finite ensures that we do not get the dreaded indeterminate form \( \infty - \infty \).
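A short worked example, which is our own illustration rather than part of the text, shows how the indeterminate form can arise. Take \( S = \N \) with counting measure \( \# \), for which the integral of a nonnegative function is the sum of its values, and let \( f(n) = (-1)^n \big/ (n + 1) \) for \( n \in \N \). Then \[ \int_\N f^+ \, d\# = \sum_{m=0}^\infty \frac{1}{2 m + 1} = \infty, \quad \int_\N f^- \, d\# = \sum_{m=0}^\infty \frac{1}{2 m + 2} = \infty \] so \( \int_\N f \, d\# \) does not exist, even though the alternating series \( \sum_{n=0}^\infty (-1)^n \big/ (n + 1) \) converges (to \( \ln 2 \)) in the ordinary sense.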

Suppose that \( f: S \to \R \) is measurable. Then \( f \) is integrable if and only if \( \int_S \left|f \right| \, d\mu \lt \infty \).

Suppose that \( f \) is integrable. Recall that \( \left| f \right| = f^+ + f^- \). By the additive property for nonnegative functions, \( \int_S \left| f \right| \, d\mu = \int_S f^+ \, d\mu + \int_S f^- \, d\mu \lt \infty \). Conversely, suppose that \( \int_S \left| f \right| \, d\mu \lt \infty \). Then \( f^+ \le \left| f \right| \) and \( f^- \le \left| f \right|\) so by the increasing property for nonnegative functions, \( \int_S f^+ \, d\mu \le \int_S \left| f \right| \, d\mu \lt \infty \) and \( \int_S f^- \, d\mu \le \int_S \left| f \right| \, d\mu \lt \infty \).

Note that if \( f \) is nonnegative, then our new definition agrees with our old one, since \( f^+ = f \) and \( f^- = 0 \). For simple functions the integral has the same basic form as for nonnegative simple functions:

Suppose that \( f \) is a simple function with the representation \( f = \sum_{i \in I} a_i \bs{1}_{A_i} \). Then \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \] assuming that the sum does not have both \( \infty \) and \( -\infty \) terms.

Note that \( f^+ \) and \( f^- \) are also simple, with the representations \( f^+ = \sum_{i \in I} a_i^+ \bs{1}_{A_i} \) and \( f^- = \sum_{i \in I} a_i^- \bs{1}_{A_i} \). Hence \[ \int_S f \, d\mu = \sum_{i \in I} a_i^+ \mu(A_i) - \sum_{i \in I} a_i^- \mu(A_i) \] as long as one of the sums is finite. Given that this is the case, we can recombine the sums to get \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \]
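The computation in the proof can be sketched in Python by splitting the sum into the contributions of \( a_i^+ \) and \( a_i^- \) and returning `None` when both are infinite. As assumptions for illustration, the cells \( A_i \) are given only by labels with their measures \( \mu(A_i) \in [0, \infty] \) supplied directly (here, hypothetical intervals in \( \R \) with their Lebesgue measures), and the name `integral_simple_signed` is hypothetical.

```python
import math

def integral_simple_signed(rep, mu):
    """Integral of a simple function sum_i a_i 1_{A_i}, computed as in the text:
    (sum_i a_i^+ mu(A_i)) - (sum_i a_i^- mu(A_i)).

    mu maps each cell label A_i to a value in [0, inf].
    Returns None when both sums are infinite (the integral does not exist).
    """
    def term(c, m):
        return 0.0 if c == 0 else c * m   # convention 0 * inf = 0
    pos = sum(term(max(a, 0), mu[A]) for a, A in rep)
    neg = sum(term(max(-a, 0), mu[A]) for a, A in rep)
    if math.isinf(pos) and math.isinf(neg):
        return None
    return pos - neg

# Hypothetical cells partitioning R, with their Lebesgue measures
mu = {"[0,1)": 1.0, "[1,inf)": math.inf, "(-inf,0)": math.inf}
f_rep = [(2, "[0,1)"), (-1, "[1,inf)"), (0, "(-inf,0)")]
g_rep = [(1, "[0,1)"), (1, "[1,inf)"), (-1, "(-inf,0)")]
print(integral_simple_signed(f_rep, mu))   # -inf
print(integral_simple_signed(g_rep, mu))   # None
```

For the first function, the term \( 0 \cdot \infty \) on \( (-\infty, 0) \) is 0 by convention and the integral is \( 2 - \infty = -\infty \); for the second, both sums are \( \infty \), so the integral does not exist.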

Once again, we need to establish the essential properties. Our first result is an intermediate step towards linearity.

If \( f, \; g: S \to [0, \infty) \) are measurable then \( \int_S (f - g) \, d\mu = \int_S f \, d\mu - \int_S g \, d\mu \) as long as at least one of the integrals on the right is finite.

We take cases. Suppose first that \( \int_S f \, d\mu \lt \infty \) and \( \int_S g \, d\mu \lt \infty \). Note that \( (f - g)^+ \le f \) and \( (f - g)^- \le g \). By the increasing property for nonnegative functions, \( \int_S (f - g)^+ \, d\mu \le \int_S f \, d\mu \lt \infty \) and \( \int_S (f - g)^- \, d\mu \le \int_S g \, d\mu \lt \infty \). Thus \( f - g \) is integrable. Next we have \( f - g = (f - g)^+ - (f - g)^- \) and therefore \( f + (f - g)^- = g + (f - g)^+ \). All four of the functions in the last equation are nonnegative, and therefore by the additivity property for nonnegative functions, we have \[ \int_S f \, d\mu + \int_S (f - g)^- \, d\mu = \int_S g \, d\mu + \int_S (f - g)^+ \, d\mu \] All of these integrals are finite, and hence \[\int_S (f - g) \, d\mu = \int_S (f - g)^+ \, d\mu - \int_S (f - g)^- \, d\mu = \int_S f \, d\mu - \int_S g \, d\mu\]

Next suppose that \( \int_S f \, d\mu = \infty \) and \( \int_S g \, d\mu \lt \infty \). Then \( f - g \le (f - g)^+ \) and hence \( f \le (f - g)^+ + g \). Using the additivity and increasing properties for nonnegative functions, we have \( \infty = \int_S f \, d\mu \le \int_S (f - g)^+ \, d\mu + \int_S g \, d\mu\). Since \( \int_S g \, d\mu \lt \infty \) we must have \( \int_S (f - g)^+ \, d\mu = \infty \). On the other hand, \( (f - g)^- \le g \) so \( \int_S (f - g)^- \, d\mu \le \int_S g \, d\mu \lt \infty \). Hence \( \int_S (f - g) \, d\mu = \infty = \int_S f \, d\mu - \int_S g \, d\mu \)

Finally, suppose that \( \int_S f \, d\mu \lt \infty \) and \( \int_S g \, d\mu = \infty \). By the argument in the last paragraph, we have \( \int_S (g - f)^+ \, d\mu = \infty \) and \( \int_S (g - f)^- \, d\mu \lt \infty \). Equivalently, \( \int_S (f - g)^+ \, d\mu \lt \infty \) and \( \int_S (f - g)^- \, d\mu = \infty \). Hence \( \int_S (f - g) \, d\mu = -\infty = \int_S f \, d\mu - \int_S g \, d\mu \).

We finally have the linearity properties in full generality.

If \( f, \; g: S \to \R \) are measurable functions whose integrals exist, and \( c \in \R \), then

- \( \int_S (f + g) \, d\mu = \int_S f \, d\mu + \int_S g \, d\mu \) as long as the right side is not of the form \( \infty - \infty \).
- \( \int_S c f \, d \mu = c \int_S f \, d\mu \)

- Note that \( f + g = (f^+ - f^-) + (g^+ - g^-) = (f^+ + g^+) - (f^- + g^-) \) and the two functions in parentheses in the last expression are nonnegative. By the previous lemma and the additivity property for nonnegative functions, we have \begin{align} \int_S (f + g) \, d\mu & = \int_S (f^+ + g^+) \, d\mu - \int_S (f^- + g^-) \, d\mu \\ & = \left(\int_S f^+ \, d\mu + \int_S g^+ \, d\mu\right) - \left(\int_S f^- \, d\mu + \int_S g^- \, d\mu \right) \end{align} assuming that either both integrals in the first parentheses are finite or both integrals in the second parentheses are finite. In either case, we can group the terms (without worrying about the dreaded \( \infty - \infty \)) to get \[ \int_S (f + g) \, d\mu = \left(\int_S f^+ \, d\mu - \int_S f^- \, d\mu \right) + \left(\int_S g^+ \, d\mu - \int_S g^- \, d\mu \right) = \int_S f \, d\mu + \int_S g \, d\mu \]
- Note that if \( c \ge 0 \) then \( (c f)^+ = c f^+ \) and \( (c f)^- = c f^- \). Hence using the scaling property for nonnegative functions, \[ \int_S c f \, d\mu = \int_S (c f)^+ \, d\mu - \int_S (c f)^- \, d\mu = \int_S c f^+ \, d\mu - \int _S c f^- \, d\mu = c \int_S f^+ \, d\mu - c \int_S f^- \, d\mu = c \int_S f \, d\mu \] On the other hand, if \( c \lt 0 \), \( (c f)^+ = - c f^- \) and \( (c f)^- = - c f^+ \). Again using the scaling property for nonnegative functions, \[ \int_S c f \, d\mu = \int_S (c f)^+ \, d\mu - \int_S (c f)^- \, d\mu = \int_S - c f^- \, d\mu - \int _S -c f^+ \, d\mu = - c \int_S f^- \, d\mu + c \int_S f^+ \, d\mu = c \int_S f \, d\mu \]

In particular, note that if \( f \) and \( g \) are integrable, then so are \( f + g \) and \( c f \) for \( c \in \R \). Thus, the set of integrable functions on \( (S, \mathscr{S}, \mu) \) forms a vector space, which is denoted \( \mathscr{L}(S, \mathscr{S}, \mu) \). The \( \mathscr{L} \) is in honor of Henri Lebesgue, who first developed the theory. This vector space, and other related ones, will be studied in more detail in the section on function spaces.

We also have the increasing property in full generality.

If \( f, \; g: S \to \R \) are measurable functions whose integrals exist, and if \( f \le g \) on \( S \) then \( \int_S f \, d\mu \le \int_S g \, d\mu \)

The proof is based on the additivity property. First \( g = f + (g - f) \) and \( g - f \ge 0 \) on \( S \). If \( \int_S f \, d\mu = -\infty \), then trivially \( \int_S f \, d\mu \le \int_S g \, d\mu \). Otherwise \( \int_S f \, d\mu \gt -\infty \) and \( \int_S (g - f) \, d\mu \ge 0 \) (the integral of a nonnegative function), so the sum below is not of the form \( \infty - \infty \), and therefore \( \int_S g \, d\mu = \int_S f \, d\mu + \int_S (g - f) \, d\mu \ge \int_S f \, d\mu \).

Now that we have defined the integral of a measurable function \( f \) over all of \( S \), there is a natural extension to the integral of \( f \) over a measurable subset of \( S \).

If \( f: S \to \R \) is measurable and \( A \in \mathscr{S} \), we define \[ \int_A f \, d\mu = \int_S \bs{1}_A f \, d\mu \] assuming that the integral on the right exists.

If \( f: S \to \R \) is a measurable function whose integral exists and \( A \in \mathscr{S} \), then the integral of \( f \) over \( A \) exists.

Note that \( \left(\bs{1}_A f\right)^+ = \bs{1}_A f^+ \) and \( \left(\bs{1}_A f\right)^- = \bs{1}_A f^- \). Also \( \bs{1}_A f^+ \le f^+ \) and \( \bs{1}_A f^- \le f^- \). If \( \int_S f \, d\mu \) exists, then either \( \int_S f^+ \, d\mu \lt \infty \) or \( \int_S f^- \, d\mu \lt \infty \). By the increasing property, it follows that either \( \int_S \bs{1}_A f^+ \, d\mu \lt \infty \) or \( \int_S \bs{1}_A f^- \, d\mu \lt \infty \), so \( \int_A f \, d\mu \) exists.

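On the other hand, it is possible for \( \int_A f \, d\mu \) to exist for some \( A \in \mathscr{S} \) while \( \int_S f \, d\mu \) does not. For example, with Lebesgue measure \( \lambda \) on \( \R \) and \( f(x) = x \), we have \( \int_{[0, 1]} f \, d\lambda = \frac{1}{2} \), but \( \int_\R f^+ \, d\lambda = \int_\R f^- \, d\lambda = \infty \), so \( \int_\R f \, d\lambda \) does not exist.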

We could also simply think of \( \int_A f \, d\mu \) as the integral of a measurable function \( f: A \to \R \) over the measure space \( (A, \mathscr{S}_A, \mu_A) \), where \( \mathscr{S}_A = \{ B \in \mathscr{S}: B \subseteq A\} = \{C \cap A: C \in \mathscr{S}\} \) is the \( \sigma \)-algebra of measurable subsets of \( A \), and where \( \mu_A \) is the restriction of \( \mu \) to \( \mathscr{S}_A \). It follows that all of the essential properties hold for integrals over \( A \): the linearity properties, the order properties, and the monotone convergence theorem. The following property is a simple consequence of the general additive property, and is known as the additive property for disjoint domains.

Suppose that \( f: S \to \R \) is a measurable function whose integral exists, and that \( A, \; B \in \mathscr{S} \) are disjoint. Then \[ \int_{A \cup B} f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu \]

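Recall that \( \bs{1}_{A \cup B} = \bs{1}_A + \bs{1}_B \) since \( A \) and \( B \) are disjoint. By the previous result, \( \int_A f \, d\mu \) and \( \int_B f \, d\mu \) exist, and their sum is not of the form \( \infty - \infty \): if \( \int_S f^+ \, d\mu \lt \infty \) then \( \int_A f^+ \, d\mu \) and \( \int_B f^+ \, d\mu \) are both finite, and similarly if \( \int_S f^- \, d\mu \lt \infty \). Hence by the additive property, \[ \int_{A \cup B} f \, d\mu = \int_S \bs{1}_{A \cup B} f \, d\mu = \int_S \left(\bs{1}_A f + \bs{1}_B f\right) \, d\mu = \int_S \bs{1}_A f \, d\mu + \int_S \bs{1}_B f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu \]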

By induction, the additive property holds for a finite collection of disjoint domains. The extension to a countably infinite collection of disjoint domains will be considered in the next section on properties of the integral.
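As a sanity check on the additive property for disjoint domains, here is a minimal Python sketch on a finite measure space in which every subset is measurable and \( \mu \) is given by point masses. The set, the masses, and the function values are hypothetical. On a finite set every function is simple, so \( \int_A f \, d\mu = \int_S \bs{1}_A f \, d\mu = \sum_{x \in S} \bs{1}_A(x) f(x) \mu\{x\} \).

```python
from fractions import Fraction

# Hypothetical example: a finite set S whose measure is given by point masses,
# so mu(A) is the total mass of the points in A (every subset is measurable).
S = ["a", "b", "c", "d", "e"]
mass = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8),
        "d": Fraction(1, 16), "e": Fraction(1, 16)}

# A real-valued function on S (arbitrary example values).
f_values = {"a": -4, "b": 8, "c": 3, "d": -16, "e": 32}

def f(x):
    return f_values[x]

def indicator(A, x):
    """The indicator function 1_A evaluated at x."""
    return 1 if x in A else 0

def integral(f, A=None):
    """Integral of f over A, computed as the integral over S of 1_A f.
    Every function on this finite S is simple, so the integral is the finite
    sum of f(x) * mu({x})."""
    if A is None:
        A = set(S)
    return sum((indicator(A, x) * f(x) * mass[x] for x in S), Fraction(0))

A = {"a", "b"}
B = {"d", "e"}   # disjoint from A

print(integral(f, A), integral(f, B), integral(f, A | B))   # 0 1 1
```

Exact `Fraction` arithmetic is used so that the additivity check is an equality rather than a floating-point approximation.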

Suppose that \( S \) is a countable set, \( \mathscr{S} = \mathscr{P}(S)\) is the power set of \( S \), and \( \# \) is counting measure on \( \mathscr{S} \). Thus all functions \( f: S \to \R \) are measurable, and as we will see, integrals with respect to \( \# \) are simply sums.

Suppose that \( S \) is finite. If \( f: S \to \R \) then \[ \int_S f \, d\# = \sum_{x \in S} f(x) \]

Note that every function \( f: S \to \R \) is simple and has the representation \( f = \sum_{x \in S} f(x) \bs{1}_x \) where \( \bs{1}_x \) is an abbreviation of \( \bs{1}_{\{x\}} \). Since \( \#\{x\} = 1 \) for each \( x \in S \), the result follows from the definition of the integral of a simple function.

Suppose now that \( S \) is countably infinite.

If \( f: S \to [0, \infty) \) then \[ \int_S f \, d\# = \sum_{x \in S} f(x) \]

Let \( (A_1, A_2, \ldots) \) be an increasing sequence of finite subsets of \( S \) with \( \bigcup_{i=1}^\infty A_i = S \). Define \( f_n = \sum_{x \in A_n} f(x) \bs{1}_x \). Then \( (f_1, f_2, \ldots) \) is an increasing sequence of simple functions with \( f_n \to f \) as \( n \to \infty \). Thus, by the monotone convergence theorem, \[ \int_S f \, d\# = \lim_{n \to \infty} \int_S f_n \, d\# = \lim_{n \to \infty} \sum_{x \in A_n} f(x) \] But by definition, the last limit on the right is just \( \sum_{x \in S} f(x) \).
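The proof above is easy to mirror numerically. In this Python sketch, the function \( f(x) = 1 / x^2 \) on \( S = \N_+ \) is an illustrative choice, and \( A_n = \{1, 2, \ldots, n\} \). The integrals of the simple functions \( f_n \) increase with \( n \) toward the known value \( \sum_{x=1}^\infty 1/x^2 = \pi^2 / 6 \).

```python
import math

def f(x):
    """Example nonnegative function on S = N_+: f(x) = 1 / x^2."""
    return 1.0 / (x * x)

def integral_of_f_n(f, A_n):
    """Integral with respect to counting measure of the simple function
    f_n = sum_{x in A_n} f(x) 1_x, namely the finite sum of f over A_n."""
    return sum(f(x) for x in A_n)

# Increasing finite sets A_n = {1, ..., n}, whose union is N_+.
previous = 0.0
for n in (10, 1_000, 100_000):
    current = integral_of_f_n(f, range(1, n + 1))
    assert current >= previous       # the integrals increase with n
    previous = current
    print(n, current)

print("pi^2 / 6 =", math.pi ** 2 / 6)
```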

If \( f: S \to \R \) then \[ \int_S f \, d\# = \sum_{x \in S} f(x) \] as long as either the sum of the positive terms or the sum of the negative terms is finite.

This follows from the definition of the integral as \( \int_S f \, d\# = \int_S f^+ \, d\# - \int_S f^- \, d\# \) as long as one of the integrals on the right is finite. By our previous result, \( \int_S f^+ \, d\# \) is the sum of the positive terms and \( -\int_S f^- \, d\# \) is the sum of the negative terms.

In the context of the previous result, if the sum of the positive terms and the sum of the negative terms are both finite, then \( f \) is integrable with respect to \( \# \); in the language of calculus, the series \( \sum_{x \in S} f(x) \) is absolutely convergent.

All of this will look more familiar in the special case \( S = \N_+ \). Functions on \( S \) are simply sequences, so we can use the more familiar notation \( a_i \) rather than \( a(i) \) for a function \( a: S \to \R \). The proof of the result above with nonnegative terms (with \( A_n = \{1, 2, \ldots, n\} \)) is just the definition of an infinite series of nonnegative terms as the limit of the partial sums: \[ \sum_{i=1}^\infty a_i = \lim_{n \to \infty} \sum_{i=1}^n a_i \] The proof of the general result above is just the definition of a general infinite series \[ \sum_{i=1}^\infty a_i = \sum_{i=1}^\infty a_i^+ - \sum_{i=1}^\infty a_i^- \] as long as one of the series on the right is finite. Again, when both are finite, the series is absolutely convergent. In calculus we also consider conditionally convergent series. This means that \( \sum_{i=1}^\infty a_i^+ = \infty \), \( \sum_{i=1}^\infty a_i^- = \infty \), but \( \lim_{n \to \infty} \sum_{i=1}^n a_i \) exists in \( \R \). Such series have no place in general integration theory. Also, you may recall that such series are pathological in the sense that, given any number in \( \R^* \), there exists a rearrangement of the terms so that the rearranged series converges to the given number.
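The rearrangement phenomenon is easy to observe with the alternating harmonic series \( 1 - \frac{1}{2} + \frac{1}{3} - \cdots \), which converges to \( \ln 2 \) but whose positive and negative parts both diverge. This Python sketch uses a standard greedy rearrangement, an illustrative choice: take unused positive terms while the running sum is at most the target, and unused negative terms otherwise.

```python
import math

def positive_part_sum(n):
    """Sum of the first n positive terms 1, 1/3, 1/5, ... (diverges as n grows)."""
    return sum(1.0 / (2 * k - 1) for k in range(1, n + 1))

def rearranged_sum(target, num_terms):
    """Partial sum after num_terms terms of a greedy rearrangement of the
    alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ...: use the next unused
    positive term 1/(2k-1) while the running sum is <= target, otherwise the
    next unused negative term -1/(2k)."""
    next_odd, next_even = 1, 2
    total = 0.0
    for _ in range(num_terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total

# The original order converges (conditionally) to log 2.
original = sum((-1) ** (i + 1) / i for i in range(1, 100_001))
print("original order:", original, "log 2 =", math.log(2))

# The positive terms alone sum to infinity (slowly, like log n / 2).
print("positive terms only:", positive_part_sum(10), positive_part_sum(100_000))

# Rearranging the same terms steers the partial sums toward any target.
for target in (0.0, 2.0, -1.5):
    print(target, rearranged_sum(target, 200_000))
```

Once the running sum first crosses the target, it stays within the size of the most recently used term, and those terms tend to 0, which is why the rearranged partial sums settle at the target.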

Consider the case of the measure space \( (\R, \mathscr{R}, \lambda) \) where \( \mathscr{R} \) is the usual \( \sigma \)-algebra of Lebesgue measurable sets and \( \lambda \) is Lebesgue measure. The theory developed above applies, of course, to the integral \( \int_A f \, d\lambda \) of a measurable function \( f: \R \to \R \) over a set \( A \in \mathscr{R} \). It's not surprising that in this special case, the theory of integration is referred to as Lebesgue integration in honor of our good friend Henri Lebesgue, who first developed the theory.

On the other hand, we already have a theory of integration on \( \R \), namely the Riemann integral of calculus, named for our other good friend Georg Riemann. For a suitable function \( f \) and domain \( A \) this integral is denoted \( \int_A f(x) \, dx \), as we all remember from calculus. How are the two integrals related? As we will see, the Lebesgue integral generalizes the Riemann integral.

To understand the connection we need to review the definition of the Riemann integral. Consider first the standard case where the domain of integration is a closed, bounded interval. Here are the preliminary definitions that we will need.

Suppose that \( f: [a, b] \to \R \), where \( a, \; b \in \R \) and \( a \lt b \).

- A partition \( \mathscr{A} = \{A_i : i \in I\}\) of \( [a, b] \) is a finite collection of disjoint *subintervals* whose union is \( [a, b] \).
- The norm of a partition \( \mathscr{A} \) is \( \|\mathscr{A}\| = \max\{\lambda(A_i): i \in I\} \), the length of the largest subinterval of \( \mathscr{A} \).
- A set of points \( B = \{x_i: i \in I\}\) where \( x_i \in A_i \) for each \( i \in I \) is said to be associated with the partition \( \mathscr{A} \).
- The Riemann sum of \( f \) corresponding to a partition \( \mathscr{A} \) and a set \( B \) associated with \( \mathscr{A} \) is \[ R\left(f, \mathscr{A}, B\right) = \sum_{i \in I} f(x_i) \lambda(A_i) \]

Note that the Riemann sum is simply the integral, with respect to \( \lambda \), of the simple function \( g = \sum_{i \in I} f(x_i) \bs{1}_{A_i} \). Since each \( A_i \) is an interval, \( g \) is a step function: it is constant on each of a finite collection of disjoint intervals. For the same reason, \( \lambda(A_i) \) is simply the length of the subinterval \( A_i \), so measure theory per se is not needed for Riemann integration. Now for the definition from calculus:

\( f \) is Riemann integrable on \( [a, b] \) if there exists \( r \in \R \) with the property that for every \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( \mathscr{A} \) is a partition of \( [a, b] \) with \( \|\mathscr{A}\| \lt \delta \) then \( \left| r - R\left(f, \mathscr{A}, B\right) \right| \lt \epsilon \) for every set of points \( B \) associated with \( \mathscr{A} \). Then of course we define the integral by \[ \int_a^b f(x) \, dx = r\]
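The definition can be explored numerically. In this Python sketch, partitions of \( [a, b] \) with norm at most \( \delta \), and their associated points, are generated at random as an illustrative stand-in for "every partition and every set of associated points". The example is \( f = \sin \) on \( [0, \pi] \), whose integral is 2, so as \( \delta \) shrinks the Riemann sums should cluster ever more tightly around 2.

```python
import math
import random

def random_partition(a, b, delta, rng):
    """Cut points a = x_0 < x_1 < ... < x_k = b of a random partition of [a, b]
    into subintervals of length at most delta, so the norm is at most delta."""
    points = [a]
    while b - points[-1] > delta:
        points.append(points[-1] + rng.uniform(delta / 2, delta))
    points.append(b)
    return points

def riemann_sum(f, points, rng):
    """R(f, A, B): each associated point x_i is drawn at random from the i-th
    subinterval, and f(x_i) is weighted by that subinterval's length."""
    total = 0.0
    for left, right in zip(points, points[1:]):
        tag = rng.uniform(left, right)
        total += f(tag) * (right - left)
    return total

rng = random.Random(2024)
for delta in (0.5, 0.05, 0.001):
    points = random_partition(0.0, math.pi, delta, rng)
    sums = [riemann_sum(math.sin, points, rng) for _ in range(3)]
    print(delta, sums)
```

For this particular \( f \), every Riemann sum with norm at most \( \delta \) is within \( \delta \pi \) of 2, since \( |\sin x - \sin y| \le |x - y| \).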

Here is our main theorem of this subsection.

If \( f: [a, b] \to \R \) is Riemann integrable on \( [a, b] \) then \( f \) is Lebesgue integrable on \( [a, b] \) and \[ \int_{[a, b]} f \, d\lambda = \int_a^b f(x) \, dx \]

On the other hand, there are lots of functions that are Lebesgue integrable but not Riemann integrable. In fact there are indicator functions of this type, the simplest of functions from the point of view of Lebesgue integration.

Consider the function \( \bs{1}_\Q \) where as usual, \( \Q \) is the set of rational numbers in \( \R \). Then

- \( \int_\R \bs{1}_\Q \, d\lambda = 0 \).
- \( \bs{1}_\Q \) is not Riemann integrable on any interval \( [a, b] \) with \( a \lt b \).

Part (a) follows from the definition of the Lebesgue integral, since \( \Q \) is countable and hence has Lebesgue measure 0: \[ \int_\R \bs{1}_\Q \, d\lambda = \lambda(\Q) = 0 \] For part (b), note that there are rational and irrational numbers in every interval of \( \R \) of positive length (the rational numbers and the irrational numbers are dense in \( \R \)). Thus, given any partition \( \mathscr{A} = \{A_i: i \in I\} \) of \( [a, b] \), no matter how small the norm, there are Riemann sums that are 0 (take \(x_i \in A_i\) irrational for each \( i \in I \)), and Riemann sums that are \( b - a \) (take \( x_i \in A_i \) rational for each \( i \in I \)). Hence the Riemann sums cannot all be close to a single number \( r \) as the norm of the partition goes to 0.
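The argument in part (b) can be mirrored in Python, with one caveat: every floating-point number is rational, so \( \bs{1}_\Q \) cannot be evaluated on floats. The sketch therefore assumes an exact representation of the associated points: a rational point is a `Fraction`, and an irrational point \( q + \sqrt{2}/m \) (with \( q \) rational and \( m \) a positive integer) is stored as the pair `(q, m)`. The helper names are hypothetical. For uniform partitions of any norm, one choice of points gives Riemann sum 0 and the other gives \( b - a \).

```python
import math
from fractions import Fraction

# Floating-point numbers are all rational, so 1_Q cannot be evaluated on
# floats. Assumed representation: a rational tag is a Fraction q, and an
# irrational tag is a pair (q, m) standing for q + sqrt(2)/m, which is
# irrational whenever q is rational and m is a positive integer.

def indicator_Q(tag):
    """Exact value of 1_Q at a tag in the representation above."""
    return 1 if isinstance(tag, Fraction) else 0

def extreme_riemann_sums(a, b, n):
    """Riemann sums of 1_Q for the uniform partition of [a, b] into n
    subintervals, first with irrational tags and then with rational tags."""
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / n
    m = math.floor(2 / width) + 1      # then sqrt(2)/m < 2/m < width
    irrational_sum = Fraction(0)
    rational_sum = Fraction(0)
    for i in range(n):
        left = a + i * width
        rational_tag = left            # the left endpoint of the subinterval
        irrational_tag = (left, m)     # left + sqrt(2)/m, inside the subinterval
        x = float(left) + math.sqrt(2) / m
        assert float(left) < x < float(left + width)
        irrational_sum += indicator_Q(irrational_tag) * width
        rational_sum += indicator_Q(rational_tag) * width
    return irrational_sum, rational_sum

for n in (10, 100, 10_000):
    # irrational tags give 0 and rational tags give b - a = 1, for every n
    print(n, extreme_riemann_sums(0, 1, n))
```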

The following fundamental theorem completes the picture.

\( f: [a, b] \to \R \) is Riemann integrable on \( [a, b] \) if and only if \( f \) is bounded on \( [a, b] \) and \( f \) is continuous almost everywhere on \( [a, b] \).

Now that the Riemann integral is defined for a closed bounded interval, it can be extended to other domains.

Extensions of the Riemann integral.

- If \( f \) is defined on \( [a, b) \) and Riemann integrable on \( [a, t] \) for \( a \lt t \lt b \), we define \( \int_a^b f(x) \, dx = \lim_{t \uparrow b} \int_a^t f(x) \, dx \) if the limit exists in \( \R^* \).
- If \( f \) is defined on \( (a, b] \) and Riemann integrable on \( [t, b] \) for \( a \lt t \lt b \), we define \( \int_a^b f(x) \, dx = \lim_{t \downarrow a} \int_t^b f(x) \, dx \) if the limit exists in \( \R^* \).
- If \( f \) is defined on \( (a, b) \), we select \( c \in (a, b) \) and define \( \int_a^b f(x) \, dx = \int_a^c f(x) \, dx + \int_c^b f(x) \, dx \) if the integrals on the right exist in \( \R^* \) by (a) and (b), and are not of the form \( \infty - \infty \).
- If \( f \) is defined on \( [a, \infty) \) and Riemann integrable on \( [a, t] \) for \( a \lt t \lt \infty \), we define \( \int_a^\infty f(x) \, dx = \lim_{t \to \infty} \int_a^t f(x) \, dx \) if the limit exists in \( \R^* \).
- If \( f \) is defined on \( (-\infty, b] \) and Riemann integrable on \( [t, b] \) for \( -\infty \lt t \lt b \), we define \( \int_{-\infty}^b f(x) \, dx = \lim_{t \to -\infty} \int_t^b f(x) \, dx \) if the limit exists in \( \R^* \).
- If \( f \) is defined on \( \R \), we select \( c \in \R \) and define \( \int_{-\infty}^\infty f(x) \, dx = \int_{-\infty}^c f(x) \, dx + \int_c^\infty f(x) \, dx \) if both integrals on the right exist by (d) and (e), and are not of the form \( \infty - \infty \).
- The integral is defined for a domain that is the union of a finite collection of disjoint intervals by the requirement that the integral be additive over disjoint domains.

As another indication of the superiority of the Lebesgue integral, note that *none* of these complications is necessary for it. Once and for all, we have defined \( \int_A f(x) \, dx \) for a general measurable function \( f: \R \to \R \) and a general domain \( A \in \mathscr{R} \), whenever the integral exists.

Consider again the measurable space \( (\R, \mathscr{R}) \) where \( \mathscr{R} \) is the usual \( \sigma \)-algebra of Lebesgue measurable subsets of \( \R \). Suppose that \( F: \R \to \R \) is a general distribution function, so that by definition, \( F \) is increasing and continuous from the right. Recall that the Lebesgue-Stieltjes measure \( \mu \) associated with \( F \) is the unique measure on \( \mathscr{R} \) that satisfies \[ \mu(a, b] = F(b) - F(a); \quad a, \, b \in \R, \; a \lt b \] Recall that \( F \) satisfies some, but not necessarily all, of the properties of a probability distribution function. The properties not necessarily satisfied are the normalizing properties

- \( F(x) \to 0 \) as \( x \to -\infty \)
- \( F(x) \to 1 \) as \( x \to \infty \)

If \( F \) does satisfy these two additional properties, then \( \mu \) is a probability measure and \( F \) is its probability distribution function.
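Here is a small Python sketch of \( \mu(a, b] = F(b) - F(a) \) for two distribution functions that do not satisfy the normalizing properties: \( F(x) = x \), which gives Lebesgue measure, and \( F(x) = \lfloor x \rfloor \), which is increasing and right-continuous with a jump of 1 at each integer and so gives counting measure on the integers. The helper names are illustrative, and the brute-force count is only a cross-check.

```python
import math

def ls_measure(F, a, b):
    """Lebesgue-Stieltjes measure of the interval (a, b]: F(b) - F(a)."""
    if not a < b:
        raise ValueError("need a < b")
    return F(b) - F(a)

def identity(x):
    """F(x) = x, which generates Lebesgue measure: mu(a, b] = b - a."""
    return x

def count_integers(a, b):
    """Direct count of the integers n with a < n <= b, for comparison."""
    return sum(1 for n in range(math.floor(a) - 1, math.floor(b) + 2) if a < n <= b)

# math.floor is increasing and right-continuous, with a jump of 1 at each
# integer, so its Lebesgue-Stieltjes measure counts the integers in (a, b].
print(ls_measure(identity, 0.5, 3.25))                                 # 2.75
print(ls_measure(math.floor, 0.5, 3.25), count_integers(0.5, 3.25))   # 3 3
```

Neither function tends to 0 at \( -\infty \) or to 1 at \( \infty \), so neither measure is a probability measure; both are infinite on \( \R \).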

The integral with respect to the measure \( \mu \) is, appropriately enough, referred to as the Lebesgue-Stieltjes integral with respect to \( F \), and like the measure, is named for the ubiquitous Henri Lebesgue and for Thomas Stieltjes. In addition to our usual notation \( \int_S f \, d\mu \), the Lebesgue-Stieltjes integral is also denoted \( \int_S f \, dF\) and \(\int_S f(x) \, dF(x) \).

Suppose that \( (S, \mathscr{S}, \P) \) is a probability space, so that \( S \) is the sample space of a random experiment, \( \mathscr{S} \) is the \( \sigma \)-algebra of events, and \( \P \) is the probability measure. A measurable, real-valued function \( X \) on \( S \) is, of course, a real-valued random variable. The integral of \( X \) with respect to \( \P \), if it exists, is the expected value of \( X \) and is denoted \[ \E(X) = \int_S X \, d\P \] This concept is of fundamental importance in probability theory and is studied in detail in a separate chapter on Expected Value, mostly from an elementary point of view that does not involve abstract integration. However, an advanced section treats expected value as an integral over the underlying probability measure, as above.

Suppose now that \( X \) is a general random variable for the experiment, taking values in a countable set \( T \), and hence with a discrete distribution. Recall that the probability density function \( f \) of \( X \) is given by \( f(x) = \P(X = x) \) for \( x \in T \). If \( A \subseteq T \), then
\[ \P(X \in A) = \sum_{x \in A} f(x) = \int_A f \, d\# \]
On the other hand, if \( X \) takes values in a measurable set \( T \subseteq \R^n \) with \( \lambda_n(T) \gt 0 \) and \( X \) has a continuous distribution, then \( f: T \to [0, \infty) \) is a probability density function of \( X \) if for every measurable \( A \subseteq T \)
\[ \P(X \in A) = \int_A f \, d\lambda_n \]
Technically, \( f \) is the density function of \( X \) with respect to counting measure \( \# \) in the discrete case, and \( f \) is the density function of \( X \) with respect to Lebesgue measure \( \lambda_n \) in the continuous case. In both cases, the probability of an event \( A \) is computed by integrating the density function, with respect to the appropriate measure, over \( A \). There are still differences, however. In the discrete case, the existence of the density function with respect to counting measure is guaranteed, and indeed we have an explicit formula for it. In the continuous case, the existence of a density function with respect to Lebesgue measure is not guaranteed, and indeed there might not be one. More generally, suppose that \( X \) takes values in a general measure space \( (T, \mathscr{T}, \mu) \). A measurable function \( f: T \to [0, \infty) \) is a probability density function of \( X \) (or more precisely, the *distribution* of \( X \)) with respect to \( \mu \) if
\[ \P(X \in A) = \int_A f \, d\mu, \quad A \in \mathscr{T} \]
This fundamental question of the existence of a density function will be clarified in the section on absolute continuity and density functions.
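The discrete and continuous cases can be compared side by side in a short Python sketch. The fair die and the exponential distribution with rate 1 are illustrative choices, not from the text. The continuous integral is approximated by a midpoint Riemann sum, which is legitimate here because the density is continuous, so its Riemann and Lebesgue integrals agree.

```python
import math
from fractions import Fraction

# Discrete case (example): X is the score of a fair die, T = {1, ..., 6}, and
# f(x) = P(X = x) is the density of X with respect to counting measure.
die_pdf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob_discrete(pdf, A):
    """P(X in A): the integral of f over A with respect to counting measure,
    which is the sum of f(x) over the points x of A."""
    return sum((pdf[x] for x in A if x in pdf), Fraction(0))

# Continuous case (example): X is exponential with rate 1, with density
# f(x) = e^{-x} on T = [0, inf) with respect to Lebesgue measure.
def exp_pdf(x):
    return math.exp(-x)

def prob_continuous(pdf, a, b, n=10_000):
    """P(a <= X <= b): the integral of f over [a, b] with respect to Lebesgue
    measure, approximated here by a midpoint Riemann sum with n subintervals."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(prob_discrete(die_pdf, {2, 4, 6}))                      # 1/2
print(prob_continuous(exp_pdf, 0.0, 1.0), 1 - math.exp(-1))
```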

Suppose again that \( X \) is a real-valued random variable with distribution function \( F \). Then, by definition, the distribution of \( X \) is the Lebesgue-Stieltjes measure associated with \( F \): \[ \P(a \lt X \le b) = F(b) - F(a), \quad a, \; b \in \R, \; a \lt b \] regardless of whether the distribution is discrete, continuous, or mixed. Trivially, \( \P(X \in A) = \int_\R \bs{1}_A \, dF \) for measurable \( A \subseteq \R \), and the expected value of \( X \) defined above can also be written as \( \E(X) = \int_\R x \, dF(x) \). Again, all of this will be explained in much more detail in the next chapter on Expected Value.
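To see \( \E(X) = \int_\R x \, dF(x) \) at work for a mixed distribution, here is a Python sketch with a hypothetical distribution function: \( X \) is 0 with probability \( \frac{1}{2} \) and otherwise uniform on \( (0, 1] \), so \( F(x) = \frac{1}{2} + \frac{x}{2} \) for \( 0 \le x \lt 1 \). The integral is approximated by cutting the domain into intervals \( (x_{i-1}, x_i] \), whose measures are \( F(x_i) - F(x_{i-1}) \), and integrating the resulting simple function. The exact value is \( \E(X) = 0 \cdot \frac{1}{2} + \int_0^1 \frac{x}{2} \, dx = \frac{1}{4} \).

```python
def F(x):
    """Hypothetical mixed distribution function: mass 1/2 at 0 plus a uniform
    density 1/2 on (0, 1]. It is increasing and right-continuous."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 + 0.5 * x
    return 1.0

def prob_interval(F, a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)

def stieltjes_integral(g, F, a, b, n=30_000):
    """Approximate the integral of g dF over (a, b]: cut (a, b] into n pieces
    (x_{i-1}, x_i] and add g(x_i) * (F(x_i) - F(x_{i-1})), which is the
    integral of a simple function approximating g."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left, right = a + (i - 1) * h, a + i * h
        total += g(right) * (F(right) - F(left))
    return total

print(prob_interval(F, -1.0, 0.0))                     # the atom at 0: 0.5
print(stieltjes_integral(lambda x: x, F, -1.0, 2.0))   # E(X), close to 1/4
```

The same approximation handles the discrete part (the jump of \( F \) at 0) and the continuous part without any special casing, which is the point of the Lebesgue-Stieltjes formulation.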

Let \( g(x) = \frac{1}{1 + x^2} \) for \( x \in \R \).

- Find \( \int_{-\infty}^\infty g(x) \, dx \).
- Show that \( \int_{-\infty}^\infty x g(x) \, dx \) does not exist.

- \(\int_{-\infty}^\infty g(x) \, dx = \pi \)
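- \( \int_0^\infty x g(x) \, dx = \infty \) and \( \int_{-\infty}^0 x g(x) \, dx = -\infty \), so \( \int_{-\infty}^\infty x g(x) \, dx \) would be of the form \( \infty - \infty \) and does not exist.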
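Both parts of the exercise can be checked numerically from the standard antiderivatives \( \arctan x \) of \( g \) and \( \frac{1}{2} \ln(1 + x^2) \) of \( x g(x) \). This Python sketch evaluates the integrals over growing truncated domains; the cutoffs are arbitrary.

```python
import math

def g(x):
    return 1.0 / (1.0 + x * x)

# Antiderivatives used below (standard calculus facts):
#   integral of 1/(1+x^2)  is arctan(x)
#   integral of x/(1+x^2)  is log(1+x^2) / 2
def integral_g(s, t):
    """Integral of g over [s, t]."""
    return math.atan(t) - math.atan(s)

def integral_xg(s, t):
    """Integral of x g(x) over [s, t]."""
    return 0.5 * (math.log1p(t * t) - math.log1p(s * s))

for t in (10.0, 1e3, 1e6):
    print(t, integral_g(-t, t), integral_xg(0.0, t), integral_xg(-t, 0.0))
print("pi =", math.pi)
```

The symmetric truncations \( \int_{-t}^t x g(x) \, dx \) are all 0, but this does not rescue the integral over \( \R \): the positive and negative parts are each infinite, which is exactly the \( \infty - \infty \) case that the definition rules out.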

You may recall that the function \( g \) in the last exercise is important in the study of the Cauchy distribution, named for Augustin Cauchy. You may also remember that the graph of \( g \) is known as the witch of Agnesi, named for Maria Agnesi.

Let \( g(x) = \frac{1}{x^b} \) for \( x \in [1, \infty) \) where \( b \gt 0 \) is a parameter. Find \( \int_1^\infty g(x) \, dx \).

\(\int_1^\infty g(x) \, dx = \begin{cases} \infty, & 0 \lt b \le 1 \\ \frac{1}{b - 1}, & b \gt 1 \end{cases} \)
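Following the extension of the Riemann integral to \( [a, \infty) \), the integral is the limit of \( \int_1^t x^{-b} \, dx \) as \( t \to \infty \). A short Python sketch using the closed forms \( \ln t \) for \( b = 1 \) and \( \frac{1 - t^{1-b}}{b - 1} \) otherwise shows both behaviors; the values of \( b \) and \( t \) are arbitrary.

```python
import math

def truncated_integral(b, t):
    """The integral of x^(-b) over [1, t] for t > 1, from the antiderivatives
    log(x) (when b = 1) and x^(1 - b) / (1 - b) (otherwise)."""
    if b == 1:
        return math.log(t)
    return (1.0 - t ** (1.0 - b)) / (b - 1.0)

# Let t grow: the values diverge for b <= 1 and converge to 1/(b - 1) for b > 1.
for b in (0.5, 1.0, 2.0, 3.0):
    values = [truncated_integral(b, t) for t in (1e2, 1e4, 1e8)]
    limit = math.inf if b <= 1 else 1.0 / (b - 1.0)
    print(b, values, "limit:", limit)
```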

You may recall that the function \( g \) in the last exercise is important in the study of the Pareto distribution, named for Vilfredo Pareto.

Suppose that \( f(x) = 0 \) if \( x \in \Q \) and \( f(x) = \sin(x) \) if \( x \in \R - \Q \).

- Find \( \int_{[0, \pi]} f(x) \, d\lambda(x) \)
- Does \( \int_0^\pi f(x) \, dx\) exist?

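- 2, since \( f(x) = \sin(x) \) for \( x \notin \Q \), \( \Q \) has Lebesgue measure 0, and \( \int_0^\pi \sin(x) \, dx = 2 \).
- No. \( f \) is discontinuous at every point of \( (0, \pi) \), a set of positive measure, so \( f \) is not Riemann integrable on \( [0, \pi] \).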