\( \renewcommand{\P}{\mathbb{P}} \) \( \newcommand{\E}{\mathbb{E}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\Q}{\mathbb{Q}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\bs}{\boldsymbol} \) \( \newcommand{\range}{\text{range}} \)

13. Function Spaces

Basic Theory

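Our starting point is a measure space \( (S, \mathscr{S}, \mu) \). That is, \( S \) is a set, \( \mathscr{S} \) is a \( \sigma \)-algebra of subsets of \( S \), and \( \mu \) is a positive measure on \( (S, \mathscr{S}) \). As usual, the most important special cases include countable sets with counting measure and probability spaces, both of which are discussed in the examples at the end of this section.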

In previous sections, we defined the integral of certain measurable functions \( f: S \to \R \) with respect to \( \mu \), and we studied properties of the integral. In this section, we will study vector spaces of functions that are defined in terms of certain integrability conditions. These function spaces are of fundamental importance in all areas of analysis, including probability. In particular, the results of this section will reappear in the form of spaces of random variables in our study of expected value.

Definitions and Basic Properties

Consider a statement on the elements of \(S \), for example an equation or an inequality with \( x \in S \) as a free variable. (Technically such a statement is a predicate on \( S \).) For \( A \in \mathscr{S} \), we say that the statement holds on \( A \) if it is true for every \( x \in A \). We say that the statement holds almost everywhere on \( A \) (with respect to \( \mu \)) if there exists \( B \in \mathscr{S} \) with \( B \subseteq A \) such that the statement holds on \( B \) and \( \mu(A \setminus B) = 0 \).

Measurable functions \( f, \; g: S \to \R \) are equivalent if \( f = g \) almost everywhere on \( S \), in which case we write \( f \equiv g \). The relation \( \equiv \) is an equivalence relation; that is, if \( f, \; g, \; h: S \to \R \) are measurable then

  1. \( f \equiv f \), the reflexive property
  2. If \( f \equiv g \) then \( g \equiv f \), the symmetric property
  3. If \( f \equiv g \) and \( g \equiv h \) then \( f \equiv h \), the transitive property

Thus, equivalent functions are indistinguishable from the point of view of the measure \( \mu \). As with any equivalence relation, \( \equiv \) partitions the space (in this case the space of real-valued measurable functions on \( S \)) into equivalence classes of mutually equivalent elements. As we will see, we often view these equivalence classes as the basic objects of study. Our next task is to define measures of the size of a function; these will become norms in our spaces.

Suppose that \( f: S \to \R \) is measurable. For \( p \in (0, \infty) \) we define \[ \|f\|_p = \left(\int_S \left|f\right|^p \, d\mu\right)^{1/p} \] We also define \( \|f\|_\infty = \inf\left\{M \in [0, \infty]: \left|f\right| \le M \text{ almost everywhere on } S \right\}\).

Since \( \left|f\right|^p \) is a nonnegative, measurable function for \( p \in (0, \infty) \), \( \int_S \left|f\right|^p \, d\mu \) exists in \( [0, \infty] \), and hence so does \( \|f\|_p \). Clearly \( \|f\|_\infty \) also exists in \( [0, \infty] \) and is known as the essential supremum of \( f \). A number \( M \in [0, \infty] \) such that \( \left|f\right| \le M \) almost everywhere on \( S \) is an essential bound of \( f \) and so, appropriately enough, the essential supremum of \( f \) is the infimum of the essential bounds of \( f \). Thus, we have defined \( \|f\|_p \) for all \( p \in (0, \infty] \). The definition for \( p = \infty \) is special, but we will see that it's the appropriate one.
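To make the definitions concrete, here is a minimal sketch for the simplest possible setting: a finite set \( S \) where the measure is given by point masses. The function name `p_norm`, the dictionary representation, and the sample values are assumptions made for illustration only. Note how a point of measure 0 is ignored by the essential supremum.

```python
import math

def p_norm(f, mu, p):
    """||f||_p for a function f on a finite set S with point masses mu.

    f and mu are dicts keyed by the points of S; p is in (0, inf].
    """
    if p == math.inf:
        # Essential supremum: points of measure zero are ignored.
        return max((abs(f[x]) for x in f if mu[x] > 0), default=0.0)
    return sum(abs(f[x]) ** p * mu[x] for x in f) ** (1.0 / p)

# Hypothetical example: S = {a, b, c}, where {c} is a null set.
mu = {"a": 1.0, "b": 2.0, "c": 0.0}
f = {"a": 3.0, "b": -1.0, "c": 100.0}

norm_1 = p_norm(f, mu, 1)            # 3 * 1 + 1 * 2 = 5
norm_2 = p_norm(f, mu, 2)            # sqrt(9 * 1 + 1 * 2) = sqrt(11)
norm_inf = p_norm(f, mu, math.inf)   # 3, since the value 100 sits on a null set
```

Changing the value of `f` at the null point `c` changes none of these norms, which is the sense in which equivalent functions are indistinguishable.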

For \( p \in (0, \infty] \), let \( L^p \) denote the collection of measurable functions \( f: S \to \R \) such that \( \|f\|_p \lt \infty \). For \( p \in (0, \infty) \), this is equivalent to \( \left|f\right|^p \) being integrable.

The symbol \( L \) is in honor of Henri Lebesgue, who first developed the theory. If we want to indicate the dependence on the underlying measure space, we write \( L^p(S, \mathscr{S}, \mu) \). Of course, \( L^1 \) is simply the collection of functions that are integrable with respect to \( \mu \). Our goal is to study the spaces \( L^p \) for \( p \in (0, \infty] \). We start with some simple properties.

Suppose that \( f: S \to \R \) is measurable and \( p \in (0, \infty] \). Then

  1. \( \|f\|_p \ge 0 \)
  2. \( \|f\|_p = 0 \) if and only if \( f = 0 \) almost everywhere on \( S \), so that \( f \equiv 0 \).
Proof:
  1. This is obvious from the definitions.
  2. For \( p \in (0, \infty) \), this follows from properties of the integral that we already have. First of course, \(\int_S 0^p \, d\mu = \int_S 0 \, d\mu = 0\) so \( \|0\|_p = 0 \). Conversely if \( \|f\|_p = 0 \) then \( \int_S \left|f\right|^p \, d\mu = 0 \) and hence \( \left|f\right|^p = 0 \) almost everywhere on \( S \) and so \( f = 0 \) almost everywhere on \( S \). Suppose \( p = \infty \). Clearly \( \|0\|_\infty = 0 \). Conversely suppose that \( \|f\|_\infty = 0 \). Then for each \( n \in \N_+ \) there exists an essential bound \( M_n \in [0, 1/n) \) of \( f \), so that \( \left|f\right| \le 1/n \) except on a set \( A_n \in \mathscr{S} \) with \( \mu(A_n) = 0 \). The set \( A = \bigcup_{n=1}^\infty A_n \) also has measure 0, and \( \left|f\right| \le 1/n \) on \( S \setminus A \) for every \( n \in \N_+ \). Hence \( f = 0 \) almost everywhere on \( S \).

Suppose that \( f: S \to \R \) is measurable, \( c \in \R \), and \( p \in (0, \infty] \). Then \( \|c f\|_p = \left|c\right| \|f\|_p \)

Proof:

Again, when \( p \in (0, \infty) \), this result follows easily from properties of the integral that we already have: \[ \int_S \left|c f\right|^p \, d\mu = \left|c\right|^p \int_S \left|f\right|^p \, d\mu \] Taking the \( p \)th root of both sides gives the result. For \( p = \infty \), the result is trivial if \( c = 0 \). If \( c \ne 0 \), note that \( M \) is an essential bound of \( f \) if and only if \( \left|c\right| M \) is an essential bound of \( c f \).

In particular, if \( f \in L^p \) and \( c \in \R \) then \( c f \in L^p \).

Conjugate Indices and Hölder's inequality

Certain pairs of our function spaces turn out to be dual or complementary to one another in a sense. To understand this, we need the following definition.

Indices \( p, \, q \in (1, \infty) \) are said to be conjugate if \( 1/p + 1/q = 1 \). In addition, \( 1 \) and \( \infty \) are conjugate indices.

For justification of the last case, note that if \( p \in (1, \infty) \), then the index conjugate to \( p \) is \[ q = \frac{1}{1 - 1/p} \] and \( q \uparrow \infty \) as \( p \downarrow 1 \). Note that \( p = q = 2 \) are conjugate indices, and this is the only case where the indices are the same. Ultimately, the importance of conjugate indices stems from the following inequality:

If \( x, \, y \in (0, \infty) \) and if \( p, \, q \in (1, \infty) \) are conjugate indices, then \[ x y \le \frac{1}{p} x^p + \frac{1}{q} y^q \] Moreover, equality occurs if and only if \( x^p = y^q \).

Proof 1:

From properties of the natural logarithm function, \[ \ln(x y) = \ln(x) + \ln(y) = \frac{1}{p}\ln\left(x^p\right) + \frac{1}{q} \ln\left(y^q\right) \] But the natural logarithm function is concave and \( 1/p + 1/q = 1 \) so \[ \ln(x y) = \frac{1}{p}\ln\left(x^p\right) + \frac{1}{q}\ln\left(y^q\right) \le \ln\left(\frac{1}{p} x^p + \frac{1}{q} y^q\right) \] Taking exponentials we have \[ x y \le \frac{1}{p} x^p + \frac{1}{q} y^q \]

Proof 2:

Fix \( y \in (0, \infty) \) and define \( f: (0, \infty) \to \R\) by \[ f(x) = \frac{1}{p} x^p + \frac{1}{q} y^q - x y, \quad x \in (0, \infty) \] Then \( f^\prime(x) = x^{p-1} - y\) and \( f^{\prime\prime}(x) = (p - 1) x^{p-2} \) for \( x \in (0, \infty) \). Hence \( f \) has a single critical point at \( x = y^{1/(p-1)} = y^{q/p} \) and \( f^{\prime\prime}(x) \gt 0 \) for \( x \in (0, \infty) \). It follows that the minimum value of \( f \) on \( (0, \infty) \) occurs at \( y^{q/p} \) and \(f\left(y^{q/p}\right) = 0\). Hence \( f(x) \ge 0 \) for \( x \in (0, \infty) \) with equality only at \( x = y^{q/p} \) (that is, \( x^p = y^q \)).
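As a quick numerical illustration of the inequality and its equality condition, the following sketch evaluates the gap \( \frac{1}{p} x^p + \frac{1}{q} y^q - x y \) at a generic point and at a point with \( x^p = y^q \). The helper names `conjugate` and `young_gap` and the sample values are hypothetical, chosen only for this example.

```python
def conjugate(p):
    """Index q conjugate to p in (1, inf), so that 1/p + 1/q = 1."""
    return p / (p - 1.0)

def young_gap(x, y, p):
    """x^p / p + y^q / q - x * y, which should be nonnegative."""
    q = conjugate(p)
    return x ** p / p + y ** q / q - x * y

# Generic point: the gap is strictly positive.
gap_generic = young_gap(2.0, 3.0, 3.0)

# Equality case: choose x with x^p = y^q, that is, x = y^(q/p).
p, y = 3.0, 4.0
q = conjugate(p)                            # q = 3/2
gap_equal = young_gap(y ** (q / p), y, p)   # zero, up to rounding error
```

Here \( x = 4^{1/2} = 2 \) gives \( x^3 = 8 = 4^{3/2} \), so the second gap vanishes, in agreement with the critical point found in Proof 2.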

Our next major result is Hölder's inequality, named for Otto Hölder, which clearly indicates the importance of conjugate indices.

Suppose that \( f, \, g: S \to \R \) are measurable and that \( p \) and \( q \) are conjugate indices. Then \[ \|f g\|_1 \le \|f\|_p \|g\|_q \]

Proof:

The result is obvious if \( \|f\|_p = \infty \) or \( \|g\|_q = \infty \), so suppose that \( f \in L^p \) and \( g \in L^q \). For our first case, suppose that \( p = 1 \) and \( q = \infty \). Note that \( \left|g\right| \le \|g\|_\infty \) almost everywhere on \( S \). Hence \[ \int_S \left|fg\right| \, d\mu = \int_S \left|f\right| \left|g\right| \, d\mu \le \|g\|_\infty \int_S \left|f\right| \, d\mu = \|f\|_1 \|g\|_\infty \] For the second case, suppose \( p, \; q \in (1, \infty) \). By the positive property above, the result holds if \( \|f\|_p = 0 \) or \( \|g\|_q = 0 \), so assume that \( \|f\|_p \gt 0 \) and \( \|g\|_q \gt 0 \). By the additivity of the integral over disjoint domains, we can restrict the integrals to the set \( \{x \in S: f(x) \ne 0, g(x) \ne 0\} \), or simply assume that \( f \ne 0 \) and \( g \ne 0 \) on \( S \). From the basic inequality above, \[ \left|f g\right| \le \frac{1}{p} \left|f\right|^p + \frac{1}{q} \left|g\right|^q \] Suppose first that \( \|f\|_p = \|g\|_q = 1 \). From the increasing and linearity properties of the integral, \[ \int_S \left|f g\right| \, d\mu \le \frac{1}{p} \int_S \left|f\right|^p \, d\mu + \frac{1}{q} \int_S \left|g\right|^q \, d\mu = \frac{1}{p} + \frac{1}{q} = 1 \] For the general case where \( \|f\|_p \gt 0 \) and \( \|g\|_q \gt 0 \), let \( f_1 = f \big/ \|f\|_p \) and \( g_1 = g \big/ \|g\|_q \). Then \( \left\|f_1\right\|_p = \left\|g_1\right\|_q = 1 \) so \( \left\|f_1 g_1 \right\|_1 \le 1 \). So by the scaling property above and the last result, \[ \left\|f_1 g_1\right\|_1 = \frac{\|f g\|_1}{\|f\|_p \|g\|_q} \le 1 \]

In particular, if \( f \in L^p \) and \( g \in L^q \) then \( f g \in L^1 \). The most important special case of Hölder's inequality is when \( p = q = 2 \), in which case we have the Cauchy-Schwarz inequality, named for Augustin Louis Cauchy and Karl Hermann Schwarz: \[ \|f g\|_1 \le \|f\|_2 \|g\|_2 \]
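Hölder's inequality can be checked numerically on a finite measure space, where integrals reduce to weighted sums. The following is only a sketch under that assumption: the helper names `p_norm` and `holder_sides`, the random data, and the seed are all hypothetical choices for illustration, and a small tolerance absorbs floating-point rounding.

```python
import math
import random

def p_norm(f, mu, p):
    """||f||_p on a finite set; f and mu are equal-length lists."""
    if p == math.inf:
        return max((abs(v) for v, m in zip(f, mu) if m > 0), default=0.0)
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1.0 / p)

def holder_sides(f, g, mu, p):
    """Return (||f g||_1, ||f||_p * ||g||_q) for conjugate indices p and q."""
    if p == 1:
        q = math.inf
    elif p == math.inf:
        q = 1
    else:
        q = p / (p - 1.0)
    fg = [a * b for a, b in zip(f, g)]
    return p_norm(fg, mu, 1), p_norm(f, mu, p) * p_norm(g, mu, q)

random.seed(0)  # arbitrary seed, only for reproducibility
n = 50
mu = [random.uniform(0.1, 2.0) for _ in range(n)]   # positive point masses
f = [random.uniform(-1.0, 1.0) for _ in range(n)]
g = [random.uniform(-1.0, 1.0) for _ in range(n)]

results = {p: holder_sides(f, g, mu, p) for p in (1, 1.5, 2, 3, math.inf)}
holds = all(lhs <= rhs + 1e-12 for lhs, rhs in results.values())
```

The entry for `p = 2` is the Cauchy-Schwarz case, and the entries for `p = 1` and `p = math.inf` cover the conjugate pair \( 1 \) and \( \infty \).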

Minkowski's Inequality

Our next major result is Minkowski's inequality, named for Hermann Minkowski. This inequality will help show that \( L^p \) is a vector space and that \( \| \cdot \|_p \) is a norm (up to equivalence).

Suppose that \( f, \, g: S \to \R \) are measurable and that \( p \in [1, \infty] \). Then \[ \|f + g\|_p \le \|f\|_p + \|g\|_p \]

Proof:

Again, the result is trivial if \( \|f\|_p = \infty \) or \( \|g\|_p = \infty \), so assume that \( f, \, g \in L^p \). When \( p = 1 \), the result is the simple triangle inequality for the integral: \[ \|f + g\|_1 = \int_S \left|f + g\right| \, d\mu \le \int_S \left(\left|f\right| + \left|g\right|\right) \, d\mu = \int_S \left|f\right| \, d\mu + \int_S \left|g\right| \, d\mu = \|f\|_1 + \|g\|_1 \] For the case \( p = \infty \), note that if \( A \) is an essential bound for \( f \) and \( B \) is an essential bound for \( g \) then \( A + B \) is an essential bound for \( f + g \). Hence \( \|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty \). For the last case, suppose that \( p \in (1, \infty) \) and let \( q \) be the index conjugate to \( p \). Then \[ \left|f + g\right|^p = \left|f + g\right|^{p-1} \left|f + g\right| \le \left|f + g\right|^{p-1}\left(\left|f\right| + \left|g\right|\right) = \left|f + g\right|^{p-1}\left|f\right| + \left|f+g\right|^{p-1} \left|g\right| \] Integrating over \( S \) and using the additive and increasing properties of the integral gives \[ \|f + g\|_p^p \le \int_S \left|f + g\right|^{p-1} \left|f\right| \, d\mu + \int_S \left|f + g\right|^{p-1} \left|g\right| \, d\mu\] But by Hölder's inequality, \[ \int_S \left|f + g\right|^{p-1} \left|f\right| \, d\mu \le \|\left|f+g\right|^{p-1}\|_q \|f\|_p, \; \int_S \left|f + g\right|^{p-1} \left|g\right| \, d\mu \le \|\left|f+g\right|^{p-1}\|_q \|g\|_p\] Combining this with the previous inequality we have \[ \|f+g\|_p^p \le \|\left|f + g\right|^{p-1}\|_q \left(\|f\|_p + \|g\|_p\right) \] But \( (p - 1) q = p \) and \( 1/q = (p - 1) / p \) so \[ \|\left|f+g\right|^{p-1}\|_q = \left(\int_S \left|f + g\right|^{(p-1)q} \, d\mu \right)^{1/q} = \left(\int_S \left|f + g\right|^p \, d\mu\right)^{(p - 1)/p} = \|f + g\|_p^{p-1}\] Hence we have \[ \|f+g\|_p^p \le \|f + g\|_p^{p-1} \left(\|f\|_p + \|g\|_p\right) \] If \( \|f + g\|_p = 0 \) the result is trivial. Otherwise, \( \|f + g\|_p \lt \infty \) since \( \left|f + g\right|^p \le 2^p \left(\left|f\right|^p + \left|g\right|^p\right) \), so we can divide both sides by \( \|f + g\|_p^{p-1} \) to obtain \( \|f + g\|_p \le \|f\|_p + \|g\|_p \).
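The restriction \( p \ge 1 \) in Minkowski's inequality is essential. The sketch below, which assumes counting measure on a two-point set and uses a hypothetical helper `p_norm`, compares both sides of the inequality for \( p = 2 \) and for \( p = 1/2 \), where the triangle inequality fails.

```python
def p_norm(x, p):
    """||x||_p for a finite sequence under counting measure, p in (0, inf)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

# Indicator functions of the two points of S = {1, 2}.
f = [1.0, 0.0]
g = [0.0, 1.0]
s = [a + b for a, b in zip(f, g)]

# p = 2: ||f + g||_2 = sqrt(2), which is at most ||f||_2 + ||g||_2 = 2.
lhs_2, rhs_2 = p_norm(s, 2), p_norm(f, 2) + p_norm(g, 2)

# p = 1/2: ||f + g||_{1/2} = (1 + 1)^2 = 4, which exceeds 1 + 1 = 2.
lhs_half, rhs_half = p_norm(s, 0.5), p_norm(f, 0.5) + p_norm(g, 0.5)
```

This is one reason the vector space results that follow are stated for \( p \in [1, \infty] \) rather than for all \( p \in (0, \infty] \).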

Vector Spaces

We can now discuss various vector spaces of functions. First, we know from our previous work with measure spaces, that the set \( \mathscr{V} \) of all measurable functions \( f: S \to \R \) is a vector space under our standard (pointwise) definitions of sum and scalar multiple. The spaces we are studying in this section are subspaces:

\( L^p \) is a subspace of \( \mathscr{V} \) for every \( p \in [1, \infty] \).

Proof:

We just need to show that \( L^p \) is closed under addition and scalar multiplication. From the scaling property above, if \( f \in L^p \) and \( c \in \R \) then \( c f \in L^p \). From Minkowski's inequality, if \( f, \, g \in L^p \) then \( f + g \in L^p \).

However, we usually want to identify functions that are equal almost everywhere on \( S \) (with respect to \( \mu \)). Here are the definitions:

Let \( [f] \) denote the equivalence class of \( f \in \mathscr{V} \) under the equivalence relation \( \equiv \) , and let \( \mathscr{U} = \left\{[f]: f \in \mathscr{V}\right\} \). If \( f, \, g \in \mathscr{V} \) and \( c \in \R \) we define

  1. \( [f] + [g] = [f + g]\)
  2. \(c [f] = [c f] \)

Then \( \mathscr{U} \) is a vector space.

Proof:

We know from our previous work that these definitions are consistent in the sense that they do not depend on the particular representatives of the equivalence classes. That is, if \( f_1 \equiv f \) and \( g_1 \equiv g \) then \( f_1 + g_1 \equiv f + g \) and \( c f_1 \equiv c f \). That \( \mathscr{U} \) is a vector space then follows from the fact that \( \mathscr{V} \) is a vector space.

Now we can define the Lebesgue vector spaces precisely.

For \( p \in [1, \infty] \), let \( \mathscr{L}^p = \left\{[f]: f \in L^p\right\} \). For \( f \in \mathscr{V} \) define \( \left\|[f]\right\|_p = \|f\|_p \). Then \( \mathscr{L}^p \) is a subspace of \( \mathscr{U} \) and \( \| \cdot \|_p \) is a norm on \( \mathscr{L}^p \). That is, for \( f, g \in L^p \) and \( c \in \R \)

  1. \( \|f\|_p \ge 0 \) and \( \|f\|_p = 0 \) if and only if \( f \equiv 0 \), the positive property
  2. \( \| c f \|_p = \left|c\right| \|f\|_p \), the scaling property
  3. \( \|f + g\|_p \le \|f\|_p + \|g\|_p \), the triangle inequality
Proof:

That \( \mathscr{L}^p \) is a subspace of \( \mathscr{U} \) follows immediately from the fact that \( L^p \) is a subspace of \( \mathscr{V} \). The fact that \( \| \cdot \|_p \) is a norm on \( \mathscr{L}^p \) also follows from our previous work: the positive property, the scaling property, and Minkowski's inequality.

We have stated these results precisely, but on the other hand, we don't want to be overly pedantic. It's more natural and intuitive to simply work with the space \( \mathscr{V} \) and the subspaces \( L^p \) for \( p \in [1, \infty] \), and just remember that functions that are equal almost everywhere on \( S \) are regarded as the same vector. This will be our point of view for the rest of this section.

Every norm on a vector space naturally leads to a metric. That is, we measure the distance between vectors as the norm of their difference. Stated in terms of the norm \( \| \cdot \|_p \), here are the properties of the metric on \( L^p \).

For \( f, \, g, \, h \in L^p \),

  1. \( \|f - g\|_p \ge 0 \) and \( \|f - g\|_p = 0 \) if and only if \( f \equiv g \), the positive property
  2. \( \|f - g\|_p = \|g - f\|_p \), the symmetric property
  3. \( \|f - h\|_p \le \|f - g\|_p + \|g - h\|_p \), the triangle inequality

Once we have a metric, we naturally have a criterion for convergence.

Suppose that \( f_n \in L^p \) for \( n \in \N_+ \) and \( f \in L^p \). Then by definition, \( f_n \to f \) as \( n \to \infty \) in \( L^p \) if and only if \( \|f_n - f\|_p \to 0 \) as \( n \to \infty \).

Limits are unique, up to equivalence. (That is, limits are unique in \( \mathscr{L}^p \).)
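For a concrete instance of convergence in \( L^p \), consider the functions \( f_n(x) = x^n \) on \( [0, 1] \) with Lebesgue measure (a setting assumed here for illustration; it is not one of the examples treated in this section). Since \( \int_0^1 x^{n p} \, dx = 1 / (n p + 1) \), the norm has a closed form, and the hypothetical helper below simply evaluates it.

```python
def norm_of_power(n, p):
    """||f_n||_p for f_n(x) = x^n on [0, 1] with Lebesgue measure, p in [1, inf).

    Uses the closed form: the integral of x^(n p) over [0, 1] is 1 / (n p + 1).
    """
    return (1.0 / (n * p + 1.0)) ** (1.0 / p)

# The 2-norms decrease to 0, so f_n -> 0 in L^2.
norms_p2 = [norm_of_power(n, 2.0) for n in (1, 10, 100, 1000)]
```

So \( f_n \to 0 \) in \( L^p \) for each \( p \in [1, \infty) \). On the other hand, \( \|f_n\|_\infty = 1 \) for every \( n \), because \( x^n \) takes values arbitrarily close to 1 on sets of positive measure, so the sequence does not converge to 0 in \( L^\infty \).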

Suppose again that \( f_n \in L^p \) for \( n \in \N_+ \). Recall that this sequence is said to be a Cauchy sequence if for every \( \epsilon \gt 0 \) there exists \( N \in \N_+ \) such that if \( n \gt N \) and \( m \gt N \) then \( \|f_n - f_m\|_p \lt \epsilon \). Needless to say, the Cauchy criterion is named for our ubiquitous friend Augustin Cauchy. A metric space in which every Cauchy sequence converges (to an element of the space) is said to be complete. Intuitively, one expects a Cauchy sequence to converge, so a complete space is literally one that is not missing any elements that should be there. A complete, normed vector space is called a Banach space, after the Polish mathematician Stefan Banach. Banach spaces are of fundamental importance in analysis, in large part because of the following result:

\( L^p \) is a Banach space for every \( p \in [1, \infty] \).

The Space \( L^2 \)

The norm \( \| \cdot \|_2 \) is special because it corresponds to an inner product.

For \( f, \; g \in L^2 \), define \[ \langle f, g \rangle = \int_S f g \, d\mu \]

Note that the integral is well-defined by the Cauchy-Schwarz inequality. As with all of our other definitions, this one is consistent with the equivalence relation. That is, if \( f \equiv f_1 \) and \( g \equiv g_1 \) then \( f g \equiv f_1 g_1 \) so \( \int_S f g \, d\mu = \int_S f_1 g_1 \, d\mu \) and hence \( \langle f, g \rangle = \langle f_1, g_1 \rangle \). Note also that \( \langle f, f \rangle = \|f\|_2^2 \) for \( f \in L^2 \), so this definition generates the 2-norm.

\( L^2 \) is an inner product space. That is, if \( f, \, g, \, h \in L^2 \) and \( c \in \R \) then

  1. \( \langle f, f \rangle \ge 0 \) and \( \langle f, f \rangle = 0 \) if and only if \( f \equiv 0 \), the positive property
  2. \( \langle f, g \rangle = \langle g, f \rangle \), the symmetric property
  3. \( \langle c f, g \rangle = c \langle f, g \rangle \), the scaling property
  4. \( \langle f + g, h \rangle = \langle f, h \rangle + \langle g, h \rangle \), the additive property
Proof:

Part 1 is a restatement of the positive property of the norm \( \| \cdot \|_2 \). Part 2 is obvious and parts 3 and 4 follow from the linearity of the integral.

From parts 3 and 4, the inner product is linear in the first argument, with the second argument fixed. By the symmetric property in part 2, it follows that the inner product is also linear in the second argument with the first argument fixed. That is, the inner product is bilinear. A complete inner product space is known as a Hilbert space, named for the German mathematician David Hilbert. Since \( L^2 \) is both a Banach space and an inner product space, the following result is immediate.

\( L^2 \) is a Hilbert space.

All inner product spaces lead naturally to the concept of orthogonality; \( L^2 \) is no exception.

Functions \( f, \; g \in L^2 \) are orthogonal if \( \langle f, g \rangle = 0 \), in which case we write \( f \perp g \). Equivalently \( f \perp g \) if \[ \int_S f g \, d\mu = 0 \]

Of course, all of the basic theorems of general inner product spaces hold in \( L^2 \). For example, the following result is the Pythagorean theorem, named of course for Pythagoras.

If \( f, \; g \in L^2 \) and \( f \perp g \) then \( \|f + g\|_2^2 = \|f\|_2^2 + \|g\|_2^2 \).
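The Pythagorean theorem follows from bilinearity, since \( \|f + g\|_2^2 = \langle f, f \rangle + 2 \langle f, g \rangle + \langle g, g \rangle \) and the middle term vanishes when \( f \perp g \). Here is a small numerical sketch on a three-point set with point masses; the helper names `inner` and `norm2` and the sample values are assumptions for illustration.

```python
import math

def inner(f, g, mu):
    """<f, g> as the sum of f(x) g(x) mu({x}) over a finite set."""
    return sum(a * b * m for a, b, m in zip(f, g, mu))

def norm2(f, mu):
    """The 2-norm generated by the inner product."""
    return math.sqrt(inner(f, f, mu))

mu = [1.0, 2.0, 1.0]     # hypothetical point masses
f = [1.0, 1.0, 0.0]
g = [2.0, -1.0, 5.0]     # chosen so that <f, g> = 2 - 2 + 0 = 0

s = [a + b for a, b in zip(f, g)]
lhs = norm2(s, mu) ** 2                        # ||f + g||_2^2 = 34
rhs = norm2(f, mu) ** 2 + norm2(g, mu) ** 2    # 3 + 31 = 34
```

Note that orthogonality depends on the measure: with equal point masses, the same `f` and `g` would have inner product \( 2 - 1 + 0 = 1 \) and would not be orthogonal.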

Examples and Special Cases

Countable Spaces

Suppose that \( S \) is a countable set, \( \mathscr{S} = \mathscr{P}(S) \) the power set of \( S \), and that \( \# \) is counting measure on \( (S, \mathscr{S}) \). In this case, recall that integrals are sums. The exposition will look more familiar if we use the notation of sequences rather than functions. Thus, let \( x: S \to \R \), and denote the value of \( x \) at \( i \in S \) by \( x_i \) rather than \( x(i) \). For \( p \in [1, \infty) \), the \( p \)-norm is \[ \|x\|_p = \left(\sum_{i \in S} \left|x_i\right|^p\right)^{1/p} \] On the other hand, \( \|x\|_\infty = \sup\{\left|x_i\right|: i \in S\} \). The only null set for \( \# \) is \( \emptyset \), so the equivalence relation \( \equiv \) is simply equality, and so the spaces \( L^p \) and \( \mathscr{L}^p \) are the same. For \( p \in [1, \infty) \), \( x \in L^p \) if and only if \[ \sum_{i \in S} \left|x_i\right|^p \lt \infty \] When \( p \in \N_+ \) (as is often the case), this condition means that \( \sum_{i \in S} x_i^p \) is absolutely convergent. On the other hand, \( x \in L^\infty \) if and only if \( x \) is bounded. When \( S = \N_+ \), the space \( L^p \) is often denoted \( l^p \). The inner product on \( L^2 \) is \[ \langle x, y \rangle = \sum_{i \in S} x_i y_i, \quad x, \; y \in L^2 \] When \( S = \{1, 2, \ldots, n\} \), \( L^2 \) is simply the vector space \( \R^n \) with the usual addition, scalar multiplication, inner product, and norm that we study in elementary linear algebra. Orthogonal vectors are perpendicular in the usual sense.
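A standard example with \( S = \N_+ \) shows that the spaces \( l^p \) really do differ: the sequence \( x_i = 1 / i \) is in \( l^2 \) but not in \( l^1 \). The sketch below only computes partial sums, so it illustrates the behavior rather than proving it; the helper name `partial_sum` and the cutoffs are arbitrary choices.

```python
import math

def partial_sum(p, n):
    """Sum of |x_i|^p for i = 1, ..., n, where x_i = 1 / i."""
    return sum((1.0 / i) ** p for i in range(1, n + 1))

cutoffs = (10, 1000, 100000)

# p = 2: the partial sums stay below pi^2 / 6, consistent with x in l^2.
s2 = [partial_sum(2, n) for n in cutoffs]

# p = 1: the harmonic partial sums exceed ln(n) and keep growing,
# consistent with x not being in l^1.
s1 = [partial_sum(1, n) for n in cutoffs]
```

The sequence is also bounded, so \( x \in l^\infty \) with \( \|x\|_\infty = 1 \).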

Probability Spaces

Suppose that \( S \) is the sample space of a random experiment, \( \mathscr{S} \) is the \( \sigma \)-algebra of events, and \( \P \) is a probability measure on \( (S, \mathscr{S}) \). Of course, a measurable function \( X: S \to \R \) is simply a real-valued random variable. For \( p \in [1, \infty) \), the integral \( \int_S \left|X\right|^p \, d\P \) is the expected value of \( \left|X\right|^p \), and is denoted \( \E\left(\left|X\right|^p\right) \). Thus in this case, \( L^p \) is the collection of real-valued random variables \( X \) with \( \E\left(\left|X\right|^p\right) \lt \infty \). We will study these spaces in much more detail in the chapter on expected value.
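As a small worked case, take \( X \) to be the score of a fair six-sided die (an example assumed here for illustration). The sketch computes \( \E\left(\left|X\right|^p\right) \) exactly with rational arithmetic; the names `pmf` and `moment` are hypothetical.

```python
from fractions import Fraction
import math

# Hypothetical example: X is the score of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def moment(pmf, p):
    """E(|X|^p) for a discrete random variable with the given pmf, p a positive integer."""
    return sum(abs(x) ** p * prob for x, prob in pmf.items())

mean_abs = moment(pmf, 1)      # ||X||_1 = 7/2
second = moment(pmf, 2)        # E(X^2) = 91/6
norm_2 = math.sqrt(second)     # ||X||_2 = sqrt(91/6)
norm_inf = max(abs(x) for x, prob in pmf.items() if prob > 0)   # ||X||_inf = 6
```

Since \( X \) is bounded, it belongs to \( L^p \) for every \( p \in [1, \infty] \).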