\( \renewcommand{\P}{\mathbb{P}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\bs}{\boldsymbol} \)

2. Continuous Distributions

Basic Theory

As usual, suppose that we have a random experiment with probability measure \(\P\) on an underlying sample space \(\Omega\). Recall that a random variable \(X\) for the experiment, with values in a set \( S \), is simply a function from \( \Omega \) to \( S \). Recall also that the probability distribution of \( X \) is the function that assigns probabilities to the subsets of \( S \), namely \( A \mapsto \P(X \in A) \) for \( A \subseteq S \). The nature of \( S \) plays a big role in how the probability distribution of \( X \) can be described. In the previous section, we studied the case where \( S \) is countable so that \( X \) has a discrete distribution. In this section, we study the other main class of distributions.

Definitions and Basic Properties

A random variable \(X\) taking values in set \(S\) is said to have a continuous distribution if \(\P(X = x) = 0\) for all \(x \in S\).

The fact that \(X\) takes any particular value with probability 0 might seem paradoxical at first, but conceptually it is the same as the fact that an interval of \(\R\) can have positive length even though it is composed of points each of which has 0 length. Similarly, a region of \(\R^2\) can have positive area even though it is composed of points (or curves) each of which has area 0.

In particular, continuous distributions are used to model variables that take values in intervals of \( \R \), variables that can, in principle, be measured with any degree of accuracy. Such variables abound in applications.

If \(X\) has a continuous distribution then \(\P(X \in C) = 0\) for any countable \(C \subseteq S\).

Proof:

Since \(C\) is countable, it follows from the additivity axiom of probability that

\[ \P(X \in C) = \sum_{x \in C} \P(X = x) = 0 \]

Thus, continuous distributions are in complete contrast with discrete distributions, for which all of the probability mass is concentrated on a discrete set. For a continuous distribution, the probability mass is continuously spread over \(S\). Note also that \(S\) itself cannot be countable. In the picture below, the light blue shading is intended to suggest a continuous distribution of probability.

A continuous probability distribution on \( S \)

Usually the set of values \( S \) is a subset of a Euclidean space, and a continuous distribution can usually be described by a certain type of function.

Suppose that \(X\) has a continuous distribution on \(S \subseteq \R^n\). A real-valued function \(f\) defined on \(S\) is said to be a probability density function for \(X\) if \(f\) satisfies the following properties:

  1. \(f(x) \ge 0\) for all \(x \in S\)
  2. \(\int_S f(x) \, dx = 1\)
  3. \(\P(X \in A) = \int_A f(x) \, dx\) for \(A \subseteq S\)

A continuous distribution is completely determined by its probability density function

Property (c) in the definition is particularly important since it implies that the probability distribution of \(X\) is completely determined by the probability density function. Conversely, any function that satisfies properties (a) and (b) is a probability density function on \( S \), and then property (c) can be used to define a continuous probability distribution on \(S\). Note that we can always extend \(f\) to a probability density function on all of \(\R^n\) by defining \(f(x) = 0\) for \(x \notin S\). This extension sometimes simplifies notation.

If \(n \gt 1\), the integrals in properties (b) and (c) are multiple integrals over subsets of \(\R^n\) with \(\bs{x} = (x_1, x_2, \ldots, x_n)\) and \(d \bs{x} = dx_1 dx_2 \cdots dx_n\). In fact, technically, \(f\) is a probability density function relative to the standard \(n\)-dimensional measure, which we recall is given by \[\lambda_n(A) = \int_A 1 \, d\bs{x}, \quad A \subseteq \R^n\] In particular, \(\lambda_1(A)\) is the length of \(A \subseteq \R\), \(\lambda_2(A)\) is the area of \(A \subseteq \R^2\), and \(\lambda_3(A)\) is the volume of \(A \subseteq \R^3\).

However, we recall that except for exposition, the low dimensional cases (\(n \in \{1, 2, 3\}\)) play no special role in probability. Interesting random experiments often involve several random variables (that is, a random vector).

More technically, \( \lambda_n \) is \( n \)-dimensional Lebesgue measure on the measurable subsets of \( \R^n \), and is named for Henri Lebesgue.

The points \( x \in S \) that maximize the probability density function \( f \) are important, just as in the discrete case.

An element \(x \in S\) that maximizes the probability density function \(f\) is called a mode of the distribution.

If there is only one mode, it is sometimes used as a measure of the center of the distribution.

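Probability density functions of continuous distributions differ from their discrete counterparts in several important ways. In particular, \(f(x)\) is a probability density rather than a probability, so \(f(x)\) can be greater than 1 and \(f\) can even be unbounded. Moreover, \(f\) is not unique: changing the values of \(f\) on a countable set of points (to other nonnegative values) does not change the integrals in properties (b) and (c).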

Constructing Probability Density Functions

Just as in the discrete case, a nonnegative function on \( S \) can often be scaled to produce a probability density function.

Suppose that \(g\) is a nonnegative function on \(S \subseteq \R^n\). Let \[c = \int_S g(x) \, dx\] If \(0 \lt c \lt \infty\) then \(f(x) = \frac{1}{c} g(x)\) for \(x \in S\) defines a probability density function on \(S\).

Proof:

Clearly \( f(x) \ge 0 \) for \( x \in S \). Also \[ \int_S f(x) \, dx = \frac{1}{c} \int_S g(x) \, dx = \frac{c}{c} = 1 \]

Note again that \(f\) is just a scaled version of \(g\). Thus, this result can be used to construct probability density functions with desired properties (domain, shape, symmetry, and so on). The constant \(c\) is sometimes called the normalizing constant.
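
As a rough numerical illustration of this result, the following Python sketch normalizes a nonnegative function on an interval. The helper names `midpoint_integral` and `normalize`, and the choice \(g(x) = x (1 - x)\) on \([0, 1]\), are assumptions made here for illustration; the midpoint rule is only one of many ways to approximate the integral.

```python
import math

def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def normalize(g, a, b):
    """Return (f, c), where c approximates the integral of g and f = g / c."""
    c = midpoint_integral(g, a, b)
    if not (0 < c < math.inf):
        raise ValueError("g cannot be normalized unless 0 < c < infinity")
    return (lambda x: g(x) / c), c

# Illustrative choice (an assumption): g(x) = x (1 - x) on [0, 1], where c = 1/6.
f, c = normalize(lambda x: x * (1 - x), 0.0, 1.0)
total = midpoint_integral(f, 0.0, 1.0)   # should be close to 1
```

Here `c` approximates the normalizing constant and `f` is the scaled function, so `total` should be close to 1.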

Conditional Densities

Suppose that \(X\) is a random variable taking values in \(S \subseteq \R^n\) with a continuous distribution that has probability density function \(f\). The probability density function of \(X\), of course, is based on the underlying probability measure \(\P\) on the sample space \(\Omega\). This measure could be a conditional probability measure, conditioned on a given event \(E \subseteq \Omega\) (with \(\P(E) \gt 0\) of course). The usual notation is \[f(x \mid E), \quad x \in S\] Note, however, that except for notation, no new concepts are involved. The function above is a probability density function for a continuous distribution. That is, it satisfies properties (a) and (b) of the definition, while property (c) becomes \[\int_A f(x \mid E) \, dx = \P(X \in A \mid E)\] All results that hold for probability density functions in general hold for conditional probability density functions. The event \( E \) could be an event described in terms of the random variable \( X \) itself:

Suppose that \( X \) has a continuous distribution with probability density function \( f \) and that \(B \subseteq S\) with \(\P(X \in B) \gt 0\). The conditional probability density function of \(X\) given \(X \in B\) is \[f(x \mid X \in B) = \frac{f(x)}{\P(X \in B)}, \quad x \in B \]

Proof:

For \(A \subseteq B\), \[ \int_A \frac{f(x)}{\P(X \in B)} \, dx = \frac{1}{\P(X \in B)} \int_A f(x) \, dx = \frac{\P(X \in A)}{\P(X \in B)} = \P(X \in A \mid X \in B) \]

Of course, \( \P(X \in B) = \int_B f(x) \, dx \) and hence is the normalizing constant for the restriction of \( f \) to \( B \).
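
For a concrete illustration (with a density chosen here only as an example), suppose that \(f(x) = 2 x\) for \(x \in [0, 1]\) and that \(B = \left[\frac{1}{2}, 1\right]\). Then \[ \P(X \in B) = \int_{1/2}^1 2 x \, dx = 1 - \frac{1}{4} = \frac{3}{4} \] so the conditional density is \[ f(x \mid X \in B) = \frac{2 x}{3/4} = \frac{8}{3} x, \quad x \in \left[\tfrac{1}{2}, 1\right] \] As a check, \( \int_{1/2}^1 \frac{8}{3} x \, dx = \frac{4}{3}\left(1 - \frac{1}{4}\right) = 1 \).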

Examples and Applications

As always, try the problems yourself before looking at the answers.

The Exponential Distribution

Let \(f(t) = r e^{-r t}\) for \(t \in [0, \infty) \), where \(r \gt 0\) is a parameter.

  1. Show that \( f \) is a probability density function.
  2. Draw a careful sketch of the graph of \( f \), and state the important qualitative features.
Proof:
  1. Note that \( f(t) \gt 0 \) for \( t \ge 0 \). Also \( \int_0^\infty e^{-r t} \, dt = \frac{1}{r} \) so \( f \) is a PDF.
  2. \( f \) is decreasing and concave upward so the mode is 0. \( f(t) \to 0 \) as \( t \to \infty \).

The distribution defined by the probability density function in the previous exercise is called the exponential distribution with rate parameter \(r\). This distribution is frequently used to model random times, under certain assumptions. Specifically, in the Poisson model of random points in time, the times between successive arrivals have independent exponential distributions, and the parameter \(r\) is the average rate of arrivals. The exponential distribution is studied in detail in the chapter on Poisson Processes.

The lifetime \(T\) of a certain device (in 1000 hour units) has the exponential distribution with parameter \(r = \frac{1}{2}\). Find

  1. \(\P(T \gt 2)\)
  2. \(\P(T \gt 3 \mid T \gt 1)\)
Answer:
  1. \(e^{-1} \approx 0.3679\)
  2. \(e^{-1} \approx 0.3679\)
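
The two answers agree because of the memoryless property of the exponential distribution. A minimal Python sketch of the computation follows; the function name `exponential_survival` is hypothetical, and it relies on the fact that \(\P(T \gt t) = e^{-r t}\) for \(t \ge 0\), which follows by integrating the density.

```python
import math

def exponential_survival(t, r):
    """P(T > t) = exp(-r t) for the exponential distribution with rate r, t >= 0."""
    return math.exp(-r * t)

r = 0.5
p_a = exponential_survival(2, r)                                 # P(T > 2)
p_b = exponential_survival(3, r) / exponential_survival(1, r)    # P(T > 3 | T > 1)
```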

In the gamma experiment, set \( n =1 \) to get the exponential distribution. Vary the rate parameter \( r \) and note the shape of the probability density function. For various values of \(r\), run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

A Random Angle

In Bertrand's problem, a certain random angle \(\Theta\) has probability density function \(f(\theta) = \sin(\theta)\) for \(\theta \in \left[0, \frac{\pi}{2}\right]\).

  1. Show that \(f\) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
  3. Find \(\P\left(\Theta \lt \frac{\pi}{4}\right)\).
Answer:
  1. Note that \( \sin(\theta) \ge 0 \) for \( 0 \le \theta \le \frac{\pi}{2} \). Also \( \int_0^{\pi/2} \sin(\theta) \, d\theta = 1 \) so \( f \) is a PDF.
  2. \( f \) is increasing and concave downward so the mode is \(\frac{\pi}{2}\).
  3. \(1 - \frac{1}{\sqrt{2}} \approx 0.2929\)

Bertrand's problem is named for Joseph Louis Bertrand and is studied in more detail in the chapter on Geometric Models.

In Bertrand's experiment, select the model with uniform distance. Run the simulation 1000 times and compute the empirical probability of the event \(\left\{\Theta \lt \frac{\pi}{4}\right\}\). Compare with the true probability in the previous exercise.
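
For readers without access to the applet, a similar comparison can be sketched in Python. This is not the applet's algorithm; it is an assumption-laden stand-in that samples \(\Theta\) directly from the density \(\sin(\theta)\) by inverting the distribution function \(F(\theta) = 1 - \cos(\theta)\). The name `sample_theta`, the seed, and the sample size are arbitrary choices.

```python
import math
import random

def sample_theta(rng):
    """Draw Theta with density sin(theta) on [0, pi/2] by inverting F(theta) = 1 - cos(theta)."""
    u = rng.random()             # uniform on [0, 1)
    return math.acos(1.0 - u)    # solves 1 - cos(theta) = u

rng = random.Random(2024)        # fixed seed (arbitrary) for reproducibility
n = 100_000
hits = sum(sample_theta(rng) < math.pi / 4 for _ in range(n))
empirical = hits / n             # empirical probability of {Theta < pi/4}
exact = 1 - 1 / math.sqrt(2)     # true probability from the density
```

With a sample this large, `empirical` should typically agree with `exact` to about two decimal places.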

Gamma Distributions

Let \(g_n(t) = e^{-t} \frac{t^n}{n!}\) for \(t \in [0, \infty)\) where \(n \in \N\) is a parameter.

  1. Show that \(g_n\) is a probability density function for each \(n \in \N\).
  2. Draw a careful sketch of the graph of \(g_n\), and state the important qualitative features.
Proof:
  1. Note that \( g_n(t) \ge 0 \) for \( t \ge 0 \). Also, \( g_0 \) is the probability density function of the exponential distribution with parameter 1. For \( n \in \N_+ \), integration by parts with \( u = t^n / n! \) and \( dv = e^{-t} dt \) gives \( \int_0^\infty g_n(t) \, dt = \int_0^\infty g_{n-1}(t) \, dt \). Hence it follows by induction that \( g_n \) is a PDF for each \( n \in \N_+ \).
  2. \( g_0 \) is decreasing and concave upward, with mode \( t = 0 \). For \( n \gt 0 \), \( g_n \) increases and then decreases, with mode \( t = n \). \( g_1 \) is concave downward and then upward, with inflection point at \( t = 2 \). For \( n \gt 1 \), \( g_n \) is concave upward, then downward, then upward again, with inflection points at \( n \pm \sqrt{n} \). For all \( n \in \N \), \( g_n(t) \to 0 \) as \( t \to \infty \).
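
The integration by parts step in part (a) can be written out as follows. With \(u = t^n / n!\) and \(dv = e^{-t} dt\) we have \(du = t^{n-1} / (n-1)! \, dt\) and \(v = -e^{-t}\), so for \(n \in \N_+\), \[ \int_0^\infty e^{-t} \frac{t^n}{n!} \, dt = -e^{-t} \frac{t^n}{n!} \biggm|_0^\infty + \int_0^\infty e^{-t} \frac{t^{n-1}}{(n-1)!} \, dt = 0 + \int_0^\infty g_{n-1}(t) \, dt \] The boundary term vanishes because \(t^n e^{-t} \to 0\) as \(t \to \infty\) and \(t^n = 0\) at \(t = 0\).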

Interestingly, we showed in the last section on discrete distributions that \(f_t(n) = g_n(t)\) is a probability density function on \(\N\) for each \(t \ge 0\) (it's the Poisson distribution with parameter \(t\)). The distribution defined by the probability density function \(g_n\) belongs to the family of Erlang distributions, named for Agner Erlang; \( n + 1 \) is known as the shape parameter. The Erlang distribution is studied in more detail in the chapter on the Poisson Process. In turn the Erlang distribution belongs to the more general family of gamma distributions. The gamma distribution is studied in more detail in the chapter on Special Distributions.

In the gamma experiment, keep the default rate parameter \(r = 1\). Vary the shape parameter and note the shape and location of the probability density function. For various values of the shape parameter, run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

Suppose that the lifetime of a device \(T\) (in 1000 hour units) has the gamma distribution above with \(n = 2\). Find each of the following:

  1. \(\P(T \gt 3)\).
  2. \( \P(T \le 2) \)
  3. \( \P(1 \le T \le 4) \)
Answer:
  1. \(\frac{17}{2} e^{-3} \approx 0.4232\)
  2. \( 1 - 5 e^{-2} \approx 0.3233 \)
  3. \( \frac{5}{2} e^{-1} - 13 e^{-4} \approx 0.6816 \)
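
These probabilities can be checked with a short Python sketch. It assumes the closed form \(\P(T \gt t) = e^{-t} \sum_{k=0}^n t^k / k!\), which follows from repeated integration by parts of \(g_n\); the function name `erlang_survival` is hypothetical.

```python
import math

def erlang_survival(t, n):
    """P(T > t) when T has density g_n(t) = exp(-t) t^n / n! on [0, infinity)."""
    return math.exp(-t) * sum(t**k / math.factorial(k) for k in range(n + 1))

n = 2
p_a = erlang_survival(3, n)                           # P(T > 3)
p_b = 1 - erlang_survival(2, n)                       # P(T <= 2)
p_c = erlang_survival(1, n) - erlang_survival(4, n)   # P(1 <= T <= 4)
```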

Beta Distributions

Let \(f(x) = 6 x (1 - x)\) for \(x \in [0, 1]\).

  1. Show that \( f \) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer:
  1. Note that \( f(x) \ge 0 \) for \( x \in [0, 1] \). Also, \( \int_0^1 x (1 - x) \, dx = \frac{1}{6} \), so \( f \) is a PDF.
  2. \( f \) increases and then decreases, with mode at \(x = \frac{1}{2} \). \( f \) is concave downward. \( f \) is symmetric about \( x = \frac{1}{2} \) (in fact, the graph is a parabola).

Let \(f(x) = 12 x^2 (1 - x)\) for \(x \in [0, 1]\).

  1. Show that \( f \) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer:
  1. Note that \( f(x) \ge 0 \) for \( 0 \le x \le 1 \). Also \( \int_0^1 x^2 (1 - x) \, dx = \frac{1}{12} \), so \( f \) is a PDF.
  2. \( f \) increases and then decreases, with mode at \(x = \frac{2}{3}\). \( f \) is concave upward and then downward, with inflection point at \(x = \frac{1}{3}\).

The distributions defined in the last two exercises are examples of beta distributions. These distributions are widely used to model random proportions and probabilities, and physical quantities that take values in bounded intervals (which, after a change of units, can be taken to be \( [0, 1] \)). Beta distributions are studied in detail in the chapter on Special Distributions.

In the special distribution simulator, select the beta distribution. For the following parameter values, note the shape of the probability density function. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

  1. \(a = 2\), \(b = 2\). This gives the first beta distribution above.
  2. \(a = 3\), \(b = 2\). This gives the second beta distribution above.

Suppose that \( P \) is a random proportion. Find \( \P\left(\frac{1}{4} \le P \le \frac{3}{4}\right) \) in each of the following cases:

  1. \( P \) has the first beta distribution above.
  2. \( P \) has the second beta distribution above.
Answer:
  1. \(\frac{11}{16}\)
  2. \(\frac{11}{16}\)
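
Since both densities are polynomials, the probabilities can be computed exactly. The sketch below uses Python's `fractions.Fraction` with the distribution functions obtained by integrating each density from 0 to \(x\); the names `cdf_first` and `cdf_second` are hypothetical.

```python
from fractions import Fraction

def cdf_first(x):
    """Distribution function of f(x) = 6 x (1 - x) on [0, 1]: F(x) = 3 x^2 - 2 x^3."""
    return 3 * x**2 - 2 * x**3

def cdf_second(x):
    """Distribution function of f(x) = 12 x^2 (1 - x) on [0, 1]: F(x) = 4 x^3 - 3 x^4."""
    return 4 * x**3 - 3 * x**4

lo, hi = Fraction(1, 4), Fraction(3, 4)
p_first = cdf_first(hi) - cdf_first(lo)     # exact rational value
p_second = cdf_second(hi) - cdf_second(lo)  # exact rational value
```

Working with exact fractions avoids any rounding question when comparing the two cases.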

Let \( f \) be the function defined by \[f(x) = \frac{1}{\pi \sqrt{x (1 - x)}}, \quad x \in (0, 1)\]

  1. Show that \( f \) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer:
  1. Note that \( f(x) \gt 0 \) for \( 0 \lt x \lt 1 \). Using the substitution \( u = \sqrt{x} \) gives \[ \int_0^1 \frac{1}{\sqrt{x (1 - x)}} \, dx = \int_0^1 \frac{2}{\sqrt{1 - u^2}} \, du = 2 \arcsin(u) \biggm|_0^1 = \pi \] Thus \( f \) is a PDF.
  2. \( f \) is symmetric about \( x = \frac{1}{2} \). \( f \) decreases and then increases, with minimum at \( x = \frac{1}{2} \). \( f(x) \to \infty \) as \( x \downarrow 0 \) and as \( x \uparrow 1 \) so the distribution has no mode. \( f \) is concave upward.

The distribution defined in the last exercise is also a member of the beta family of distributions. But it is also known as the (standard) arcsine distribution, because of the arcsine function that arises in the proof that \( f \) is a probability density function. The arcsine distribution has applications to a very important random process known as Brownian motion, named for the Scottish botanist Robert Brown. Arcsine distributions are studied in more generality in the chapter on Special Distributions.

In the special distribution simulator, select the (continuous) arcsine distribution and keep the default parameter values. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

Suppose that \( X_t \) represents the change in the price of a stock at time \( t \), relative to the value at an initial reference time 0. We treat \( t \) as a continuous variable measured in weeks. Let \( T = \max\left\{t \in [0, 1]: X_t = 0\right\} \), the last time during the first week that the stock price was unchanged over its initial value. Under certain ideal conditions, \( T \) will have the arcsine distribution. Find each of the following:

  1. \( \P\left(T \lt \frac{1}{4}\right)\)
  2. \( \P\left(T \ge \frac{1}{2}\right) \)
  3. \( \P\left(T \le \frac{3}{4}\right) \)
Answer:
  1. \( \frac{1}{3} \)
  2. \( \frac{1}{2} \)
  3. \( \frac{2}{3} \)
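
A minimal Python sketch of these computations follows. It assumes the distribution function \(F(t) = \frac{2}{\pi} \arcsin\left(\sqrt{t}\right)\) for \(t \in [0, 1]\), obtained from the same substitution used to show that \(f\) is a PDF; the name `arcsine_cdf` is hypothetical.

```python
import math

def arcsine_cdf(t):
    """F(t) = (2 / pi) arcsin(sqrt(t)) for the standard arcsine distribution, 0 <= t <= 1."""
    return 2 / math.pi * math.asin(math.sqrt(t))

p_a = arcsine_cdf(0.25)        # P(T < 1/4)
p_b = 1 - arcsine_cdf(0.5)     # P(T >= 1/2)
p_c = arcsine_cdf(0.75)        # P(T <= 3/4)
```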

Open the Brownian motion experiment and select the last zero variable. Run the experiment in single step mode a few times. The random process that you observe models the price of the stock in the previous exercise. Now run the experiment 1000 times and compute the empirical probability of each event in the previous exercise.

The Pareto Distribution

Let \(g(x) = \frac{1}{x^b}\) for \(x \in [1, \infty)\), where \(b \gt 0\) is a parameter.

  1. Draw a careful sketch of the graph of \(g\), and state the important qualitative features.
  2. Find the values of \( b \) for which there exists a probability density function \( f \) proportional to \(g\). Identify the mode.
Answer:
  1. \( g \) is decreasing and concave upward, with \( g(x) \to 0 \) as \( x \to \infty \).
  2. Note that if \( b \ne 1 \) \[\int_1^\infty x^{-b} \, dx = \frac{x^{1 - b}}{1 - b} \biggm|_1^\infty = \begin{cases} \infty, & 0 \lt b \lt 1 \\ \frac{1}{b - 1}, & 1 \lt b \lt \infty \end{cases} \] When \( b = 1 \) we have \( \int_1^\infty x^{-1} \, dx = \ln(x) \biggm|_1^\infty = \infty \). Thus, when \( 0 \lt b \le 1 \), there is no PDF proportional to \( g \). When \( b \gt 1 \), the PDF proportional to \( g \) is \( f(x) = \frac{b - 1}{x^b} \) for \( x \in [1, \infty) \). The mode is 1.

Note that the qualitative features of \( g \) are the same, regardless of the value of the parameter \( b \gt 0 \), but only when \( b \gt 1 \) can \( g \) be normalized into a probability density function. In this case, the distribution is known as the Pareto distribution, named for Vilfredo Pareto. The parameter \(a = b - 1\), so that \(a \gt 0\), is known as the shape parameter. Thus, the Pareto distribution with shape parameter \(a\) has probability density function \[f(x) = \frac{a}{x^{a+1}}, \quad x \in [1, \infty)\] The Pareto distribution is widely used to model certain economic variables and is studied in detail in the chapter on Special Distributions.

In the special distribution simulator, select the Pareto distribution. Leave the scale parameter fixed, but vary the shape parameter, and note the shape of the probability density function. For various values of the shape parameter, run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

Suppose that the income \(X\) (in appropriate units) of a person randomly selected from a population has the Pareto distribution with shape parameter \(a = 2\). Find each of the following:

  1. \(\P(X \gt 2)\)
  2. \( \P(X \le 4) \)
  3. \( \P(3 \le X \le 5) \)
Answer:
  1. \(\frac{1}{4}\)
  2. \( \frac{15}{16} \)
  3. \( \frac{16}{225} \)
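
For integer shape parameters these probabilities are rational and can be computed exactly. The Python sketch below assumes the tail formula \(\P(X \gt x) = x^{-a}\) for \(x \ge 1\), which follows by integrating the Pareto density; the name `pareto_survival` is hypothetical.

```python
from fractions import Fraction

def pareto_survival(x, a):
    """P(X > x) = x^(-a) for the Pareto distribution with shape a, x >= 1 (a an integer here)."""
    return Fraction(1) / Fraction(x) ** a

a = 2
p_a = pareto_survival(2, a)                          # P(X > 2)
p_b = 1 - pareto_survival(4, a)                      # P(X <= 4)
p_c = pareto_survival(3, a) - pareto_survival(5, a)  # P(3 <= X <= 5)
```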

The Cauchy Distribution

Let \( f \) be the function defined by \[f(x) = \frac{1}{\pi (x^2 + 1)}, \quad x \in \R\]

  1. Show that \( f \) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer:
  1. Note that \( f(x) \gt 0 \) for \( x \in \R \). Also \[ \int_{-\infty}^\infty \frac{1}{1 + x^2} \, dx = \arctan(x) \biggm|_{-\infty}^\infty = \pi \] and hence \( f \) is a PDF.
  2. \( f \) increases and then decreases, with mode \(x = 0\). \( f \) is concave upward, then downward, then upward again, with inflection points at \(x = \pm \frac{1}{\sqrt{3}}\). \( f \) is symmetric about \( x = 0 \).

The distribution constructed in the previous exercise is known as the (standard) Cauchy distribution, named after Augustin Cauchy. It might also be called the arctangent distribution, because of the appearance of the arctangent function in the proof that \( f \) is a probability density function. In this regard, note the similarity to the arcsine distribution above. The Cauchy distribution is studied in more generality in the chapter on Special Distributions. Note also that the Cauchy distribution is obtained by normalizing the function \(x \mapsto \frac{1}{1 + x^2}\); the graph of this function is known as the witch of Agnesi, in honor of Maria Agnesi.

In the special distribution simulator, select the Cauchy distribution with the default parameter values. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

A light source is 1 meter away from position 0 on an infinite, straight wall. The angle \( \Theta \) that the light beam makes with the perpendicular to the wall is chosen uniformly at random from the interval \( \left(-\frac{\pi}{2}, \frac{\pi}{2}\right) \). The position \( X = \tan(\Theta) \) of the light beam on the wall has the standard Cauchy distribution. Find each of the following:

  1. \( \P(-1 \lt X \lt 1) \).
  2. \( \P\left(X \ge \frac{1}{\sqrt{3}}\right)\)
  3. \( \P(X \le \sqrt{3}) \)
Answer:
  1. \( \frac{1}{2} \)
  2. \( \frac{1}{3} \)
  3. \(\frac{5}{6}\)
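
These probabilities follow from the distribution function \(F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(x)\), obtained by integrating the Cauchy density. A short Python sketch, with the hypothetical name `cauchy_cdf`:

```python
import math

def cauchy_cdf(x):
    """F(x) = 1/2 + arctan(x) / pi for the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi

p_a = cauchy_cdf(1) - cauchy_cdf(-1)       # P(-1 < X < 1)
p_b = 1 - cauchy_cdf(1 / math.sqrt(3))     # P(X >= 1 / sqrt(3))
p_c = cauchy_cdf(math.sqrt(3))             # P(X <= sqrt(3))
```

Since \(\arctan\left(\sqrt{3}\right) = \frac{\pi}{3}\), the last value is \(\frac{1}{2} + \frac{1}{3}\).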

The Cauchy experiment (with the default parameter values) is a simulation of the experiment in the last exercise.

  1. Run the experiment a few times in single step mode.
  2. Run the experiment 1000 times and note the agreement between the empirical density function and the probability density function.
  3. Using the data from (b), compute the relative frequency of each event in the previous exercise, and compare with the true probability.

The Standard Normal Distribution

Let \(\phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2}\) for \(z \in \R\).

  1. Show that \( \phi \) is a probability density function.
  2. Draw a careful sketch of the graph of \(\phi\), and state the important qualitative features.
Proof:
  1. Note that \( \phi(z) \gt 0 \) for \( z \in \R \). Let \(c = \int_{-\infty}^\infty e^{-z^2 / 2} \, dz\). Then \[ c^2 = \int_{-\infty}^\infty e^{-x^2/2} \, dx \int_{-\infty}^\infty e^{-y^2/2} \, dy = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2) / 2} \, dx \, dy \] Change to polar coordinates: \(x = r \cos(\theta)\), \(y = r \sin(\theta)\) where \(r \in [0, \infty)\) and \(\theta \in [0, 2 \pi)\). Then \(x^2 + y^2 = r^2\) and \(dx \, dy = r \, dr \, d\theta\). Hence \[ c^2 = \int_0^{2 \pi} \int_0^\infty r e^{-r^2 / 2} \, dr \, d\theta \] Using the simple substitution \(u = r^2 / 2\), the inner integral is \(\int_0^\infty e^{-u} du = 1\). Then the outer integral is \(\int_0^{2\pi} 1 \, d\theta = 2 \pi\). Hence \( c^2 = 2 \pi \), so \( c = \sqrt{2 \pi} \) and \( \phi \) is a PDF.
  2. Note that \( \phi \) is symmetric about 0. \( \phi \) increases and then decreases, with mode \( z = 0 \). \( \phi \) is concave upward, then downward, then upward again, with inflection points at \(z = \pm 1 \). \( \phi(z) \to 0 \) as \( z \to \infty \) and as \( z \to -\infty \).

The distribution defined in the last exercise is the standard normal distribution, perhaps the most important distribution in probability. Its importance stems largely from the central limit theorem, one of the fundamental theorems in probability. In particular, normal distributions are widely used to model physical measurements that are subject to small, random errors. The family of normal distributions is studied in more generality in the chapter on Special Distributions.

In the special distribution simulator, select the normal distribution and keep the default parameter values. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

Suppose that the error \( Z \) in the length of a certain machined part (in millimeters) has the standard normal distribution. Use mathematical software to approximate each of the following:

  1. \( \P(-1 \le Z \le 1) \)
  2. \( \P(Z \gt 2) \)
  3. \( \P(Z \lt -3) \)
Answer:
  1. 0.6827
  2. 0.0228
  3. 0.0013
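
One way to carry out the approximation, sketched here in Python with only the standard library, is through the error function: the standard normal distribution function satisfies \(\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z / \sqrt{2}\right)\right]\). The name `std_normal_cdf` is hypothetical, and other software would serve equally well.

```python
import math

def std_normal_cdf(z):
    """Phi(z) = (1 + erf(z / sqrt(2))) / 2 for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_a = std_normal_cdf(1) - std_normal_cdf(-1)   # P(-1 <= Z <= 1)
p_b = 1 - std_normal_cdf(2)                    # P(Z > 2)
p_c = std_normal_cdf(-3)                       # P(Z < -3)
```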

The Extreme Value Distribution

Let \(f(x) = e^{-x} e^{-e^{-x}}\) for \(x \in \R\).

  1. Show that \(f\) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
  3. Find \(\P(X \gt 0)\), where \(X\) has probability density function \(f\).
Answer:
  1. Note that \( f(x) \gt 0 \) for \( x \in \R \). Using the substitution \( u = e^{-x} \), \[ \int_{-\infty}^\infty e^{-x} e^{-e^{-x}} \, dx = \int_0^\infty e^{-u} \, du = 1 \] (note that the integrand in the last integral is the exponential PDF with parameter 1).
  2. \( f \) increases and then decreases, with mode \(x = 0\). \( f \) is concave upward, then downward, then upward again, with inflection points at \(x = \pm \ln\left[\left(3 + \sqrt{5}\right)\middle/2\right] \). Note however that \( f \) is not symmetric about 0. \( f(x) \to 0 \) as \( x \to \infty \) and as \( x \to -\infty \).
  3. \(1 - e^{-1} \approx 0.6321\)
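
Part (c) can be checked with a few lines of Python. The sketch assumes the distribution function \(F(x) = e^{-e^{-x}}\), which comes from the same substitution \(u = e^{-x}\) used in part (a); the name `gumbel_cdf` is hypothetical.

```python
import math

def gumbel_cdf(x):
    """F(x) = exp(-exp(-x)) for the standard type 1 extreme value distribution."""
    return math.exp(-math.exp(-x))

p = 1 - gumbel_cdf(0)    # P(X > 0)
```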

The distribution in the last exercise is the (standard) type 1 extreme value distribution, also known as the Gumbel distribution in honor of Emil Gumbel. Extreme value distributions are studied in more generality in the chapter on Special Distributions.

In the special distribution simulator, select the extreme value distribution. Keep the default parameter values and note the shape and location of the probability density function. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

The Logistic Distribution

Let \( f \) be the function defined by \[f(x) = \frac{e^x}{(1 + e^x)^2}, \quad x \in \R\]

  1. Show that \(f\) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
  3. Find \(\P(X \gt 1)\), where \(X\) has probability density function \(f\).
Answer:
  1. Note that \( f(x) \gt 0 \) for \( x \in \R \). The substitution \( u = e^x \) gives \[ \int_{-\infty}^\infty f(x) \, dx = \int_0^\infty \frac{1}{(1 + u)^2} \, du = 1 \]
  2. \( f \) is symmetric about 0. \( f \) increases and then decreases with mode \(x = 0\). \( f \) is concave upward, then downward, then upward again, with inflection points at \(x = \pm \ln\left(2 + \sqrt{3}\right)\). \( f(x) \to 0 \) as \( x \to \infty \) and as \( x \to -\infty \).
  3. \(\frac{1}{1 + e} \approx 0.2689\)
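
As a quick check of part (c), the Python sketch below assumes the distribution function \(F(x) = \frac{e^x}{1 + e^x}\), obtained from the substitution \(u = e^x\) in part (a); the name `logistic_cdf` is hypothetical.

```python
import math

def logistic_cdf(x):
    """F(x) = e^x / (1 + e^x) for the standard logistic distribution."""
    return math.exp(x) / (1.0 + math.exp(x))

p = 1 - logistic_cdf(1)    # P(X > 1)
```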

The distribution in the last exercise is the (standard) logistic distribution. Logistic distributions are studied in more generality in the chapter on Special Distributions.

In the special distribution simulator, select the logistic distribution. Keep the default parameter values and note the shape and location of the probability density function. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

Weibull Distributions

Let \(f(t) = 2 t e^{-t^2}\) for \( t \in [0, \infty) \).

  1. Show that \(f\) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer:
  1. Note that \( f(t) \ge 0 \) for \( t \ge 0 \). The substitution \( u = t^2 \) gives \( \int_0^\infty f(t) \, dt = \int_0^\infty e^{-u} \, du = 1 \).
  2. \( f \) increases and then decreases, with mode \(t = 1/\sqrt{2} \). \( f \) is concave downward and then upward, with inflection point at \(t = \sqrt{3/2}\). \( f(t) \to 0 \) as \( t \to \infty \).

Let \(f(t) = 3 t^2 e^{-t^3}\) for \(t \ge 0\).

  1. Show that \(f\) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer:
  1. Note that \( f(t) \ge 0 \) for \( t \ge 0 \). The substitution \( u = t^3 \) gives \[ \int_0^\infty f(t) \, dt = \int_0^\infty e^{-u} \, du = 1 \]
  2. \( f \) increases and then decreases, with mode \(t = \left(\frac{2}{3}\right)^{1/3}\). \( f \) is concave upward, then downward, then upward again, with inflection points at \( t = \left(1 \pm \frac{1}{3}\sqrt{7}\right)^{1/3} \). \( f(t) \to 0 \) as \( t \to \infty \).

The distributions in the last two exercises are examples of Weibull distributions, named for Waloddi Weibull. Weibull distributions are studied in more generality in the chapter on Special Distributions. They are often used to model random failure times of devices (in appropriately scaled units).

In the special distribution simulator, select the Weibull distribution. For each of the following values of the shape parameter \(k\), note the shape and location of the probability density function. Run the simulation 1000 times and note the agreement between the empirical density function and the probability density function.

  1. \(k = 2\). This gives the first Weibull distribution above.
  2. \(k = 3\). This gives the second Weibull distribution above.

Suppose that \( T \) is the failure time of a device (in 1000 hour units). Find \( \P\left(T \gt \frac{1}{2}\right) \) in each of the following cases:

  1. \( T \) has the first Weibull distribution above.
  2. \( T \) has the second Weibull distribution above.
Answer:
  1. \(e^{-1/4} \approx 0.7788\)
  2. \(e^{-1/8} \approx 0.8825\)
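
Both answers follow from the tail formula \(\P(T \gt t) = e^{-t^k}\) for \(t \ge 0\), obtained with the substitution \(u = t^k\) used in the exercises above. A minimal Python sketch, with the hypothetical name `weibull_survival`:

```python
import math

def weibull_survival(t, k):
    """P(T > t) = exp(-t^k) for the density k t^(k-1) exp(-t^k) on [0, infinity)."""
    return math.exp(-(t**k))

p_first = weibull_survival(0.5, 2)    # first Weibull distribution, k = 2
p_second = weibull_survival(0.5, 3)   # second Weibull distribution, k = 3
```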

Additional Examples

Let \(f(x) = -\ln(x)\) for \(x \in (0, 1]\).

  1. Show that \(f\) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
  3. Find \(\P\left(\frac{1}{3} \le X \le \frac{1}{2}\right)\) where \(X\) has the probability density function in (a).
Answer:
  1. Note that \( -\ln(x) \ge 0 \) for \(0 \lt x \le 1\). Integration by parts with \( u = -\ln(x) \) and \( dv = dx \) gives \[ \int_0^1 -\ln(x) \, dx = -x \ln(x) \biggm|_0^1 + \int_0^1 1 \, dx = 1 \]
  2. \( f \) is decreasing and concave upward, with \( f(x) \to \infty \) as \( x \downarrow 0 \), so there is no mode.
  3. \(\frac{1}{2} \ln(2) - \frac{1}{3} \ln(3) + \frac{1}{6} \approx 0.147\)

Let \(f(x) = 2 e^{-x} (1 - e^{-x})\) for \(x \in [0, \infty)\).

  1. Show that \( f \) is a probability density function.
  2. Draw a careful sketch of the graph of \(f\), and give the important qualitative features.
  3. Find \(\P(X \ge 1)\) where \(X\) has the probability density function in (a).
Answer:
  1. Note that \( f(x) \gt 0 \) for \( 0 \lt x \lt \infty \). Also, \( \int_0^\infty \left(e^{-x} - e^{-2 x}\right) \, dx = \frac{1}{2} \), so \( f \) is a PDF.
  2. \( f \) increases and then decreases, with mode \( x = \ln(2) \). \( f \) is concave downward and then upward, with an inflection point at \( x = \ln(4) \). \( f(x) \to 0 \) as \( x \to \infty \).
  3. \(2 e^{-1} - e^{-2} \approx 0.6004 \)

The following problems deal with two and three dimensional random vectors having continuous distributions. The relationship between the distribution of a vector and the distribution of its components will be discussed later, in the section on joint distributions.

Let \(f(x, y) = x + y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).

  1. Show that \(f\) is a probability density function, and identify the mode.
  2. Find \(\P(Y \ge X)\) where \((X, Y)\) has the probability density function in (a).
  3. Find the conditional density of \((X, Y)\) given \(\left\{X \lt \frac{1}{2}, Y \lt \frac{1}{2}\right\}\).
Answer:
  1. mode \( (1, 1) \)
  2. \(\frac{1}{2}\)
  3. \(f\left(x, y \bigm| X \lt \frac{1}{2}, Y \lt \frac{1}{2}\right) = 8 (x + y)\) for \(0 \lt x \lt \frac{1}{2}\), \(0 \lt y \lt \frac{1}{2}\)
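
For part 3, a brief worked step may help: the conditional density is the original density divided by the probability of the conditioning event, restricted to that event. Here \[ \P\left(X \lt \frac{1}{2}, Y \lt \frac{1}{2}\right) = \int_0^{1/2} \int_0^{1/2} (x + y) \, dx \, dy = \frac{1}{16} + \frac{1}{16} = \frac{1}{8} \] so dividing \( f(x, y) = x + y \) by \( \frac{1}{8} \) gives \( 8 (x + y) \) on the smaller square.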

Let \(g(x, y) = x + y\) for \(0 \le x \le y \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(Y \ge 2 X)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 2(x + y)\), \(0 \le x \le y \le 1\)
  2. \(\frac{5}{12}\)
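
Answers of this type can be checked with a crude numerical integration. The sketch below is an illustration only: the helper `grid_sum` and the grid size are assumptions, and the midpoint grid treats cells on the boundary of the triangle only approximately, so the results are accurate to roughly two decimal places.

```python
def g(x, y):
    """Unnormalized density from the exercise: g(x, y) = x + y."""
    return x + y

def grid_sum(weight, n=400):
    """Midpoint-rule approximation of the integral of weight over [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += weight(x, y)
    return total * h * h

# Integral of g over the triangle 0 <= x <= y <= 1 (weight 0 outside it).
mass = grid_sum(lambda x, y: g(x, y) if x <= y else 0.0)
c = 1.0 / mass  # normalizing constant, so that f = c * g

# P(Y >= 2X) under f; for x >= 0 the event {y >= 2x} lies inside the triangle.
prob = c * grid_sum(lambda x, y: g(x, y) if y >= 2.0 * x else 0.0)

print(f"normalizing constant: {c:.3f} (exact value 2)")
print(f"P(Y >= 2X): {prob:.3f} (exact value 5/12 = {5 / 12:.3f})")
```

The same two-step pattern (integrate \( g \) to find the constant, then integrate over the event) applies to the other exercises in this group after changing `g` and the region tests.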

Let \(g(x, y) = x^2 y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(Y \ge X)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 6 x^2 y\) for \(0 \le x \le 1\), \(0 \le y \le 1\)
  2. \(\frac{2}{5}\)

Let \(g(x, y) = x^2 y\) for \(0 \le x \le y \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(Y \ge 2 X)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 15 x^2 y\) for \(0 \le x \le y \le 1\)
  2. \(\frac{1}{8}\)

Let \(g(x, y, z) = x + 2 y + 3 z\) for \(0 \le x \le 1\), \(0 \le y \le 1\), \(0 \le z \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(X \le Y \le Z)\) where \((X, Y, Z)\) has the probability density function in (a).
Answer:
  1. \(f(x, y, z) = \frac{1}{3}(x + 2 y + 3 z)\) for \(0 \le x \le 1\), \(0 \le y \le 1\), \(0 \le z \le 1\)
  2. \(\frac{7}{36}\)

Let \(g(x, y) = e^{-x} e^{-y}\) for \(0 \le x \le y \lt \infty\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(X + Y \lt 1)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 2 e^{-x} e^{-y}\), \(0 \lt x \lt y \lt \infty\)
  2. \(1 - 2 e^{-1} \approx 0.2642\)

Continuous Uniform Distributions

In this subsection, we will study an important class of continuous distributions. First, recall again that the standard measure of size on \(\R^n\) is \[\lambda_n(A) = \int_A 1 \, d\bs{x}, \quad A \subseteq \R^n\] In particular, \(\lambda_1(A)\) is the length of \(A\) for \(A \subseteq \R\); \(\lambda_2(A)\) is the area of \(A\) for \(A \subseteq \R^2\); \(\lambda_3(A)\) is the volume of \(A\) for \(A \subseteq \R^3\).

Suppose that \(S \subseteq \R^n\) with \(0 \lt \lambda_n(S) \lt \infty\). Then

  1. \(f(x) = 1 \big/ \lambda_n(S)\) for \(x \in S\) defines a probability density function on \(S\).
  2. If \(X\) has the probability density function in (a) then \(\P(X \in A) = \lambda_n(A) \big/ \lambda_n(S) \) for \(A \subseteq S\).
Proof:

Clearly \( f(x) \gt 0 \) for \( x \in S \). Also, \[ \int_A f(x) \, dx = \frac{1}{\lambda_n(S)} \int_A 1 \, dx = \frac{\lambda_n(A)}{\lambda_n(S)} \] In particular, when \( A = S \) we have \( \int_S f(x) \, dx = 1 \).

A random variable \(X\) with this probability density function is said to have the continuous uniform distribution on \(S\). From part (b), note that the probability assigned to a subset \(A\) of \(S\) is proportional to the standard measure of \(A\). Note also that in both the discrete and continuous cases, a random variable \(X\) is uniformly distributed on a set \(S\) if and only if the probability density function is constant on \(S\). The uniform distribution on a set \( S \) governs a point chosen at random from \( S \), and in particular, such distributions play a fundamental role in various Geometric Models. Uniform distributions are studied in more generality in the chapter on Special Distributions.

In the simple probability experiment, random points are uniformly distributed on the rectangular region \( S \). Move and resize the events \( A \) and \( B \) and note how the probabilities of the 16 events that can be constructed from \( A \) and \( B \) change. Run the experiment 1000 times and note the agreement between the relative frequencies of the events and the probabilities of the events.

Suppose that \( (X, Y) \) is uniformly distributed on the circular region of radius 5, centered at the origin. We can think of \( (X, Y) \) as the position of a dart thrown randomly at a target. Let \( R = \sqrt{X^2 + Y^2} \), the distance from the center to \( (X, Y) \).

  1. Give the probability density function of \( (X, Y) \).
  2. Find \( \P(n \le R \le n + 1) \) for \( n \in \{0, 1, 2, 3, 4\} \).
Answer:
  1. \( f(x, y) = \frac{1}{25 \pi} \) for \( (x, y) \in \R^2 \) with \( x^2 + y^2 \le 25 \)
  2. \( \P(n \le R \le n + 1) = \frac{2 n + 1}{25} \) for \( n \in \{0, 1, 2, 3, 4\} \)
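
The answer to part 2 follows from the area-ratio formula for uniform distributions. As a sketch, write \( A_n \) (notation introduced here for illustration) for the ring of points whose distance from the origin is between \( n \) and \( n + 1 \). Then \[ \P(n \le R \le n + 1) = \frac{\lambda_2(A_n)}{\lambda_2(S)} = \frac{\pi (n + 1)^2 - \pi n^2}{25 \pi} = \frac{2 n + 1}{25} \] As a check, the five probabilities sum to \( \frac{1 + 3 + 5 + 7 + 9}{25} = 1 \).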

Suppose that \((X, Y, Z)\) is uniformly distributed on the cube \(S = [0, 1]^3\). Find \(\P(X \lt Y \lt Z)\).

  1. Compute the probability using the probability density function.
  2. Compute the probability using a combinatorial argument.
Answer:
  1. \( \P(X \lt Y \lt Z) = \int_0^1 \int_0^z \int_0^y 1 \, dx \, dy \, dz = \frac{1}{6} \)
  2. By symmetry, the 6 strict orderings of \( (X, Y, Z) \) are equally likely, and ties occur with probability 0, so \( \P(X \lt Y \lt Z) = \frac{1}{6} \)
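
The value \( \frac{1}{6} \) can also be checked by simulation. The following is a minimal sketch using Python's standard `random` module; the function name, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import random

def estimate_ordered_probability(trials=200_000, seed=12345):
    """Estimate P(X < Y < Z) for a point (X, Y, Z) uniform on the unit cube."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    hits = 0
    for _ in range(trials):
        # Three independent standard uniform coordinates give a point
        # uniformly distributed on [0, 1)^3.
        x, y, z = rng.random(), rng.random(), rng.random()
        if x < y < z:
            hits += 1
    return hits / trials

estimate = estimate_ordered_probability()
print(f"relative frequency: {estimate:.4f}, exact value: {1 / 6:.4f}")
```

With this many trials the relative frequency should typically be within a few thousandths of \( \frac{1}{6} \).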

The most important special case is the uniform distribution on an interval \([a, b]\) where \(a, b \in \R\), and \(a \lt b\). In this case, the probability density function is \[f(x) = \frac{1}{b - a}, \quad a \le x \le b\] This distribution models a point chosen at random from the interval. In particular, the uniform distribution on \([0, 1]\) is known as the standard uniform distribution, and is very important because of its simplicity and the fact that it can be transformed into a variety of other probability distributions on \(\R\). Almost all computer languages have procedures for simulating independent, standard uniform variables, which are called random numbers in this context.
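
As a small illustration of transforming random numbers, if \( U \) has the standard uniform distribution then \( a + (b - a) U \) is uniformly distributed on the interval from \( a \) to \( b \). The sketch below uses Python's standard `random` module; the function name, the endpoints, the subinterval, and the seed are hypothetical values chosen only for the example.

```python
import random

def uniform_on_interval(a, b, rng):
    """Map a random number U in [0, 1) to a + (b - a) U, uniform on [a, b)."""
    return a + (b - a) * rng.random()

rng = random.Random(2024)  # fixed seed (arbitrary) for repeatability
a, b = 2.0, 5.0            # hypothetical interval endpoints
c, d = 3.0, 3.5            # hypothetical subinterval of [a, b]

samples = [uniform_on_interval(a, b, rng) for _ in range(100_000)]

# For a uniform distribution, the probability of a subinterval is the
# ratio of lengths, so the relative frequency should be close to it.
fraction = sum(c <= x <= d for x in samples) / len(samples)
ratio = (d - c) / (b - a)

print(f"relative frequency of [{c}, {d}]: {fraction:.4f}")
print(f"length ratio (d - c) / (b - a):  {ratio:.4f}")
```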

The time \(T\) (in minutes) required to perform a certain job is uniformly distributed over the interval \([15, 60]\).

  1. Find the probability that the job requires more than 30 minutes.
  2. Given that the job is not finished after 30 minutes, find the probability that the job will require more than 15 additional minutes.
Answer:
  1. \(\frac{2}{3}\)
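  2. \(\frac{1}{2}\), since \( \P(T \gt 45 \mid T \gt 30) = \frac{\P(T \gt 45)}{\P(T \gt 30)} = \frac{15 / 45}{30 / 45} \)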

Simulation

Suppose that \(S \subseteq \R^n\) with \(0 \lt \lambda_n(S) \lt \infty\), and that \(R \subseteq S\) with \(\lambda_n(R) \gt 0\). If \(X\) is uniformly distributed on \(S\), then the conditional distribution of \(X\) given \(X \in R\) is the uniform distribution on \(R\).

Proof:

For \( A \subseteq R \),

\[ \P(X \in A \mid X \in R) = \frac{\P(X \in A, X \in R)}{\P(X \in R)} = \frac{\P(X \in A)}{\P(X \in R)} = \frac{\lambda_n(A) \big/ \lambda_n(S)}{\lambda_n(R) \big/ \lambda_n(S)} = \frac{\lambda_n(A)}{\lambda_n(R)} \]

The last theorem has important implications for simulations. Suppose that \(R \subseteq \R^n\) satisfies \(\lambda_n(R) \gt 0\) as before, but suppose also that \(R\) is bounded, so that \(R \subseteq S\) where \(S \subseteq \R^n\) is a Cartesian product of \(n\) bounded intervals. It turns out to be quite easy to simulate a sequence of independent random variables \((X_1, X_2, \ldots)\) each of which is uniformly distributed on \(S\). Now let \[N = \min\{k \in \N_+: X_k \in R\}\] the first time that one of the random variables lands in \(R\). Note that \(N\) has the geometric distribution on \(\N_+\) with success parameter \(p = \lambda_n(R) \big/ \lambda_n(S)\), so \( \P(N = n) = (1 - p)^{n-1} p \) for \( n \in \N_+ \). Now let \(Y = X_N\) so that \(Y\) is the first term of the sequence that falls in \(R\). We know from our work on independence and conditional probability that the distribution of \(Y\) is the same as the conditional distribution of \(X\) given \(X \in R\), which, by the previous theorem, is the uniform distribution on \(R\). Thus, we have derived an algorithm for simulating a random variable that is uniformly distributed on an irregularly shaped region \(R\) (assuming that we have an algorithm for recognizing when a point \(x \in \R^n\) falls in \(R\)). This method of simulation is known as the rejection method, and, as we will see in subsequent sections, is more important than might first appear.
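
The rejection method described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the region \( R \) is taken to be the unit disk, the enclosing set \( S \) is the square \( [-1, 1]^2 \), and the function names and seed are hypothetical choices.

```python
import math
import random

def in_region(point):
    """Membership test for R: here the unit disk (an illustrative choice)."""
    x, y = point
    return x * x + y * y <= 1.0

def sample_box(rng):
    """One point uniformly distributed on S = [-1, 1]^2, which contains R."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

def rejection_sample(rng):
    """Return (Y, N): the first point X_N that lands in R, and the count N."""
    n = 0
    while True:
        n += 1
        point = sample_box(rng)
        if in_region(point):
            return point, n

rng = random.Random(7)  # fixed seed (arbitrary) for repeatability
results = [rejection_sample(rng) for _ in range(50_000)]

# Success parameter p = area(R) / area(S) = pi / 4 for the disk in the square.
p = math.pi / 4.0

# N is geometric with parameter p, so P(N = 1) = p.
first_try = sum(n == 1 for _, n in results) / len(results)

# If Y is uniform on R, the disk of radius 1/2 has probability 1/4 (area ratio).
inner = sum(x * x + y * y <= 0.25 for (x, y), _ in results) / len(results)

print(f"P(N = 1): relative frequency {first_try:.4f}, p = {p:.4f}")
print(f"P(Y within 1/2 of origin): relative frequency {inner:.4f}, area ratio 0.2500")
```

Only the membership test `in_region` and the bounding box in `sample_box` depend on the region, so a different bounded region can be handled by swapping those two functions.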

With a sequence of independent, uniformly distributed points in \( S \), the first one to fall in \( R \) is uniformly distributed on \( R \).

Data Analysis Exercises

If \(D\) is a data set from a variable \(X\) with a continuous distribution, then an empirical density function can be computed by partitioning the data range into subsets of small size, and then computing the density of points in each subset (the relative frequency of the subset divided by its size). Empirical probability density functions are studied in more detail in the chapter on Random Samples.
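
The computation just described can be sketched as follows. This is a minimal illustration that assumes the density on an interval is the relative frequency of the interval divided by its length; the function name `empirical_density` is hypothetical, and the data values are made up for the example (they are not the cicada data).

```python
def empirical_density(data, edges):
    """Empirical density on the intervals (edges[i], edges[i + 1]].

    The density on an interval is (fraction of data in it) / (interval
    length), so densities times lengths sum to the fraction of the data
    that falls in the partitioned range.
    """
    n = len(data)
    densities = []
    for left, right in zip(edges[:-1], edges[1:]):
        count = sum(left < x <= right for x in data)
        densities.append(count / (n * (right - left)))
    return densities

# Hypothetical measurements, NOT the cicada data.
data = [0.05, 0.12, 0.15, 0.18, 0.19, 0.21, 0.24, 0.27, 0.33, 0.11]
edges = [0.0, 0.1, 0.2, 0.3, 0.4]

densities = empirical_density(data, edges)
for left, right, dens in zip(edges[:-1], edges[1:], densities):
    print(f"({left}, {right}]: density {dens:.4f}")
```

Half-open intervals of the form \( (a, b] \) are used so that every data value in the range falls in exactly one interval.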

For the cicada data, \(BW\) denotes body weight (in grams), \(BL\) body length (in millimeters), and \(G\) gender (0 for female and 1 for male). Construct an empirical density function for each of the following and display each as a bar graph:

  1. \(BW\)
  2. \(BL\)
  3. \(BW\) given \(G = 0\)
Answer:
  1. BW \((0, 0.1]\) \((0.1, 0.2]\) \((0.2, 0.3]\) \((0.3, 0.4]\)
    Density 0.8654 5.8654 3.0769 0.1923
  2. BL \((15, 20]\) \((20, 25]\) \((25, 30]\) \((30, 35]\)
    Density 0.0058 0.1577 0.0346 0.0019
  3. BW \((0, 0.1]\) \((0.1, 0.2]\) \((0.2, 0.3]\) \((0.3, 0.4]\)
    Density given \(G = 0\) 0.3390 4.4068 5.0847 0.1695