\(\renewcommand{\P}{\mathbb{P}}\) \(\newcommand{\R}{\mathbb{R}}\)

3. Mixed Distributions

Basic Theory

As usual, we start with a random experiment with probability measure \(\P\) on an underlying sample space. In this section, we will discuss two mixed cases for the distribution of a random variable: the case where the distribution is partly discrete and partly continuous, and the case where the variable has both discrete coordinates and continuous coordinates.

Distributions of Mixed Type

Suppose that \(X\) is a random variable for the experiment, taking values in \(S \subseteq \R^n\). Then \(X\) has a distribution of mixed type if \(S\) can be partitioned into subsets \(D\) and \(C\) with the following properties:

  1. \(D\) is countable and \(0 \lt \P(X \in D) \lt 1\).
  2. \(\P(X = x) = 0\) for all \(x \in C\).

Thus, part of the distribution of \(X\) is concentrated at points in a discrete set \(D\); the rest of the distribution is continuously spread over \(C\). In the picture below, the light blue shading is intended to represent a continuous distribution of probability while the darker blue dots are intended to represent points of positive probability.

A mixed distribution on \( S \)

Let \(p = \P(X \in D)\), so that \(0 \lt p \lt 1\). We can define a function on \(D\) that is a partial probability density function for the discrete part of the distribution.

Let \(g(x) = \P(X = x)\) for \(x \in D\). Then

  1. \(g(x) \ge 0\) for \(x \in D\)
  2. \(\sum_{x \in D} g(x) = p\)
  3. \(\P(X \in A) = \sum_{x \in A} g(x)\) for \(A \subseteq D\)

Part 1 is clear. Part 3 follows from the countable additivity of probability: \[ \P(X \in A) = \sum_{x \in A} \P(X = x) = \sum_{x \in A} g(x), \quad A \subseteq D \] Then taking \( A = D \) and using the definition of \( p \) gives part 2.

Usually, the continuous part of the distribution is also described by a partial probability density function. Thus, suppose there is a nonnegative function \(h\) on \(C\) such that \[ \P(X \in A) = \int_A h(x) \, dx, \quad A \subseteq C \]

\(\int_C h(x) \, dx = 1 - p\).


This follows by taking \( A = C \) in the displayed equation above, and the definition of \( p \).

As with purely continuous distributions, the existence of a probability density function for the continuous part of a mixed distribution is not guaranteed. And when it does exist, a density function for the continuous part is not unique. Note that the values of \( h \) could be changed to other nonnegative values on a countable set, and the displayed equation above would still hold, because only integrals of \( h \) are important. Technically, \( g \) is a partial density with respect to counting measure \( \# \) on \( D \), while \( h \) is a partial density with respect to Lebesgue measure \( \lambda_n \) on \( C \), named for the French mathematician Henri Lebesgue.

Returning to our main discussion, the distribution of \(X\) is completely determined by the partial probability density functions \(g\) and \(h\). First, we extend the functions \(g\) and \(h\) to all of \(S\) in the usual way: \(g(x) = 0\) for \(x \in C\), and \(h(x) = 0\) for \(x \in D\).

For \(A \subseteq S\), \[ \P(X \in A) = \sum_{x \in A \cap D} g(x) + \int_{A \cap C} h(x) \, dx \]


This follows from our previous results, since \( C \) and \( D \) partition \( S \): \[ \P(X \in A) = \P(X \in A \cap D) + \P(X \in A \cap C) = \sum_{x \in A \cap D} g(x) + \int_{A \cap C} h(x) dx, \quad A \subseteq S \]
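
The decomposition above can be sketched numerically for a distribution on the real line. The following is a minimal sketch, not a definitive implementation: the function name `mixed_probability`, the midpoint rule, and the example values of \(g\) and \(h\) (an atom at 2 with mass \(e^{-2}\) and density \(e^{-t}\) on \([0, 2)\)) are assumptions chosen for illustration.

```python
import math

def mixed_probability(g, h, lo, hi, steps=20_000):
    """Approximate P(lo <= X <= hi) for a mixed distribution on the real line.

    g: dict mapping each atom x in D to g(x) = P(X = x)
    h: partial density on C, extended by 0 off C
    """
    # Discrete part: sum g over the atoms that fall in [lo, hi].
    discrete = sum(mass for x, mass in g.items() if lo <= x <= hi)
    # Continuous part: midpoint rule for the integral of h over [lo, hi].
    width = (hi - lo) / steps
    continuous = width * sum(h(lo + (i + 0.5) * width) for i in range(steps))
    return discrete + continuous

# Illustrative (assumed) example: one atom at 2, exponential density on [0, 2).
g = {2.0: math.exp(-2)}

def h(t):
    return math.exp(-t) if 0 <= t < 2 else 0.0

total = mixed_probability(g, h, 0.0, 2.0)    # should be close to 1
partial = mixed_probability(g, h, 0.0, 1.0)  # should be close to 1 - e^{-1}
```

The atoms contribute through the sum only, and the integral ignores them because single points have Lebesgue measure 0.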

A mixed distribution is completely determined by its partial density functions.

The conditional distributions on \(D\) and on \(C\) are purely discrete and continuous, respectively.

The conditional distribution of \(X\) given \(X \in D\) is discrete, with probability density function \[ f(x \mid X \in D) = \frac{g(x)}{p}, \quad x \in D \]


For \( A \subseteq D \), \[ \P(X \in A \mid X \in D) = \frac{\P(X \in A)}{\P(X \in D)} = \frac{1}{p} \sum_{x \in A} g(x) \]

The conditional distribution of \(X\) given \(X \in C\) is continuous, with probability density function \[ f(x \mid X \in C) = \frac{h(x)}{1 - p}, \quad x \in C \]


For \( A \subseteq C \), \[ \P(X \in A \mid X \in C) = \frac{\P(X \in A)}{\P(X \in C)} = \frac{1}{1 - p} \int_A h(x) \, dx \]

Thus, the distribution of \(X\) is a mixture of a discrete distribution and a continuous distribution. Mixtures are studied in more generality in the section on conditional distributions.

Truncated Variables

Distributions of mixed type occur naturally when a random variable with a continuous distribution is truncated in a certain way. For example, suppose that \(T\) taking values in \([0, \infty)\) is the random lifetime of a device, and has a continuous distribution with probability density function \(f\). In a test of the device, we can't wait forever, so we might select a positive constant \(a\) and record the random variable \(U\), defined by truncating \(T\) at \(a\), as follows: \[ U = \begin{cases} T, & T \lt a \\ a, & T \ge a \end{cases}\]

\(U\) has a mixed distribution. In particular,

  1. \(D = \{a\}\) and \(g(a) = \int_a^\infty f(t) \, dt\)
  2. \(C = [0, a)\) and \(h(t) = f(t)\) for \(t \in [0, a)\)

Suppose that random variable \(X\) has a continuous distribution on \(\R\), with probability density function \(f\). The variable is truncated at \(a\) and \(b\) (\(a \lt b\)) to create a new random variable \(Y\) as follows: \[ Y = \begin{cases} a, & X \le a \\ X, & a \lt X \lt b \\ b, & X \ge b \end{cases} \]

\(Y\) has a mixed distribution. In particular,

  1. \(D = \{a, b\}\), \(g(a) = \int_{-\infty}^a f(x) \, dx\), \(g(b) = \int_b^\infty f(x) \, dx\)
  2. \(C = (a, b)\) and \(h(x) = f(x)\) for \(x \in (a, b)\)
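
As a concrete illustration of the two-sided truncation, the sketch below assumes that \(X\) has the standard normal distribution and uses the truncation points \(a = -1\), \(b = 2\). Both choices are assumptions made for the example, not part of the result above, and the helper names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncate(x, a, b):
    """Y = a if x <= a, x if a < x < b, b if x >= b."""
    return min(max(x, a), b)

a, b = -1.0, 2.0                 # illustrative truncation points (assumed)
g_a = normal_cdf(a)              # g(a) = integral of f over (-inf, a]
g_b = 1.0 - normal_cdf(b)        # g(b) = integral of f over [b, inf)
p = g_a + g_b                    # p = P(Y in D), with D = {a, b}
continuous_mass = normal_cdf(b) - normal_cdf(a)  # integral of h over (a, b)
```

The two atoms and the continuous part account for all of the probability, so `p + continuous_mass` should equal 1 up to rounding.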

Random Variable with Mixed Coordinates

Suppose \(X\) and \(Y\) are random variables for our experiment, and that \(X\) has a discrete distribution, taking values in a countable set \(S\) while \(Y\) has a continuous distribution on \(T \subseteq \R^n\).

\(\P\left[(X, Y) = (x, y)\right] = 0\) for \((x, y) \in S \times T\).


For \( x \in S \) and \( y \in T \), note that \(\{(X, Y) = (x, y)\} = \{X = x, Y = y\} \subseteq \{Y = y\} \), and since \( Y \) has a continuous distribution, \( \P(Y = y) = 0 \).

Thus, \( (X, Y) \) has a continuous distribution, not a mixed distribution in the sense of the previous subsection. Usually, \((X, Y)\) has a probability density function \(f : S \times T \to [0, \infty) \) in the following sense: \[ \P\left[(X, Y) \in A \times B\right] = \sum_{x \in A} \int_B f(x, y) \, dy, \quad A \times B \subseteq S \times T \] More generally, recall that for \(C \subseteq S \times T\) and \(x \in S\), the cross section of \(C\) at \(x\) is \(C_x = \{y \in T: (x, y) \in C\}\).

Suppose that \( (X, Y) \) has probability density function \( f \) in the sense given above. Then \[ \P\left[(X, Y) \in C\right] = \sum_{x \in S} \int_{C_x} f(x, y) \, dy, \quad C \subseteq S \times T \]

Technically, \(f\) is the probability density function of \((X, Y)\) with respect to the product measure on \(S \times T\) formed from counting measure \(\#\) on \(S\) and \(n\)-dimensional Lebesgue measure \(\lambda_n\) on \(T\). For more on this, see the advanced section on absolute continuity and density functions.

Random vectors with mixed coordinates arise naturally in applied problems. For example, the cicada data set has 4 continuous variables and 2 discrete variables. The M&M data set has 6 discrete variables and 1 continuous variable. Vectors with mixed coordinates also occur when a discrete parameter for a continuous distribution is randomized, or when a continuous parameter for a discrete distribution is randomized.

Examples and Applications

Suppose that \(X\) has probability \(\frac{1}{2}\) uniformly distributed on the set \(\{1, 2, \ldots, 8\}\) and has probability \(\frac{1}{2}\) uniformly distributed on the interval \([0, 10]\). Find \(\P(X \gt 6)\).
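
One way to organize the computation, using the decomposition of \(\P(X \in A)\) into a sum and an integral: the discrete part has \(D = \{1, 2, \ldots, 8\}\) with \(g(x) = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}\) for \(x \in D\), and the continuous part has \(h(x) = \frac{1}{2} \cdot \frac{1}{10} = \frac{1}{20}\) for \(x \in [0, 10] \setminus D\). Hence \[ \P(X \gt 6) = g(7) + g(8) + \int_6^{10} \frac{1}{20} \, dx = \frac{2}{16} + \frac{4}{20} = \frac{13}{40} \]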



Suppose that \((X, Y)\) has probability \(\frac{1}{3}\) uniformly distributed on \(\{0, 1, 2\}^2\) and has probability \(\frac{2}{3}\) uniformly distributed on \([0, 2]^2\). Find \(\P(Y \gt X)\).
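
A sketch of the computation: in the discrete part, the points of \(\{0, 1, 2\}^2\) with \(y \gt x\) are \((0, 1)\), \((0, 2)\), and \((1, 2)\), so 3 of the 9 equally likely points qualify. In the continuous part, the region \(\{y \gt x\}\) is half of the square \([0, 2]^2\) by symmetry. Hence \[ \P(Y \gt X) = \frac{1}{3} \cdot \frac{3}{9} + \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{9} + \frac{1}{3} = \frac{4}{9} \]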



Suppose that the lifetime \(T\) of a device (in 1000 hour units) has the exponential distribution with probability density function \(f(t) = e^{-t}\) for \(0 \le t \lt \infty\). A test of the device is terminated after 2000 hours; the truncated lifetime \(U\) is recorded. Find each of the following:

  1. \(\P(U \lt 1)\)
  2. \(\P(U = 2)\)

  1. \(1 - e^{-1} \approx 0.6321\)
  2. \(e^{-2} \approx 0.1353\)
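
These values can be checked by simulation. The sketch below is only an illustration under stated assumptions: the seed, the sample size, and the helper name `truncated_lifetime` are arbitrary choices, and \(U\) is generated as \(\min(T, 2)\), which agrees with the definition of truncation at \(a = 2\).

```python
import math
import random

def truncated_lifetime(rng, a=2.0):
    """Simulate U = T if T < a and U = a if T >= a, with T exponential (rate 1)."""
    return min(rng.expovariate(1.0), a)

rng = random.Random(12345)   # fixed seed, chosen arbitrarily
n = 200_000
sample = [truncated_lifetime(rng) for _ in range(n)]

est_less_than_1 = sum(u < 1 for u in sample) / n   # estimates P(U < 1)
est_equal_2 = sum(u == 2.0 for u in sample) / n    # estimates P(U = 2)

exact_less_than_1 = 1 - math.exp(-1)
exact_equal_2 = math.exp(-2)
```

A positive fraction of the simulated values is exactly 2, which is the point mass at the truncation time; the remaining values are spread over \([0, 2)\).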

Let \[ f(x, y) = \begin{cases} \frac{1}{3}, & x = 1, \, 0 \le y \le 1 \\ \frac{1}{6}, & x = 2, \, 0 \le y \le 2 \\ \frac{1}{9}, & x = 3, \, 0 \le y \le 3 \end{cases} \]

  1. Show that \(f\) is a mixed probability density function in the sense defined above, with \(S = \{1, 2, 3\}\) and \(T = [0, 3]\).
  2. Find \(\P(X \gt 1, Y \lt 1)\).

  1. Clearly \( f(x, y) \ge 0 \) for \( (x, y) \in \{1, 2, 3\} \times [0, 3] \), where \( f(x, y) = 0 \) for \( y \gt x \). Moreover, \[ \int_0^3 f(1, y) \, dy + \int_0^3 f(2, y) \, dy + \int_0^3 f(3, y) \, dy = \frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1 \]
  2. \(\frac{5}{18}\)
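
Both parts are instances of the formula \(\P\left[(X, Y) \in C\right] = \sum_{x \in S} \int_{C_x} f(x, y) \, dy\). The sketch below checks them with exact rational arithmetic; the helper name `section_integral` is hypothetical, and it relies on the fact that \(y \mapsto f(x, y)\) is constant on \([0, x]\) and is taken to be 0 on the rest of \(T\).

```python
from fractions import Fraction

# For each x in S = {1, 2, 3}, the constant value of f(x, y) for 0 <= y <= x.
density = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 9)}

def section_integral(x, lo, hi):
    """Integral of y -> f(x, y) over [lo, hi], a subinterval of T = [0, 3]."""
    overlap = max(0, min(hi, x) - max(lo, 0))   # length of [lo, hi] intersected with [0, x]
    return density[x] * overlap

# Part 1: the density sums and integrates to 1 over S x T.
total = sum(section_integral(x, 0, 3) for x in density)

# Part 2: P(X > 1, Y < 1), summing over x in {2, 3} and integrating over [0, 1].
prob = sum(section_integral(x, 0, 1) for x in density if x > 1)
```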

Let \(f(p, k) = 6 \binom{3}{k} p^{k + 1} (1 - p)^{4 - k}\) for \((p, k) \in [0, 1] \times \{0, 1, 2, 3\}\).

  1. Show that \(f\) is a mixed probability density function in the sense defined above.
  2. Find \(\P\left(V \lt \frac{1}{2}, X = 2\right)\) where \((V, X)\) is a random vector with probability density function \(f\).

  1. Clearly \( f(p, k) \ge 0 \) for \( (p, k) \in [0, 1] \times \{0, 1, 2, 3\} \). Moreover, \[ \sum_{k=0}^3 \int_0^1 f(p, k) \, dp = 6 \sum_{k=0}^3 \binom{3}{k} \int_0^1 p^{k+1} (1 - p)^{4-k} \, dp = 6 \left(1 \cdot \frac{1}{30} + 3 \cdot \frac{1}{60} + 3 \cdot \frac{1}{60} + 1 \cdot \frac{1}{30}\right) = 1 \]
  2. \(\P\left(V \lt \frac{1}{2}, X = 2\right) = 6 \binom{3}{2} \int_0^{1/2} p^3 (1 - p)^2 \, dp = \frac{33}{320} \approx 0.1031\)

As we will see in the section on conditional distributions, the distribution in the last exercise models the following experiment: a random probability \(V\) is selected, and then a coin with this probability of heads is tossed 3 times; \(X\) is the number of heads.
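
The experiment can be sketched in simulation. Summing \(f(p, k)\) over \(k \in \{0, 1, 2, 3\}\) gives \(6 p (1 - p)\) for \(p \in [0, 1]\), which is the beta density with both parameters equal to 2, so the sketch draws the random probability from that distribution. The seed and sample size are arbitrary assumptions.

```python
import random

rng = random.Random(2024)   # fixed seed, chosen arbitrarily
n = 200_000
hits = 0
for _ in range(n):
    v = rng.betavariate(2, 2)                      # density 6 p (1 - p) on [0, 1]
    x = sum(rng.random() < v for _ in range(3))    # number of heads in 3 tosses
    if v < 0.5 and x == 2:
        hits += 1

estimate = hits / n   # compare with 33/320 from the exercise above
```

The estimate should be close to \(\frac{33}{320}\), up to the usual simulation error.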

For the M&M data, let \(N\) denote the total number of candies and \(W\) the net weight (in grams). Construct an empirical density function for \((N, W)\).