\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\skw}{\text{skew}}\) \(\newcommand{\kur}{\text{kurt}}\)

Location-Scale Families

General Theory

Definition

Suppose that \(Z\) is a fixed random variable taking values in \(\R\). For \(a \in \R\) and \(b \in (0, \infty) \), let \(X = a + b \, Z\). The two-parameter family of distributions associated with \(X\) is called the location-scale family associated with the given distribution of \(Z\); \(a\) is called the location parameter and \(b\) the scale parameter.

Thus a linear transformation, with positive slope, of the underlying random variable \(Z\) creates a location-scale family for the underlying distribution. In the special case that \(b = 1\), the one-parameter family is called the location family associated with the given distribution, and in the special case that \(a = 0\), the one-parameter family is called the scale family associated with the given distribution. Scale transformations, as the name suggests, occur naturally when physical units are changed. For example, if a random variable represents the length of an object, then a change of units from meters to inches corresponds to a scale transformation. Location-scale transformations can also occur with a change of physical units. For example, if a random variable represents the temperature of an object, then a change of units from Fahrenheit to Celsius corresponds to a location-scale transformation.
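The unit-change examples above can be sketched in a few lines of Python. The helper names (`location_scale`, `fahrenheit_to_celsius`, `meters_to_inches`) are illustrative and not part of the text; the only facts assumed are the usual conversions \( C = \frac{5}{9}(F - 32) \) and 1 inch = 0.0254 meters.

```python
def location_scale(z, a, b):
    """Return x = a + b * z, where a is the location and b > 0 is the scale."""
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    return a + b * z

# Fahrenheit -> Celsius: C = (5/9)(F - 32) = -160/9 + (5/9) F,
# a location-scale transformation with a = -160/9 and b = 5/9.
A_F_TO_C = -160 / 9
B_F_TO_C = 5 / 9

def fahrenheit_to_celsius(f):
    return location_scale(f, A_F_TO_C, B_F_TO_C)

# Meters -> inches: a pure scale transformation (a = 0, b = 1 / 0.0254).
def meters_to_inches(m):
    return location_scale(m, 0.0, 1 / 0.0254)
```

Applying either function to every value of a random variable changes its distribution within the same location-scale family, which is the point of the definition.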

Distribution Functions

Our goal is to relate various functions that determine the distribution of \( X = a + b Z \) to the corresponding functions for \( Z \). First we consider the (cumulative) distribution function.

If \(Z\) has distribution function \(G\) then \(X\) has distribution function \(F\) given by \[ F(x) = G \left( \frac{x - a}{b} \right), \quad x \in \R\]

Proof:

For \( x \in \R \) \[ F(x) = \P(X \le x) = \P(a + b Z \le x) = \P\left(Z \le \frac{x - a}{b}\right) = G\left(\frac{x - a}{b}\right) \]
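As a quick numerical check of this formula, the sketch below takes \( Z \) to be standard normal, an assumption made only for illustration, and compares \( G\left(\frac{x - a}{b}\right) \) with the distribution function of the normal distribution with mean \( a \) and standard deviation \( b \). The helper name `location_scale_cdf` is hypothetical.

```python
from statistics import NormalDist

def location_scale_cdf(G, a, b):
    """Given the distribution function G of Z, return F for X = a + b * Z (b > 0)."""
    return lambda x: G((x - a) / b)

G = NormalDist(0, 1).cdf                    # standard normal distribution function
a, b = 3.0, 2.0
F = location_scale_cdf(G, a, b)             # F(x) = G((x - a) / b)
reference = NormalDist(mu=a, sigma=b).cdf   # normal with mean a and sd b

for x in (-2.0, 0.0, 3.0, 5.5, 10.0):
    print(x, F(x), reference(x))
```

The two columns of values should agree to within floating-point rounding.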

Next we consider the probability density function. The results differ slightly for discrete and continuous distributions, which is not surprising since the density function has different meanings in these two cases.

If \( Z \) has a discrete distribution with probability density function \( g \) then \( X \) also has a discrete distribution, with probability density function \( f \) given by \[ f(x) = g\left(\frac{x - a}{b}\right), \quad x \in \R \]

Proof:

\( Z \) takes values in a countable subset \( S \subset \R \) and hence \( X \) takes values in \( T = \{a + b z: z \in S\} \), which is also countable. Also \[ f(x) = \P(X = x) = \P\left(Z = \frac{x - a}{b}\right) = g\left(\frac{x - a}{b}\right), \quad x \in \R \]
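In the discrete case the transformation only relabels the support; the probabilities themselves are not rescaled. A minimal sketch, assuming \( Z \) is the score of a fair die (an example chosen here, not taken from the text) and using a hypothetical helper `location_scale_pmf`:

```python
from fractions import Fraction

def location_scale_pmf(g, a, b):
    """Map the density {z: P(Z = z)} of Z to the density of X = a + b * z (b > 0)."""
    return {a + b * z: p for z, p in g.items()}

die = {z: Fraction(1, 6) for z in range(1, 7)}   # density g of Z
x_pmf = location_scale_pmf(die, a=10, b=5)       # density f of X = 10 + 5 Z

# f(x) = g((x - a) / b): for example, f(25) = g(3)
print(x_pmf[25] == die[(25 - 10) // 5])
```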

If \(Z\) has a continuous distribution with probability density function \(g\), then \(X\) also has a continuous distribution, with probability density function \(f\) given by

\[ f(x) = \frac{1}{b} \, g \left( \frac{x - a}{b} \right), \quad x \in \R\]
  1. For the location family associated with \(g\), the graph of \(f\) is obtained by shifting the graph of \(g\), \(a\) units to the right if \(a \gt 0\) and \(-a\) units to the left if \(a \lt 0\).
  2. For the scale family associated with \(g\), if \(b \gt 1\), the graph of \(f\) is obtained from the graph of \(g\) by stretching horizontally and compressing vertically, by a factor of \(b\). If \(0 \lt b \lt 1\), the graph of \(f\) is obtained from the graph of \(g\) by compressing horizontally and stretching vertically, by a factor of \(1 / b\).
Proof:

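First note that \( \P(X = x) = \P\left(Z = \frac{x - a}{b}\right) = 0 \) for every \( x \in \R \), so \( X \) has a continuous distribution. The formula for the density function follows by differentiating the distribution function \( F(x) = G\left(\frac{x - a}{b}\right) \) with the chain rule, since \( f = F^\prime \) and \( g = G^\prime \); the factor \( \frac{1}{b} \) is the derivative of \( x \mapsto \frac{x - a}{b} \).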
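The relationship \( f = F^\prime \) can be checked numerically. The sketch below assumes \( Z \) has the standard logistic distribution, chosen only because its distribution function and density have simple closed forms, and compares a central-difference derivative of \( F \) with \( \frac{1}{b} g\left(\frac{x - a}{b}\right) \). The function names are illustrative.

```python
import math

def G(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Standard logistic density, the derivative of G."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

def make_family_member(a, b):
    """Return (F, f) for X = a + b * Z with b > 0."""
    F = lambda x: G((x - a) / b)
    f = lambda x: g((x - a) / b) / b      # note the factor 1 / b
    return F, f

def central_difference(func, x, h=1e-5):
    """Numerical approximation to the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

a, b = 1.5, 0.5
F, f = make_family_member(a, b)
for x in (-1.0, 0.0, 1.5, 2.0, 4.0):
    print(x, central_difference(F, x), f(x))
```

With \( b = 0.5 \) the density at the center is \( f(a) = g(0) / b = 0.5 \), twice the height of \( g \) at 0, which matches the vertical stretching described in part 2 above.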

If \(Z\) has a mode at \(z\), then \(X\) has a mode at \(x = a + b z\).

Proof:

This follows from the density function results above, in both the discrete and the continuous case: in each, \( f(a + b z) \) is a positive constant multiple of \( g(z) \). Hence if \( g \) has a maximum at \( z \) then \( f \) has a maximum at \( x = a + b z \).

Next we relate the quantile functions of \(Z\) and \(X\).

If \(G\) and \(F\) are the distribution functions of \(Z\) and \(X\), respectively, then

  1. \(F^{-1}(p) = a + b \, G^{-1}(p)\) for \(p \in (0, 1)\)
  2. If \(z\) is a quantile of order \(p\) for \(Z\) then \(x = a + b \, z\) is a quantile of order \(p\) for \(X\).
Proof:

These results follow from the results for distribution functions above.
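A small sketch of the quantile relationship, assuming \( Z \) has the standard exponential distribution so that \( G(z) = 1 - e^{-z} \) for \( z \ge 0 \) and \( G^{-1}(p) = -\ln(1 - p) \). The values of \( a \) and \( b \) are arbitrary choices for illustration.

```python
import math

def G(z):
    """Distribution function of the standard exponential distribution."""
    return 1.0 - math.exp(-z) if z >= 0 else 0.0

def G_inv(p):
    """Quantile function of the standard exponential distribution, 0 < p < 1."""
    return -math.log(1.0 - p)

a, b = 2.0, 3.0

def F(x):
    """Distribution function of X = a + b * Z."""
    return G((x - a) / b)

def F_inv(p):
    """Quantile function of X: a + b * G_inv(p)."""
    return a + b * G_inv(p)

for p in (0.1, 0.5, 0.9):
    print(p, F_inv(p), F(F_inv(p)))   # last column should reproduce p
```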

Suppose now that \( Z \) has a continuous distribution on \([0, \infty)\), and that we think of \(Z\) as the failure time of a device (or the time of death of an organism). Let \(X = b Z\) where \( b \in (0, \infty)\), so that the distribution of \(X\) is the scale family associated with the distribution of \(Z\). \(X\) also has a continuous distribution on \([0, \infty)\) and can also be thought of as the failure time of a device (perhaps in different units).

Let \(\bar{G}\) and \(\bar{F}\) denote the reliability functions of \(Z\) and \(X\) respectively, and let \(r\) and \(R\) denote the failure rate functions of \(Z\) and \(X\), respectively. Then

  1. \(\bar{F}(x) = \bar{G}(x/b)\) for \(x \in [0, \infty)\)
  2. \(R(x) = \frac{1}{b} r\left(\frac{x}{b}\right)\) for \(x \in [0, \infty)\)
Proof:

Recall that \( \bar{G} = 1 - G \), \( \bar{F} = 1 - F \), \( r = g / \bar{G} \), and \( R = f / \bar{F} \). Thus the results follow from the results above for distribution functions and density functions.
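As a worked illustration, suppose (as an assumption for this example) that \( Z \) has the standard Weibull distribution with shape parameter \( k \in (0, \infty) \), so that \( \bar{G}(z) = e^{-z^k} \) and \( r(z) = k z^{k-1} \) for \( z \in (0, \infty) \). Then for \( X = b Z \), \[ \bar{F}(x) = \bar{G}\left(\frac{x}{b}\right) = \exp\left[-\left(\frac{x}{b}\right)^k\right], \qquad R(x) = \frac{1}{b} r\left(\frac{x}{b}\right) = \frac{k}{b} \left(\frac{x}{b}\right)^{k-1}, \quad x \in (0, \infty) \] In particular, when \( k = 1 \) the distribution of \( Z \) is standard exponential with constant failure rate 1, and \( X \) has constant failure rate \( 1 / b \).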

Moments

The following theorem relates the mean, variance, and standard deviation of \(Z\) and \(X\).

As before, suppose that \(X = a + b \, Z\). Then

  1. \(\E(X) = a + b \, \E(Z)\)
  2. \(\var(X) = b^2 \, \var(Z)\)
  3. \(\sd(X) = b \, \sd(Z)\)
Proof:

These results follow immediately from basic properties of expected value and variance.
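The same identities hold for the empirical distribution of any data set, which gives an easy way to see them in action. The sketch below simulates values of \( Z \) from the standard exponential distribution (an arbitrary choice), transforms them, and compares the summary statistics. The seed, sample size, and parameter values are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(0)                          # fixed seed for reproducibility
z = [rng.expovariate(1.0) for _ in range(1_000)]
a, b = 4.0, 2.0
x = [a + b * zi for zi in z]                    # X = a + b Z, applied to each value

mean_z, mean_x = statistics.fmean(z), statistics.fmean(x)
var_z, var_x = statistics.pvariance(z), statistics.pvariance(x)
sd_z, sd_x = statistics.pstdev(z), statistics.pstdev(x)

print(mean_x, a + b * mean_z)    # mean:     a + b E(Z)
print(var_x, b ** 2 * var_z)     # variance: b^2 var(Z)
print(sd_x, b * sd_z)            # sd:       b sd(Z)
```

Each printed pair should agree up to floating-point rounding; note that the location parameter \( a \) affects only the mean.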

Recall that the standard score of a random variable is obtained by subtracting the mean and dividing by the standard deviation. The standard score is dimensionless (that is, has no physical units) and measures the distance from the mean to the random variable in standard deviations. Since location-scale families essentially correspond to a change of units, it's not surprising that the standard score is unchanged by a location-scale transformation.

The standard scores of \(X\) and \(Z\) are the same:

\[ \frac{X - \E(X)}{\sd(X)} = \frac{Z - \E(Z)}{\sd(Z)} \]
Proof:

From the previous theorem,

\[ \frac{X - \E(X)}{\sd(X)} = \frac{a + b Z - [a + b \E(Z)]}{b \sd(Z)} = \frac{Z - \E(Z)}{\sd(Z)} \]

Recall that the skewness and kurtosis of a random variable are the third and fourth moments, respectively, of the standard score. Thus it follows from the previous result that skewness and kurtosis are unchanged by location-scale transformations: \(\skw(X) = \skw(Z)\), \(\kur(X) = \kur(Z)\).
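The invariance of skewness and kurtosis can be illustrated with a small finite distribution. The density `z_pmf` below is a hypothetical asymmetric example, and `standardized_moment` is an illustrative helper that computes the \( n \)th moment of the standard score directly from the definition.

```python
import math

def standardized_moment(pmf, n):
    """n-th moment of the standard score (X - mean) / sd for a finite density."""
    mu = sum(x * p for x, p in pmf.items())
    sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return sum(((x - mu) / sd) ** n * p for x, p in pmf.items())

z_pmf = {0.0: 0.5, 1.0: 0.3, 4.0: 0.2}              # hypothetical density of Z
a, b = -7.0, 2.5
x_pmf = {a + b * z: p for z, p in z_pmf.items()}    # density of X = a + b Z

skew_z, skew_x = standardized_moment(z_pmf, 3), standardized_moment(x_pmf, 3)
kurt_z, kurt_x = standardized_moment(z_pmf, 4), standardized_moment(x_pmf, 4)
print(skew_z, skew_x)
print(kurt_z, kurt_x)
```

Each pair should agree up to rounding, even though the mean and standard deviation of \( X \) differ from those of \( Z \).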

We can relate the moments of \( X \) (about 0) to those of \( Z \) by means of the binomial theorem: \[ \E\left(X^n\right) = \sum_{k=0}^n \binom{n}{k} b^k a^{n - k} \E\left(Z^k\right), \quad n \in \N \] Of course, the moments of \( X \) about the location parameter \( a \) have a simple representation in terms of the moments of \( Z \) about 0: \[ \E\left[(X - a)^n\right] = b^n \E\left(Z^n\right), \quad n \in \N \] The following result relates the moment generating functions of \(Z\) and \(X\).

If \(Z\) has moment generating function \(m\) then \(X\) has moment generating function \(M\) given by

\[ M(t) = e^{a t} m(b t) \]
Proof: \[ M(t) = \E\left(e^{tX}\right) = \E\left[e^{t(a + bZ)}\right] = e^{ta} \E\left(e^{t b Z}\right) = e^{a t} m(b t) \]
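For example, if \( Z \) has the standard normal distribution then \( m(t) = e^{t^2 / 2} \) for \( t \in \R \), so \[ M(t) = e^{a t} m(b t) = \exp\left(a t + \frac{1}{2} b^2 t^2\right), \quad t \in \R \] which is the moment generating function of the normal distribution with mean \( a \) and standard deviation \( b \).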

Type

As we noted earlier, two probability distributions that are related by a location-scale transformation can be thought of as governing the same underlying random quantity, but in different physical units. This relationship is important enough to deserve a name.

Suppose that \( P \) and \( Q \) are probability distributions on \( \R \) with distribution functions \(F\) and \(G\), respectively. Then \( P \) and \( Q \) are of the same type if there exist constants \(a \in \R\) and \(b \in (0, \infty)\) such that \[ F(x) = G \left( \frac{x - a}{b} \right), \quad x \in \R \]

Being of the same type is an equivalence relation on the collection of probability distributions on \(\R\). That is, if \(P\), \(Q\), and \(R\) are probability distributions on \( \R \) then

  1. \(P\) is the same type as \(P\) (the reflexive property).
  2. If \(P\) is the same type as \(Q\) then \(Q\) is the same type as \(P\) (the symmetric property).
  3. If \(P\) is the same type as \(Q\), and \(Q\) is the same type as \(R\), then \(P\) is the same type as \(R\) (the transitive property).
Proof:

Let \( F \), \( G \), and \( H \) denote the distribution functions of \( P \), \( Q \), and \( R \) respectively.

  1. This is trivial, of course, since we can take \( a = 0 \) and \( b = 1 \).
  2. Suppose there exist \( a \in \R \) and \( b \in (0, \infty) \) such that \( F(x) = G\left(\frac{x - a}{b}\right) \) for \( x \in \R \). Then \( G(x) = F(a + b x) = F\left(\frac{x - (-a/b)}{1/b}\right) \) for \( x \in \R \).
  3. Suppose there exist \( a, \, c \in \R \) and \( b, \, d \in (0, \infty) \) such that \( F(x) = G\left(\frac{x - a}{b}\right) \) and \( G(x) = H\left(\frac{x - c}{d}\right) \) for \( x \in \R \). Then \( F(x) = H\left(\frac{x - (a + bc)}{bd}\right)\) for \( x \in \R \).

So, the collection of probability distributions on \( \R \) is partitioned into mutually exclusive equivalence classes, where the distributions in each class are all of the same type.
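The parameter bookkeeping in the proof can be sketched as a small class. Everything here (the name `LocationScale` and its methods) is illustrative: an instance represents the map \( z \mapsto a + b z \) with \( b \gt 0 \), `inverse` gives the parameters used for the symmetric property, and `compose` gives those used for the transitive property.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationScale:
    """The map z -> a + b * z with b > 0 (an illustrative helper)."""
    a: float
    b: float

    def __call__(self, z):
        return self.a + self.b * z

    def inverse(self):
        # Symmetric property: location -a / b and scale 1 / b.
        return LocationScale(-self.a / self.b, 1 / self.b)

    def compose(self, other):
        # Transitive property: a + b * (c + d * z) = (a + b * c) + (b * d) * z.
        return LocationScale(self.a + self.b * other.a, self.b * other.b)

identity = LocationScale(0.0, 1.0)     # reflexive property: a = 0, b = 1
t = LocationScale(3.0, 2.0)
s = LocationScale(-1.0, 0.5)
print(t.compose(s))                    # parameters a + b c and b d
print(t.compose(t.inverse()) == identity)
```

These three operations are exactly what make the positive-slope linear maps closed under identity, inversion, and composition, which is why "same type" is an equivalence relation.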

Examples and Applications

Special Distributions

Many of the special parametric families of distributions studied in this chapter and elsewhere in this text are location and/or scale families.