\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\Z}{\mathbb{Z}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\cov}{\text{cov}}\) \(\newcommand{\cor}{\text{cor}}\) \(\newcommand{\bias}{\text{bias}}\) \(\newcommand{\mse}{\text{mse}}\) \(\newcommand{\bs}{\boldsymbol}\)

2. The Method of Moments

Basic Theory

The Method

Suppose that we have a basic random experiment with an observable, real-valued random variable \(X\). The distribution of \(X\) has \(k\) unknown parameters, or equivalently, a parameter vector \(\bs{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)\) taking values in a parameter space, a subset of \( \R^k \). As usual, we repeat the experiment \(n\) times to generate a random sample of size \(n\) from the distribution of \(X\). \[ \bs{X} = (X_1, X_2, \ldots, X_n) \] Thus, \(\bs{X}\) is a sequence of independent random variables, each with the distribution of \(X\). The method of moments is a technique for constructing estimators of the parameters that is based on matching the sample moments with the corresponding distribution moments. First, let \[ \mu_j(\bs{\theta}) = \E\left(X^j\right), \quad j \in \{1, 2, \ldots, k\} \] so that \(\mu_j(\bs{\theta})\) is the \(j\)th moment of \(X\) about 0. Note that we are emphasizing the dependence of these moments on the vector of parameters \(\bs{\theta}\). Note also that \(\mu_1(\bs{\theta})\) is just the mean of \(X\), which we usually denote simply by \(\mu\). Next, let \[ M_j(\bs{X}) = \frac{1}{n} \sum_{i=1}^n X_i^j, \quad j \in \{1, 2, \ldots, k\} \] so that \(M_j(\bs{X})\) is the \(j\)th sample moment about 0. Equivalently, \(M_j(\bs{X})\) is the sample mean for the random sample \(\left(X_1^j, X_2^j, \ldots, X_n^j\right)\) from the distribution of \(X^j\). Note that we are emphasizing the dependence of the sample moments on the sample \(\bs{X}\). Note also that \(M_1(\bs{X})\) is just the ordinary sample mean, which we usually just denote by \(M\). From our previous work, we know that \(M_j(\bs{X})\) is an unbiased and consistent estimator of \(\mu_j(\bs{\theta})\) for each \(j\). Here's how the method works:

To construct the method of moments estimators \(\left(W_1, W_2, \ldots, W_k\right)\) for the parameters \((\theta_1, \theta_2, \ldots, \theta_k)\) respectively, we attempt to solve the set of simultaneous equations \[ \mu_j(W_1, W_2, \ldots, W_k) = M_j(X_1, X_2, \ldots, X_n), \quad j \in \{1, 2, \ldots, k\} \] for \(\left(W_1, W_2, \ldots, W_k\right)\) in terms of \(\left(M_1, M_2, \ldots, M_k\right)\).

Note that we have \(k\) equations in \(k\) unknowns, so there is hope (but no guarantee) that the equations can be solved. In fact, we sometimes need moment equations of order higher than \(k\), because the lower-order equations may give no information about the parameters; the symmetric beta distribution below is an example.
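
For readers who like to see the recipe in code, here is a minimal sketch of the matching step in Python, using sympy and numpy. The two-parameter gamma model, the symbol names, and the parameter values are our own choices for illustration, not part of the text; the closed-form solution for the gamma case is derived later in this section.

```python
import numpy as np
import sympy as sp

# Moment equations mu_j(theta) = M_j for a hypothetical two-parameter model
# (the gamma distribution with shape k and scale b, so mu_1 = k*b and
# mu_2 = k*(k + 1)*b**2). Solve symbolically for (k, b) in terms of (M_1, M_2).
k, b, m1, m2 = sp.symbols('k b m1 m2')
solution = sp.solve([sp.Eq(k * b, m1), sp.Eq(k * (k + 1) * b**2, m2)],
                    [k, b], dict=True)[0]
print(solution)   # expect k = m1**2/(m2 - m1**2), b = (m2 - m1**2)/m1

# Plug in the sample moments from simulated data with true values k = 3, b = 2.
rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)
moments = {m1: np.mean(x), m2: np.mean(x ** 2)}
print({str(s): float(e.subs(moments)) for s, e in solution.items()})
```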

Estimates for the Mean and Variance

Estimating the mean and variance of a distribution are the simplest applications of the method of moments. Throughout this subsection, we assume that we have a random sample \( \bs{X} = (X_1, X_2, \ldots, X_n) \) of size \( n \) from the distribution of a basic random variable \( X \) with \( \mu = \E(X) \in \R \) and \( \sigma^2 = \var(X) \in (0, \infty) \). Occasionally we will also need \( \sigma_4 = \E[(X - \mu)^4] \), the fourth central moment. We start by estimating the mean, which is essentially trivial by this method.

Suppose that the mean \(\mu\) is unknown. The method of moments estimator of \( \mu \) is the sample mean \[ M = \frac{1}{n} \sum_{i=1}^n X_i\]

  1. \( \E(M) = \mu \) so \( M \) is unbiased.
  2. \( \var(M) = \sigma^2/n \) so \( M \) is consistent.
Proof:

It does not get any more basic than this. The method of moments works by matching the distribution mean with the sample mean. We have seen the properties \( \E(M) = \mu \) and \( \var(M) = \sigma^2 / n \) several times before.

Estimating the variance of a distribution, on the other hand, depends on whether the mean is known or unknown. First we will consider the more realistic case when the mean is also unknown.

Suppose that the mean \( \mu \) and the variance \( \sigma^2 \) are both unknown. The method of moments estimator of \(\sigma^2\) is \[T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - M)^2\]

Proof:

As before, the method of moments estimator of the distribution mean \(\mu\) is the sample mean \(M\). On the other hand, \(\sigma^2 = \mu_2 - \mu^2\) and hence the method of moments estimator of \(\sigma^2\) is \(T^2 = M_2 - M^2\), which simplifies to the result above.

Note that \(T^2 = \frac{n-1}{n} S^2\) where \(S^2\) is the usual sample variance: \[ S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M)^2 \] So let's compare the estimators \(S^2\) and \(T^2\). Recall that \(S^2\) is unbiased and consistent, with \(\var(S^2) = \frac{1}{n} \left(\sigma_4 - \frac{n - 3}{n - 1} \sigma^4 \right)\).
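
In code, the two estimators differ only in the divisor. Here is a minimal numpy sketch (the simulated sample and parameter values are our own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # any sample works here
n = len(x)

S2 = np.var(x, ddof=1)   # divisor n - 1: the usual sample variance S^2
T2 = np.var(x, ddof=0)   # divisor n: the method of moments estimator T^2
print(np.isclose(T2, (n - 1) / n * S2))   # True, since T^2 = (n - 1)/n * S^2
```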

\(T^2\) satisfies the following bias properties:

  1. \(\bias(T^2) = -\sigma^2 / n\), so \(T^2\) is negatively biased and on average underestimates \(\sigma^2\).
  2. \(T^2\) is asymptotically unbiased.
Proof:

For part (a), \(\E(T^2) = \frac{n-1}{n} \E(S^2) = \frac{n-1}{n} \sigma^2\), so \(\bias(T^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n} \sigma^2\). Part (b) follows immediately from part (a).

\(T^2\) satisfies the following mean square error properties:

  1. \(\mse(T^2) = \frac{(n-1)^2}{n^3} \left(\sigma_4 - \frac{n-3}{n-1} \sigma^4 \right) + \frac{\sigma^4}{n^2}\).
  2. \(T^2\) is consistent.
  3. The asymptotic relative efficiency of \(T^2\) to \(S^2\) is 1.
Proof:

For part (a), recall that \(\mse(T^2) = \var(T^2) + \bias^2(T^2)\). But \(\var(T^2) = \left(\frac{n-1}{n}\right)^2 \var(S^2)\). The result follows from substituting \(\var(S^2)\) given above and \(\bias(T^2)\) given in the previous theorem. Parts (b) and (c) follow from (a).

Next let's consider the usually unrealistic (but mathematically interesting) case where the mean is known, but not the variance.

Suppose that the mean \( \mu \) is known and the variance \( \sigma^2 \) unknown. The method of moments estimator of \(\sigma^2\) is \[ W^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \]

  1. \( \E(W^2) = \sigma^2 \) so \( W^2 \) is unbiased.
  2. \(\var(W^2) = \frac{1}{n}(\sigma_4 - \sigma^4)\) so \( W^2 \) is consistent.
Proof:

These results follow since \( ((X_1 - \mu)^2, (X_2 - \mu)^2, \ldots, (X_n - \mu)^2) \) is a random sample of size \( n \) from the distribution of \( (X - \mu)^2 \).

Estimating Two Parameters

There are many important special distributions with two parameters; some of these are included in the computational exercises below. With two parameters, we can derive the method of moments estimators by matching the distribution mean and variance with the sample mean and variance, rather than matching the distribution mean and second moment with the sample mean and second moment. This alternative approach sometimes leads to easier equations.

Suppose that \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from a distribution with parameters \( a \) and \( b \). Let \( \mu(a, b) \) and \( \sigma^2(a, b) \) denote the mean and variance of the distribution, and let \( M \) and \( T^2 \) denote the sample mean and sample variance, as defined above. Then the method of moments estimators \( U \) and \( V \) of \( a \) and \( b \), respectively, can be found by solving the equations \[ \mu(U, V) = M, \quad \sigma^2(U, V) = T^2 \]

Proof:

Let \( \mu_2(a, b) = \E(X^2) \) denote the second distribution moment, as defined above in the discussion of the method of moments. Then \( \sigma^2(a, b) = \mu_2(a, b) - \mu^2(a, b) \). In addition, \( T^2 = M_2 - M^2 \) where \( M_2 = \frac{1}{n} \sum_{i=1}^n X_i^2 \) is the second sample moment. Hence the equations \( \mu(U, V) = M \), \( \sigma^2(U, V) = T^2 \) are equivalent to the equations \( \mu(U, V) = M \), \( \mu_2(U, V) = M_2 \), which are by definition the equations that define the method of moments estimators.

Because of this result, the statistic \( T^2 \) will appear in many of the estimation problems for special distributions that we consider next.

Special Distributions

The Normal Distribution

The normal distribution with mean \( \mu \in \R \) and variance \( \sigma^2 \in (0, \infty) \) is a continuous distribution on \( \R \) with probability density function \( g \) given by \[ g(x) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad x \in \R \] This is one of the most important distributions in probability and statistics, primarily because of the central limit theorem. The normal distribution is studied in more detail in the chapter on Special Distributions.

Suppose now that \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the normal distribution with mean \( \mu \) and variance \( \sigma^2 \). From our general work above, we know that if \( \mu \) is unknown then the sample mean \( M \) is the method of moments estimator of \( \mu \), and if, in addition, \( \sigma^2 \) is unknown, then the method of moments estimator of \( \sigma^2 \) is \( T^2 \). On the other hand, in the unlikely event that \( \mu \) is known, then \( W^2 \) is the method of moments estimator of \( \sigma^2 \). Our goal is to see how the comparisons above simplify for the normal distribution.

Mean square errors of \( S^2 \) and \( T^2 \).

  1. \(\mse(T^2) = \frac{2 n - 1}{n^2} \sigma^4\)
  2. \(\mse(S^2) = \frac{2}{n - 1} \sigma^4\)
  3. \(\mse(T^2) \lt \mse(S^2)\) for \(n \in \{2, 3, \ldots\}\)
Proof:

Recall that for the normal distribution, \(\sigma_4 = 3 \sigma^4\). Substituting this into the general results gives parts (a) and (b). Part (c) follows from (a) and (b). Of course the asymptotic relative efficiency is still 1, from our previous theorem.

Thus, \(S^2\) and \(T^2\) are multiples of one another; \(S^2\) is unbiased, but when the sampling distribution is normal, \(T^2\) has smaller mean square error. Surprisingly, \(T^2\) has smaller mean square error even than \(W^2\).

Mean square errors of \( T^2 \) and \( W^2 \).

  1. \(\mse(W^2) = \frac{2}{n} \sigma^4\)
  2. \(\mse(T^2) \lt \mse(W^2)\) for \(n \in \{2, 3, \ldots\}\)
Proof:

Again, since the sampling distribution is normal, \(\sigma_4 = 3 \sigma^4\). Substituting this into the general formula for \(\var(W^2)\) gives part (a). Part (b) follows since \(\frac{2 n - 1}{n^2} \lt \frac{2 n}{n^2} = \frac{2}{n}\).

Run the normal estimation experiment 1000 times for several values of the sample size \(n\) and the parameters \(\mu\) and \(\sigma\). Compare the empirical bias and mean square error of \(S^2\) and of \(T^2\) to their theoretical values. Which estimator is better in terms of bias? Which estimator is better in terms of mean square error?
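
If you do not have the applet at hand, a rough numpy stand-in for this experiment might look like the following sketch; the parameter values, replication count, and variable names are our own choices, not part of the experiment itself.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 1000

samples = rng.normal(mu, sigma, size=(reps, n))
S2 = samples.var(axis=1, ddof=1)          # unbiased sample variance
T2 = samples.var(axis=1, ddof=0)          # method of moments estimator
W2 = ((samples - mu) ** 2).mean(axis=1)   # estimator when the mean is known

for name, est in [("S^2", S2), ("T^2", T2), ("W^2", W2)]:
    bias = est.mean() - sigma ** 2
    mse = ((est - sigma ** 2) ** 2).mean()
    print(f"{name}: empirical bias {bias:+.3f}, empirical MSE {mse:.3f}")

# For normal samples the theory above gives
# mse(T^2) = (2n - 1)/n^2 * sigma^4 < mse(W^2) = 2/n * sigma^4 < mse(S^2) = 2/(n - 1) * sigma^4,
# while S^2 and W^2 are unbiased and T^2 is negatively biased.
```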

The Bernoulli Distribution

Recall that an indicator variable is a random variable \( X \) that takes only the values 0 and 1. The distribution of \( X \) is known as the Bernoulli distribution, named for Jacob Bernoulli, and has probability density function \( g \) given by \[ g(x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\} \] where \( p \in (0, 1) \) is the success parameter. The mean of the distribution is \( p \) and the variance is \( p (1 - p) \).

Suppose now that \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the Bernoulli distribution with unknown success parameter \( p \). Since the mean of the distribution is \( p \), it follows from our general work above that the method of moments estimator of \( p \) is \( M \), the sample mean. In this case, the sample \( \bs{X} \) is a sequence of Bernoulli trials, and \( M \) has a scaled version of the binomial distribution with parameters \( n \) and \( p \): \[ \P\left(M = \frac{k}{n}\right) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\} \] Note that since \( X^k = X \) for every \( k \in \N_+ \), it follows that \( \mu_k = p \) and \( M_k = M \) for every \( k \in \N_+ \). So any of the method of moments equations would lead to the sample mean \( M \) as the estimator of \( p \). Although very simple, this is an important application, since Bernoulli trials are found embedded in all sorts of estimation problems, such as empirical probability density functions and empirical distribution functions.

The Geometric Distribution

The geometric distribution on \(\N_+\) with success parameter \(p \in (0, 1)\) has probability density function \( g \) given by \[ g(x) = p (1 - p)^{x-1}, \quad x \in \N_+ \] The geometric distribution on \( \N_+ \) governs the number of trials needed to get the first success in a sequence of Bernoulli trials with success parameter \( p \). The mean of the distribution is \(\mu = 1 / p\).

Suppose that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the geometric distribution on \( \N_+ \) with unknown success parameter \(p\). The method of moments estimator of \(p\) is \[U = \frac{1}{M}\]

Proof:

The method of moments equation for \(U\) is \(1 / U = M\).
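
A quick numerical check, using numpy's geometric sampler, which, like the distribution here, is supported on \( \N_+ \); the value of \( p \) and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.geometric(p, size=1000)   # numpy's geometric counts trials, support {1, 2, ...}
U = 1 / x.mean()                  # method of moments estimate of p
print(U)                          # should be close to 0.3
```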

The geometric distribution on \( \N \) with success parameter \( p \in (0, 1) \) has probability density function \[ g(x) = p (1 - p)^x, \quad x \in \N \] This version of the geometric distribution governs the number of failures before the first success in a sequence of Bernoulli trials. The mean of the distribution is \( \mu = (1 - p) \big/ p \).

Suppose that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the geometric distribution on \( \N \) with unknown parameter \(p\). The method of moments estimator of \(p\) is \[U = \frac{1}{M + 1}\]

Proof:

The method of moments equation for \(U\) is \((1 - U) \big/ U = M\).

The Negative Binomial Distribution

More generally, the negative binomial distribution on \( \N \) with shape parameter \( k \in (0, \infty) \) and success parameter \( p \in (0, 1) \) has probability density function \[ g(x) = \binom{x + k - 1}{k - 1} p^k (1 - p)^x, \quad x \in \N \] If \( k \) is a positive integer, then this distribution governs the number of failures before the \( k \)th success in a sequence of Bernoulli trials with success parameter \( p \). However, the distribution makes sense for general \( k \in (0, \infty) \). The negative binomial distribution is studied in more detail in the chapter on Bernoulli Trials. The mean of the distribution is \( k (1 - p) \big/ p \) and the variance is \( k (1 - p) \big/ p^2 \). Suppose now that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the negative binomial distribution on \( \N \) with shape parameter \( k \) and success parameter \( p \).

If \( k \) and \( p \) are unknown, then the corresponding method of moments estimators \( U \) and \( V \) are \[ U = \frac{M^2}{T^2 - M}, \quad V = \frac{M}{T^2} \]

Proof:

Matching the distribution mean and variance to the sample mean and variance gives the equations \[ U \frac{1 - V}{V} = M, \quad U \frac{1 - V}{V^2} = T^2 \] Dividing the first equation by the second gives \( V = M / T^2 \), and then the first equation gives \( U = M^2 / (T^2 - M) \).
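
As a sketch of how these estimators behave on simulated data (using numpy's negative binomial sampler, which counts failures before the \( k \)th success; the parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 4.0, 0.4
x = rng.negative_binomial(k, p, size=2000)   # failures before the k-th success

M = x.mean()
T2 = x.var(ddof=0)          # the method of moments variance T^2
U = M ** 2 / (T2 - M)       # estimate of k; requires T2 > M (overdispersion)
V = M / T2                  # estimate of p
print(U, V)                 # should be close to 4 and 0.4
```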

As usual, the results are nicer when one of the parameters is known.

Suppose that \( k \) is known but \( p \) is unknown. The method of moments estimator \( V_k \) of \( p \) is \[ V_k = \frac{k}{M + k} \]

Proof:

Matching the distribution mean to the sample mean gives the equation \[ k \frac{1 - V_k}{V_k} = M \]

Suppose that \( k \) is unknown but \( p \) is known. The method of moments estimator of \( k \) is \[ U_p = \frac{p}{1 - p} M \]

  1. \( \E(U_p) = k \) so \( U_p \) is unbiased.
  2. \( \var(U_p) = \frac{k}{n (1 - p)} \) so \( U_p \) is consistent.
Proof:

Matching the distribution mean to the sample mean gives the equation \( U_p \frac{1 - p}{p} = M\).

  1. \( \E(U_p) = \frac{p}{1 - p} \E(M)\) and \(\E(M) = \frac{1 - p}{p} k\)
  2. \( \var(U_p) = \left(\frac{p}{1 - p}\right)^2 \var(M) \) and \( \var(M) = \frac{1}{n} \var(X) = \frac{k (1 - p)}{n p^2} \)

The Poisson Distribution

The Poisson distribution with parameter \( r \in (0, \infty) \) is a discrete distribution on \( \N \) with probability density function \( g \) given by \[ g(x) = e^{-r} \frac{r^x}{x!}, \quad x \in \N \] The mean and variance are both \( r \). The distribution is named for Simeon Poisson and is widely used to model the number of random points in a region of time or space. The parameter \( r \) is proportional to the size of the region, with the proportionality constant playing the role of the average rate at which the points are distributed in time or space. The Poisson distribution is studied in more detail in the chapter on the Poisson Process.

Suppose now that \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the Poisson distribution with parameter \( r \). Since \( r \) is the mean, it follows from our general work above that the method of moments estimator of \( r \) is the sample mean \( M \).

The Gamma Distribution

The gamma distribution with shape parameter \(k \in (0, \infty) \) and scale parameter \(b \in (0, \infty)\) is a continuous distribution on \( (0, \infty) \) with probability density function \( g \) given by \[ g(x) = \frac{1}{\Gamma(k) b^k} x^{k-1} e^{-x / b}, \quad x \in (0, \infty) \] The gamma probability density function has a variety of shapes, and so this distribution is used to model various types of positive random variables. The gamma distribution is studied in more detail in the chapter on Special Distributions. The mean is \(\mu = k b\) and the variance is \(\sigma^2 = k b^2\).

Suppose now that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample from the gamma distribution with shape parameter \(k\) and scale parameter \(b\).

Suppose that \(k\) and \(b\) are both unknown, and let \(U\) and \(V\) be the corresponding method of moments estimators. Then \[ U = \frac{M^2}{T^2}, \quad V = \frac{T^2}{M}\]

Proof:

Matching the distribution mean and variance with the sample mean and variance leads to the equations \(U V = M\), \(U V^2 = T^2\). Solving gives the results.

The method of moments estimators of \(k\) and \(b\) given in the previous exercise are complicated, nonlinear functions of the sample mean \(M\) and the sample variance \(T^2\). Thus, computing the bias and mean square error of these estimators is a difficult problem that we will not attempt. However, we can judge the quality of the estimators empirically, through simulations.
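
Here is one such simulation sketch in numpy; the parameter values, sample size, and number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
k, b, n, reps = 2.5, 3.0, 100, 1000

samples = rng.gamma(shape=k, scale=b, size=(reps, n))
M = samples.mean(axis=1)
T2 = samples.var(axis=1, ddof=0)

U = M ** 2 / T2   # method of moments estimates of the shape parameter k
V = T2 / M        # method of moments estimates of the scale parameter b

print("U: empirical bias", U.mean() - k, "empirical MSE", ((U - k) ** 2).mean())
print("V: empirical bias", V.mean() - b, "empirical MSE", ((V - b) ** 2).mean())
```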

When one of the parameters is known, the method of moments estimator of the other parameter is much simpler.

Suppose that \(k\) is unknown, but \(b\) is known. The method of moments estimator of \( k \) is \[U_b = \frac{M}{b}\]

  1. \( \E(U_b) = k \) so \(U_b\) is unbiased.
  2. \(\var(U_b) = k / n\) so \(U_b\) is consistent.
Proof:

If \(b\) is known, then the method of moments equation for \(U_b\) is \(b U_b = M\). Solving gives the formula above. Next, \(\E(U_b) = \E(M) / b = k b / b = k\), so \(U_b\) is unbiased. Finally, \(\var(U_b) = \var(M) / b^2 = k b^2 / (n b^2) = k / n\).

Suppose that \(b\) is unknown, but \(k\) is known. The method of moments estimator of \(b\) is \[V_k = \frac{M}{k}\]

  1. \( \E(V_k) = b \) so \(V_k\) is unbiased.
  2. \( \var(V_k) = b^2 / (k n) \) so that \(V_k\) is consistent.
Proof:

If \(k\) is known, then the method of moments equation for \(V_k\) is \(k V_k = M\). Solving gives the formula above. Next, \(\E(V_k) = \E(M) / k = k b / k = b\), so \(V_k\) is unbiased. Finally, \(\var(V_k) = \var(M) / k^2 = k b^2 / (n k^2) = b^2 / (k n)\).

Run the gamma estimation experiment 1000 times for several different values of the sample size \(n\) and the parameters \(k\) and \(b\). Note the empirical bias and mean square error of the estimators \(U\), \(V\), \(U_b\), and \(V_k\). One would think that the estimators when one of the parameters is known should work better than the corresponding estimators when both parameters are unknown; but investigate this question empirically.

The Beta Distribution

The beta distribution with left parameter \(a \in (0, \infty) \) and right parameter \(b \in (0, \infty)\) is a continuous distribution on \( (0, 1) \) with probability density function \( g \) given by \[ g(x) = \frac{1}{B(a, b)} x^{a-1} (1 - x)^{b-1}, \quad 0 \lt x \lt 1 \] The beta probability density function has a variety of shapes, and so this distribution is widely used to model various types of random variables that take values in bounded intervals. The beta distribution is studied in more detail in the chapter on Special Distributions. The first two moments are \(\mu = \frac{a}{a + b}\) and \(\mu_2 = \frac{a (a + 1)}{(a + b)(a + b + 1)}\).

Suppose now that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the beta distribution with left parameter \(a\) and right parameter \(b\).

Suppose that \(a\) and \(b\) are both unknown, and let \(U\) and \(V\) be the corresponding method of moments estimators. Then \[U = \frac{M(M - M_2)}{M_2 - M^2}, \quad V = \frac{(1 - M)(M - M_2)}{M_2 - M^2}\]

Proof:

The method of moments equations for \(U\) and \(V\) are \[\frac{U}{U + V} = M, \quad \frac{U(U + 1)}{(U + V)(U + V + 1)} = M_2\] Solving gives the result.

The method of moments estimators of \(a\) and \(b\) given in the previous exercise are complicated nonlinear functions of the sample moments \(M\) and \(M_2\). Thus, we will not attempt to determine the bias and mean square errors analytically, but you will have an opportunity to explore them empirically through a simulation.
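
A small numpy simulation along these lines (the parameter values and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 2.0, 5.0, 500

x = rng.beta(a, b, size=n)
M, M2 = x.mean(), (x ** 2).mean()   # first two sample moments

U = M * (M - M2) / (M2 - M ** 2)          # estimate of a
V = (1 - M) * (M - M2) / (M2 - M ** 2)    # estimate of b
print(U, V)                               # should be close to 2 and 5
```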

Suppose that \(a\) is unknown, but \(b\) is known. Let \(U_b\) be the method of moments estimator of \(a\). Then \[ U_b = b \frac{M}{1 - M} \]

Proof:

If \(b\) is known then the method of moments equation for \(U_b\) as an estimator of \(a\) is \(U_b \big/ (U_b + b) = M\). Solving for \(U_b\) gives the result.

Suppose that \(b\) is unknown, but \(a\) is known. Let \(V_a\) be the method of moments estimator of \(b\). Then \[ V_a = a \frac{1 - M}{M} \]

Proof:

If \(a\) is known then the method of moments equation for \(V_a\) as an estimator of \(b\) is \(a \big/ (a + V_a) = M\). Solving for \(V_a\) gives the result.

Run the beta estimation experiment 1000 times for several different values of the sample size \(n\) and the parameters \(a\) and \(b\). Note the empirical bias and mean square error of the estimators \(U\), \(V\), \(U_b\), and \(V_a\). One would think that the estimators when one of the parameters is known should work better than the corresponding estimators when both parameters are unknown; but investigate this question empirically.

The following problem gives a distribution with just one parameter but the second moment equation from the method of moments is needed to derive an estimator.

Suppose that \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample from the symmetric beta distribution, in which the left and right parameters are equal to an unknown value \( c \in (0, \infty) \). The method of moments estimator of \( c \) is \[ U = \frac{2 M_2 - 1}{1 - 4 M_2} \]

Proof:

Note that the mean \( \mu \) of the symmetric distribution is \( \frac{1}{2} \), independently of \( c \), and so the first equation in the method of moments is useless. However, matching the second distribution moment to the second sample moment leads to the equation \[ \frac{U + 1}{2 (2 U + 1)} = M_2 \] Solving gives the result.
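
A quick numerical check of this estimator (the value of \( c \) and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 3.0
x = rng.beta(c, c, size=2000)       # sample from the symmetric beta distribution
M2 = (x ** 2).mean()                # second sample moment

U = (2 * M2 - 1) / (1 - 4 * M2)     # method of moments estimate of c
print(U)                            # should be close to 3
```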

The Pareto Distribution

The Pareto distribution with shape parameter \(a \in (0, \infty)\) and scale parameter \(b \in (0, \infty)\) is a continuous distribution on \( (b, \infty) \) with probability density function \( g \) given by \[ g(x) = \frac{a b^a}{x^{a + 1}}, \quad b \le x \lt \infty \] The Pareto distribution is named for Vilfredo Pareto and is a highly skewed and heavy-tailed distribution. It is often used to model income and certain other types of positive random variables. The Pareto distribution is studied in more detail in the chapter on Special Distributions. If \(a \gt 2\), the first two moments of the Pareto distribution are \(\mu = \frac{a b}{a - 1}\) and \(\mu_2 = \frac{a b^2}{a - 2}\).

Suppose now that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the Pareto distribution with shape parameter \(a \gt 2\) and scale parameter \(b \gt 0\).

Suppose that \(a\) and \(b\) are both unknown, and let \(U\) and \(V\) be the corresponding method of moments estimators. Then \begin{align} U & = 1 + \sqrt{\frac{M_2}{M_2 - M^2}} \\ V & = \frac{M_2}{M} \left( 1 - \sqrt{\frac{M_2 - M^2}{M_2}} \right) \end{align}

Proof:

The method of moments equations for \(U\) and \(V\) are \begin{align} \frac{U V}{U - 1} & = M \\ \frac{U V^2}{U - 2} & = M_2 \end{align} Solving for \(U\) and \(V\) gives the results.

As with our previous examples, the method of moments estimators are complicated nonlinear functions of \(M\) and \(M_2\), so computing the bias and mean square error of the estimators is difficult. Instead, we can investigate the bias and mean square error empirically, through a simulation.
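
Here is one such simulation sketch; note that numpy's pareto sampler draws from the Lomax form, so we shift and scale to get the classical Pareto distribution with shape \( a \) and scale \( b \). The parameter values and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 3.0, 2.0, 5000   # need a > 2 so that the second moment is finite

x = b * (1 + rng.pareto(a, size=n))   # classical Pareto with shape a and scale b
M, M2 = x.mean(), (x ** 2).mean()

U = 1 + np.sqrt(M2 / (M2 - M ** 2))                 # estimate of a
V = (M2 / M) * (1 - np.sqrt((M2 - M ** 2) / M2))    # estimate of b
print(U, V)                                         # should be close to 3 and 2
```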

Run the Pareto estimation experiment 1000 times for several different values of the sample size \(n\) and the parameters \(a\) and \(b\). Note the empirical bias and mean square error of the estimators \(U\) and \(V\).

When one of the parameters is known, the method of moments estimator for the other parameter is simpler.

Suppose that \(a\) is unknown, but \(b\) is known. Let \(U_b\) be the method of moments estimator of \(a\). Then \[ U_b = \frac{M}{M - b}\]

Proof:

If \(b\) is known then the method of moments equation for \(U_b\) as an estimator of \(a\) is \(b U_b \big/ (U_b - 1) = M\). Solving for \(U_b\) gives the result.

Suppose that \(b\) is unknown, but \(a\) is known. Let \(V_a\) be the method of moments estimator of \(b\). Then \[V_a = \frac{a - 1}{a}M\]

  1. \( \E(V_a) = b \) so \(V_a\) is unbiased.
  2. \(\var(V_a) = \frac{b^2}{n a (a - 2)}\) so \(V_a\) is consistent.
Proof:

If \(a\) is known then the method of moments equation for \(V_a\) as an estimator of \(b\) is \(a V_a \big/ (a - 1) = M\). Solving for \(V_a\) gives the formula above. Next, \(\E(V_a) = \frac{a - 1}{a} \E(M) = \frac{a - 1}{a} \frac{a b}{a - 1} = b\) so \(V_a\) is unbiased. Finally, \(\var(V_a) = \left(\frac{a - 1}{a}\right)^2 \var(M) = \frac{(a - 1)^2}{a^2} \frac{a b^2}{n (a - 1)^2 (a - 2)} = \frac{b^2}{n a (a - 2)}\).

The Uniform Distribution

The (continuous) uniform distribution with location parameter \( a \in \R \) and scale parameter \( h \in (0, \infty) \) has probability density function \( g \) given by \[ g(x) = \frac{1}{h}, \quad x \in [a, a + h] \] The distribution models a point chosen at random from the interval \( [a, a + h] \). The mean of the distribution is \( \mu = a + \frac{1}{2} h \) and the variance is \( \sigma^2 = \frac{1}{12} h^2 \). The uniform distribution is studied in more detail in the chapter on Special Distributions. Suppose now that \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the uniform distribution.

Suppose that \( a \) and \( h \) are both unknown, and let \( U \) and \( V \) denote the corresponding method of moments estimators. Then \[ U = M - \sqrt{3} T, \quad V = 2 \sqrt{3} T \]

Proof:

Matching the distribution mean and variance to the sample mean and variance leads to the equations \( U + \frac{1}{2} V = M \) and \( \frac{1}{12} V^2 = T^2 \). Solving gives the result.
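
A quick numerical check (the values of \( a \), \( h \), and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a, h, n = 1.0, 4.0, 1000

x = rng.uniform(a, a + h, size=n)
M = x.mean()
T = x.std(ddof=0)              # T = sqrt(T^2), the method of moments standard deviation

U = M - np.sqrt(3) * T         # estimate of the location parameter a
V = 2 * np.sqrt(3) * T         # estimate of the scale parameter h
print(U, V)                    # should be close to 1 and 4
```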

As usual, we get nicer results when one of the parameters is known.

Suppose that \( a \) is known and \( h \) is unknown, and let \( V_a \) denote the method of moments estimator of \( h \). Then \[ V_a = 2 (M - a) \]

  1. \( \E(V_a) = h \) so \( V_a \) is unbiased.
  2. \( \var(V_a) = \frac{h^2}{3 n} \) so \( V_a \) is consistent.
Proof:

Matching the distribution mean to the sample mean leads to the equation \( a + \frac{1}{2} V_a = M \). Solving gives the result.

  1. \( \E(V_a) = 2[\E(M) - a] = 2(a + h/2 - a) = h \)
  2. \( \var(V_a) = 4 \var(M) = \frac{h^2}{3 n} \)

Suppose that \( h \) is known and \( a \) is unknown, and let \( U_h \) denote the method of moments estimator of \( a \). Then \[ U_h = M - \frac{1}{2} h \]

  1. \( \E(U_h) = a \) so \( U_h \) is unbiased.
  2. \( \var(U_h) = \frac{h^2}{12 n} \) so \( U_h \) is consistent.
Proof:

Matching the distribution mean to the sample mean leads to the equation \( U_h + \frac{1}{2} h = M \). Solving gives the result.

  1. \( \E(U_h) = \E(M) - \frac{1}{2}h = a + \frac{1}{2} h - \frac{1}{2} h = a \)
  2. \( \var(U_h) = \var(M) = \frac{h^2}{12 n} \)

The Hypergeometric Model

Our basic assumption in the method of moments is that the sequence of observed random variables \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a random sample from a distribution. However, the method makes sense, at least in some cases, when the variables are identically distributed but dependent. In the hypergeometric model, we have a population of \( N \) objects with \( r \) of the objects type 1 and the remaining \( N - r \) objects type 0. The parameter \( N \), the population size, is a positive integer. The parameter \( r \), the type 1 size, is a nonnegative integer with \( r \le N \). These are the basic parameters, and typically one or both is unknown. Here are some typical examples:

  1. The objects are devices, classified as good or defective.
  2. The objects are persons, classified as female or male.
  3. The objects are voters, classified as for or against a particular candidate.
  4. The objects are wildlife of a particular type, either tagged or untagged.

We sample \( n \) objects from the population at random, without replacement. Let \( X_i \) be the type of the \( i \)th object selected, so that our sequence of observed variables is \( \bs{X} = (X_1, X_2, \ldots, X_n) \). The variables are identically distributed indicator variables, with \( \P(X_i = 1) = r / N \) for each \( i \in \{1, 2, \ldots, n\} \), but are dependent since the sampling is without replacement. The number of type 1 objects in the sample is \( Y = \sum_{i=1}^n X_i \). This statistic has the hypergeometric distribution with parameters \( N \), \( r \), and \( n \), and has probability density function given by \[ \P(Y = y) = \frac{\binom{r}{y} \binom{N - r}{n - y}}{\binom{N}{n}} = \binom{n}{y} \frac{r^{(y)} (N - r)^{(n - y)}}{N^{(n)}}, \quad y \in \{\max\{0, n + r - N\}, \ldots, \min\{n, r\}\} \] The hypergeometric model is studied in more detail in the chapter on Finite Sampling Models.

As above, let \( \bs{X} = (X_1, X_2, \ldots, X_n) \) be the observed variables in the hypergeometric model with parameters \( N \) and \( r \). Then

  1. The method of moments estimator of \( p = r / N \) is \( M = Y / n \), the sample mean.
  2. The method of moments estimator of \( r \) with \( N \) known is \( U = N M = N Y / n \).
  3. The method of moments estimator of \( N \) with \( r \) known is \( V = r / M = r n / Y \) if \( Y > 0 \).
Proof:

These results all follow simply from the fact that \( \E(X) = \P(X = 1) = r / N \).

In the voter example (3) above, typically \( N \) and \( r \) are both unknown, but we would only be interested in estimating the ratio \( p = r / N \). In the reliability example (1), we might typically know \( N \) and would be interested in estimating \( r \). In the wildlife example (4), we would typically know \( r \) and would be interested in estimating \( N \). This example is known as the capture-recapture model.
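
As a small illustration in code (using numpy's hypergeometric sampler; the values of \( N \), \( r \), and \( n \) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, n = 1000, 120, 50   # population size, number of type 1 objects, sample size

Y = rng.hypergeometric(ngood=r, nbad=N - r, nsample=n)   # type 1 objects in the sample
M = Y / n

print("estimate of p = r/N:", M)
print("estimate of r when N is known:", N * M)
if Y > 0:
    print("estimate of N when r is known (capture-recapture):", r * n / Y)
```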

Clearly there is a close relationship between the hypergeometric model and the Bernoulli trials model above. In fact, if the sampling is with replacement, the Bernoulli trials model would apply rather than the hypergeometric model. In addition, if the population size \( N \) is large compared to the sample size \( n \), the hypergeometric model is well approximated by the Bernoulli trials model.