\(\newcommand{\N}{\mathbb{N}}\)

7. Counting Measure

Basic Theory

Suppose that \(S\) is a finite set. For \(A \subseteq S\), the cardinality of \(A\) is the number of elements in \(A\), and is denoted \(\#(A)\). The function \(\#\) on \(\mathscr{P}(S)\) is called counting measure. Counting measure plays a fundamental role in discrete probability structures, and particularly those that involve sampling from a finite set. The set \(S\) is typically very large, hence efficient counting methods are essential. The first combinatorial problem is attributed to the Greek mathematician Xenocrates.

In many cases, a set of objects can be counted by establishing a one-to-one correspondence between the given set and some other set. Naturally, the two sets have the same number of elements, but for some reason, the second set may be easier to count.

The Addition Rule

The addition rule of combinatorics is simply the additivity axiom of counting measure.

If \(\{A_1, A_2, \ldots, A_n\}\) is a collection of disjoint subsets of \(S\) then \[ \#\left( \bigcup_{i=1}^n A_i \right) = \sum_{i=1}^n \#(A_i) \]

Figure: The addition rule

Simple Counting Rules

The counting rules in this subsection are simple consequences of the addition rule. Be sure to try the proofs yourself before reading the ones in the text.

\(\#(A^c) = \#(S) - \#(A)\). This is the complement rule.

Proof:

Note that \(A\) and \(A^c\) are disjoint and their union is \(S\). Hence \( \#(A) + \#(A^c) = \#(S) \).

Figure: The complement rule (the complement of \(A\))

\(\#(B \setminus A) = \#(B) - \#(A \cap B)\). This is the difference rule.

Proof:

Note that \(A \cap B\) and \(B \setminus A\) are disjoint and their union is \(B\). Hence \( \#(A \cap B) + \#(B \setminus A) = \#(B) \).

If \(A \subseteq B\) then \(\#(B \setminus A) = \#(B) - \#(A)\).

Proof:

This follows from the difference rule, since \(A \cap B = A\).

If \(A \subseteq B\) then \(\#(A) \le \#(B)\).

Proof:

This follows from the previous result: \( \#(B) = \#(A) + \#(B \setminus A) \ge \#(A) \).

Thus, \(\#\) is an increasing function, relative to the subset partial order \(\subseteq\) on \(\mathscr{P}(S)\), and the ordinary order \(\le\) on \(\N\).
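
The simple counting rules are easy to check numerically. The following is a minimal sketch, assuming a hypothetical universal set of 20 integers and two illustrative subsets (none of these names come from the text); Python's `len` plays the role of \(\#\).

```python
# Hypothetical example sets; any finite sets would do.
S = set(range(1, 21))                 # universal set with 20 elements
A = {x for x in S if x % 2 == 0}      # even numbers in S
B = {x for x in S if x % 3 == 0}      # multiples of 3 in S

complement_A = S - A

# Complement rule: #(A^c) = #(S) - #(A)
assert len(complement_A) == len(S) - len(A)

# Difference rule: #(B \ A) = #(B) - #(A ∩ B)
assert len(B - A) == len(B) - len(A & B)

# Increasing property: A ∩ B is a subset of B, so #(A ∩ B) <= #(B)
assert (A & B) <= B and len(A & B) <= len(B)
```

The assertions pass silently; changing `A` and `B` to other subsets of `S` should leave them true.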

Inequalities

This subsection gives two inequalities that are useful for obtaining bounds on the number of elements in a set. The first is Boole's inequality (named after George Boole) which gives an upper bound on the cardinality of a union.

If \(\{A_1, A_2, \ldots, A_n\}\) is a finite collection of subsets of \(S\) then \[ \#\left( \bigcup_{i=1}^n A_i \right) \le \sum_{i=1}^n \#(A_i) \]

Proof:

Let \(B_1 = A_1\) and \(B_i = A_i \setminus (A_1 \cup \cdots \cup A_{i-1})\) for \(i \in \{2, 3, \ldots, n\}\). Note that \(\{B_1, B_2, \ldots, B_n\}\) is a pairwise disjoint collection and has the same union as \(\{A_1, A_2, \ldots, A_n\}\). From the increasing property, \( \#(B_i) \le \#(A_i) \) for each \( i \in \{1, 2, \ldots, n\} \). Hence by the addition rule, \[ \#\left( \bigcup_{i=1}^n A_i \right) = \#\left(\bigcup_{i=1}^n B_i\right) = \sum_{i=1}^n \#(B_i) \le \sum_{i=1}^n \#(A_i) \]

Intuitively, Boole's inequality holds because parts of the union have been counted more than once in the expression on the right. The second inequality is Bonferroni's inequality (named after Carlo Bonferroni), which gives a lower bound on the cardinality of an intersection.

If \(\{A_1, A_2, \ldots, A_n\}\) is a finite collection of subsets of \(S\) then \[ \#\left( \bigcap_{i=1}^n A_i \right) \ge \#(S) - \sum_{i=1}^n [\#(S) - \#(A_i)] \]

Proof:

Using the complement rule, Boole's inequality, and De Morgan's law, \[ \#\left(\bigcap_{i=1}^n A_i\right) = \#(S) - \#\left(\bigcup_{i=1}^n A_i^c\right) \ge \#(S) - \sum_{i=1}^n \#(A_i^c) = \#(S) - \sum_{i=1}^n [\#(S) - \#(A_i)] \]
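
Both inequalities can be checked on a small example. The sketch below assumes a hypothetical universal set \(\{1, 2, \ldots, 30\}\) and three large subsets (the integers that are not multiples of 4, 6, and 10, respectively). It follows the structure of the proof: Boole's inequality is applied to the complements, and Bonferroni's inequality to the sets themselves.

```python
# Hypothetical universal set and subsets, chosen so the bounds are informative.
S = set(range(1, 31))
A = [
    {x for x in S if x % 4 != 0},
    {x for x in S if x % 6 != 0},
    {x for x in S if x % 10 != 0},
]
complements = [S - A_i for A_i in A]

# Boole's inequality applied to the complements.
union_of_complements = set().union(*complements)
boole_bound = sum(len(C) for C in complements)
assert len(union_of_complements) <= boole_bound

# Bonferroni's inequality applied to the sets themselves.
intersection = set(S).intersection(*A)
bonferroni_bound = len(S) - sum(len(S) - len(A_i) for A_i in A)
assert len(intersection) >= bonferroni_bound
```

Bonferroni's lower bound is only informative when the sets are large relative to \(S\); for small sets it can be negative.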

The Inclusion-Exclusion Formula

The inclusion-exclusion formula gives the cardinality of a union of sets in terms of the cardinality of the various intersections of the sets. The formula is useful because intersections are often easier to count. We start with the special cases of two sets and three sets. As usual, we assume that the sets are subsets of a finite universal set \( S \).

If \( A \) and \( B \) are subsets of \( S \) then \(\#(A \cup B) = \#(A) + \#(B) - \#(A \cap B)\).

Proof:

Note first that \(A \cup B = A \cup (B \setminus A)\) and the latter two sets are disjoint. From the addition rule and the difference rule, \[ \#(A \cup B) = \#(A) + \#(B \setminus A) = \#(A) + \#(B) - \#(A \cap B) \]

Figure: The inclusion-exclusion theorem for two sets (union of two sets)

If \( A \), \( B \), \( C \) are subsets of \( S \) then \(\#(A \cup B \cup C) = \#(A) + \#(B) + \#(C) - \#(A \cap B) - \#(A \cap C) - \#(B \cap C) + \#(A \cap B \cap C)\).

Proof:

Note that \( A \cup B \cup C = (A \cup B) \cup [C \setminus (A \cup B)] \) and that \( A \cup B \) and \( C \setminus (A \cup B) \) are disjoint. Using the addition rule and the difference rule, \[ \#(A \cup B \cup C) = \#(A \cup B) + \#[C \setminus (A \cup B)] = \#(A \cup B) + \#(C) - \#[C \cap (A \cup B)] = \#(A \cup B) + \#(C) - \#[(A \cap C) \cup (B \cap C)]\] Now using the inclusion-exclusion rule for two sets (twice) we have \[ \#(A \cup B \cup C) = \#(A) + \#(B) - \#(A \cap B) + \#(C) - \#(A \cap C) - \#(B \cap C) + \#(A \cap B \cap C) \]

Figure: The inclusion-exclusion theorem for three sets (union of three sets)

The inclusion-exclusion rule for two sets and for three sets can be generalized to a union of \(n\) sets; the generalization is known as the (general) inclusion-exclusion formula.

Suppose that \(\{A_i: i \in I\}\) is a collection of subsets of \(S\) where \(I\) is an index set with \(\#(I) = n\). Then \[ \# \left( \bigcup_{i \in I} A_i \right) = \sum_{k = 1}^n (-1)^{k - 1} \sum_{J \subseteq I, \; \#(J) = k} \# \left( \bigcap_{j \in J} A_j \right) \]

Proof:

The proof is by induction on \(n\). The formula holds for \( n = 2 \) sets by the result above for two sets, and for \( n = 3 \) sets by the result above for three sets. Suppose the formula holds for \( n \in \N_+ \), and suppose that \( \{A_1, A_2, \ldots, A_n, A_{n+1}\} \) is a collection of \( n + 1 \) subsets of \( S \). Then \[ \bigcup_{i=1}^{n+1} A_i = \left(\bigcup_{i=1}^n A_i\right) \cup \left[A_{n+1} \setminus \left(\bigcup_{i=1}^n A_i\right)\right] \] and the two sets connected by the central union are disjoint. Using the addition rule and the difference rule, \begin{align} \#\left(\bigcup_{i=1}^{n+1} A_i\right) & = \#\left(\bigcup_{i=1}^n A_i\right) + \#(A_{n+1}) - \#\left[A_{n+1} \cap \left(\bigcup_{i=1}^n A_i\right)\right]\\ & = \#\left(\bigcup_{i=1}^n A_i\right) + \#(A_{n+1}) - \#\left[\bigcup_{i=1}^n (A_i \cap A_{n+1})\right] \end{align} By the induction hypothesis, the formula holds for the two unions of \( n \) sets in the last expression. The result then follows by simplification.

The general Bonferroni inequalities, named again for Carlo Bonferroni, state that if the sum on the right is truncated after \(k\) terms (\(k \lt n\)), then the truncated sum is an upper bound for the cardinality of the union if \(k\) is odd (so that the last term has a positive sign) and is a lower bound for the cardinality of the union if \(k\) is even (so that the last term has a negative sign).
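
The general formula and its truncations can be computed directly from the definition. The following is a sketch, not an efficient algorithm: the function names are hypothetical, and the example sets (multiples of 2, 3, 5, and 7 in \(\{1, 2, \ldots, 100\}\)) are chosen only for illustration.

```python
from itertools import combinations

def inclusion_exclusion_terms(sets):
    """Return [T_1, ..., T_n], where T_k is the sum of the cardinalities
    of all k-fold intersections of the given sets."""
    terms = []
    for k in range(1, len(sets) + 1):
        total = 0
        for group in combinations(sets, k):
            total += len(set.intersection(*group))
        terms.append(total)
    return terms

def truncated_sums(sets):
    """Return the alternating partial sums T_1, T_1 - T_2, T_1 - T_2 + T_3, ..."""
    partial, sums = 0, []
    for k, term in enumerate(inclusion_exclusion_terms(sets), start=1):
        partial += (-1) ** (k - 1) * term
        sums.append(partial)
    return sums

# Illustrative example: multiples of 2, 3, 5, 7 among 1..100.
S = range(1, 101)
sets = [{x for x in S if x % p == 0} for p in (2, 3, 5, 7)]

direct = len(set().union(*sets))   # cardinality of the union, counted directly
sums = truncated_sums(sets)        # the last entry is the full formula
assert sums[-1] == direct
```

In this example the partial sums alternate around the true cardinality: the sums with an odd number of terms are upper bounds and the sums with an even number of terms are lower bounds, as the Bonferroni inequalities state.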

The Multiplication Rule

The multiplication rule of combinatorics is based on the formulation of a procedure (or algorithm) that generates the objects to be counted. Specifically, suppose that the procedure consists of \(k\) steps, performed sequentially, and that for each \(j \in \{1, 2, \ldots, k\}\), step \(j\) can be performed in \(n_j\) ways regardless of the choices made on the previous steps. Then the number of ways to perform the entire algorithm (and hence the number of objects) is \(n_1 \, n_2 \, \cdots \, n_k\).

The key to a successful application of the multiplication rule to a counting problem is the clear formulation of an algorithm that generates the objects being counted, so that each object is generated once and only once. That is, we must neither over-count nor under-count. It's also important to notice that the set of choices available at step \( j \) may well depend on the previous steps; the assumption is only that the number of choices available does not depend on the previous steps.
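
The distinction between the set of choices and the number of choices can be seen in a small sketch. The example is an assumption made for illustration: we generate all sequences of length 3 with distinct entries from a hypothetical 5-element population. The entries available at each step depend on the earlier choices, but their number (5, then 4, then 3) does not.

```python
def generate(population, k):
    """Generate all sequences of length k with distinct entries,
    building them one coordinate (one step) at a time."""
    sequences = [()]
    for _ in range(k):
        sequences = [
            seq + (x,)
            for seq in sequences
            for x in population
            if x not in seq          # available choices depend on earlier steps
        ]
    return sequences

population = ["a", "b", "c", "d", "e"]   # hypothetical population
sequences = generate(population, 3)

# Multiplication rule: 5 * 4 * 3 ways, and each object is generated exactly once.
assert len(sequences) == 5 * 4 * 3
assert len(set(sequences)) == len(sequences)
```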

The first two results below give equivalent formulations of the multiplication principle.

Suppose that \(S\) is a set of sequences of length \(k\), and that we denote a generic element of \(S\) by \((x_1, x_2, \ldots, x_k)\). Suppose that for each \(j \in \{1, 2, \ldots, k\}\), \(x_j\) has \(n_j\) different values, regardless of the values of the previous coordinates. Then \(\#(S) = n_1 n_2 \cdots n_k\).

Proof:

A procedure that generates the sequences in \( S \) consists of \( k \) steps. Step \( j \) is to select the \( j \)th coordinate.

Suppose that \(T\) is an ordered tree with depth \(k\) and that each vertex at level \(i - 1\) has \(n_i\) children for \(i \in \{1, 2, \ldots, k\}\). Then the number of endpoints of the tree is \(n_1 n_2 \cdots n_k\).

Proof:

Each endpoint of the tree is uniquely associated with the path from the root vertex to the endpoint. Each such path is a sequence of length \( k \), in which there are \( n_j \) values for coordinate \( j \) for each \( j \in \{1, 2, \ldots, k\} \). Hence the result follows from the previous result on sequences.

Product Sets

If \(S_i\) is a set with \(n_i\) elements for \(i \in \{1, 2, \ldots, k\}\) then \[ \#(S_1 \times S_2 \times \cdots \times S_k) = n_1 n_2 \cdots n_k \]

Proof:

This is a corollary of the result above on sequences.

If \(S\) is a set with \(n\) elements, then \(S^k\) has \(n^k\) elements.

Proof:

This is a corollary of the previous result on product sets.

In the previous result, note that the elements of \( S^k \) can be thought of as ordered samples of size \(k\) that can be chosen with replacement from a population of \(n\) objects. Elements of \(\{0, 1\}^n\) are sometimes called bit strings of length \(n\). Thus, there are \( 2^n \) bit strings of length \( n \).
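
As a quick check of the product rules, the sketch below enumerates a product set and the bit strings of a given length with `itertools.product`. The particular sets are hypothetical examples.

```python
from itertools import product

# Three hypothetical sets with 2, 3, and 4 elements.
S1 = {"x", "y"}
S2 = {1, 2, 3}
S3 = {"red", "green", "blue", "black"}

product_set = list(product(S1, S2, S3))
assert len(product_set) == len(S1) * len(S2) * len(S3)

# Bit strings of length n are the elements of {0, 1}^n.
n = 5
bit_strings = list(product((0, 1), repeat=n))
assert len(bit_strings) == 2 ** n
```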

Functions

The number of functions from a set \(A\) of \(m\) elements into a set \(B\) of \(n\) elements is \(n^m\).

Proof:

An algorithm for constructing a function \(f: A \to B\) is to choose the value of \(f(x) \in B\) for each \(x \in A\). There are \( n \) choices for each of the \( m \) elements in the domain.

Recall that the set of functions from a set \(A\) into a set \(B\) (regardless of whether the sets are finite or infinite) is denoted \(B^A\). The result in the previous exercise is motivation for this notation. Note also that if \( S \) is a set with \( n \) elements, then the elements in the Cartesian power set \( S^k \) can be thought of as functions from \( \{1, 2, \ldots, k\} \) into \( S \). Thus, the result above on Cartesian powers can be thought of as a corollary of the previous result on functions.
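
The counting argument for functions translates directly into code. In this sketch (the helper name `all_functions` and the example sets are assumptions), a function from \(A\) into \(B\) is represented as a dictionary that assigns a value in \(B\) to each element of \(A\).

```python
from itertools import product

def all_functions(A, B):
    """Return every function from A into B, each represented as a dict."""
    domain = list(A)
    return [dict(zip(domain, values)) for values in product(B, repeat=len(domain))]

A = ["a", "b", "c"]        # m = 3 elements
B = [0, 1, 2, 3]           # n = 4 elements
functions = all_functions(A, B)

# There are n choices for each of the m domain elements: n^m functions.
assert len(functions) == len(B) ** len(A)
```

Taking `A = [1, 2, ..., k]` recovers the Cartesian power: each dictionary then corresponds to one sequence of length \(k\).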

Subsets

If \(S\) is a set with \(n\) elements then there are \(2^n\) subsets of \(S\).

Proof from the multiplication principle:

An algorithm for constructing \(A \subseteq S\) is to decide whether \(x \in A\) or \(x \notin A\) for each \(x \in S\). There are 2 choices for each of the \( n \) elements of \( S \).

Proof using indicator functions:

Recall that there is a one-to-one correspondence between subsets of \( S \) and indicator functions on \( S \). An indicator function is simply a function from \( S \) into \( \{0, 1\} \), and the number of such functions is \( 2^n \) by the result above on functions.

Suppose that \( \{A_1, A_2, \ldots, A_k\} \) is a collection of \(k\) subsets of a set \(S\). In general, there are \(2^{2^k}\) different sets that can be constructed from the \(k\) given sets, using the operations of union, intersection, and complement. These sets form the algebra generated by the given sets.

Proof:

First note that there are \(2^k\) pairwise disjoint sets of the form \(B_1 \cap B_2 \cap \cdots \cap B_k\) where \(B_i = A_i\) or \(B_i = A_i^c\) for each \(i\). Next, note that every set that can be constructed from \(\{A_1, A_2, \ldots, A_k\}\) is a union of some (perhaps all, perhaps none) of these intersection sets. When the \(2^k\) intersection sets are all nonempty, distinct choices give distinct unions, so there are \(2^{2^k}\) such sets.
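
The argument in the proof can be explored by brute force. The sketch below (function names and example sets are assumptions) computes the nonempty intersection sets, here called atoms, and separately closes the given sets under complement, union, and intersection. The size of the closure should be 2 raised to the number of nonempty atoms.

```python
from itertools import product

def atoms(S, sets):
    """Nonempty intersections B_1 ∩ ... ∩ B_k with B_i = A_i or A_i^c."""
    result = []
    for choice in product([True, False], repeat=len(sets)):
        atom = set(S)
        for keep, A in zip(choice, sets):
            atom &= A if keep else (S - A)
        if atom:
            result.append(frozenset(atom))
    return result

def generated_algebra(S, sets):
    """Close the given sets under complement, union, and intersection."""
    S = frozenset(S)
    algebra = {frozenset(A) for A in sets} | {S, frozenset()}
    while True:
        new = {S - A for A in algebra}
        new |= {A | B for A in algebra for B in algebra}
        new |= {A & B for A in algebra for B in algebra}
        if new <= algebra:          # nothing new was produced: closure reached
            return algebra
        algebra |= new

# Hypothetical example in "general position": all 2^3 atoms are nonempty.
S = set(range(8))
sets = [{x for x in S if x & 1}, {x for x in S if x & 2}, {x for x in S if x & 4}]
assert len(atoms(S, sets)) == 2 ** 3
assert len(generated_algebra(S, sets)) == 2 ** (2 ** 3)
```

When some intersection sets are empty, the count is smaller, which is why the result holds only in general. For example, with the nested sets `{0, 1}` and `{0, 1, 2}` in `{0, 1, 2, 3}` there are 3 nonempty atoms and \(2^3 = 8\) sets in the generated algebra.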

Open the Venn diagram app.

  1. Select each of the 4 disjoint sets \( A \cap B \), \( A \cap B^c \), \( A^c \cap B \), \( A^c \cap B^c \).
  2. Select each of the 12 other subsets of \( S \). Note how each is a union of some of the sets in part 1.

Suppose that \(S\) is a set with \(n\) elements and that \(A\) is a subset of \(S\) with \(k\) elements. The number of subsets of \(S\) that contain \(A\) is \(2^{n - k}\).

Proof:

Note that a subset \( B \) of \( S \) that contains \( A \) can be written uniquely in the form \( B = A \cup C \) where \( C \subseteq A^c \). \( A^c \) has \( n - k \) elements and hence there are \( 2^{n-k} \) subsets of \( A^c \).
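
A small enumeration illustrates both the count of all subsets and the count of subsets that contain a given set. The helper `subsets` and the example sets are hypothetical.

```python
from itertools import combinations

def subsets(S):
    """Return all subsets of S as a list of sets."""
    items = list(S)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

S = {1, 2, 3, 4, 5, 6}     # n = 6
A = {2, 5}                 # k = 2

all_subsets = subsets(S)
supersets = [B for B in all_subsets if A <= B]   # subsets of S that contain A

assert len(all_subsets) == 2 ** len(S)
assert len(supersets) == 2 ** (len(S) - len(A))
```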

Computational Exercises

Identification Numbers

A license number consists of two letters (uppercase) followed by five digits. How many different license numbers are there?

Answer:

\(26^2 \cdot 10^5 = 67 \, 600 \, 000\)

Suppose that a Personal Identification Number (PIN) is a four-symbol code word in which each entry is either a letter (uppercase) or a digit. How many PINs are there?

Answer:

\(36^4 = 1 \, 679 \, 616\)

Cards, Dice, and Coins

In the board game Clue, Mr. Boddy has been murdered. There are 6 suspects, 6 possible weapons, and 9 possible rooms for the murder.

  1. The game includes a card for each suspect, each weapon, and each room. How many cards are there?
  2. The outcome of the game is a sequence consisting of a suspect, a weapon, and a room (for example, Colonel Mustard with the knife in the billiard room). How many outcomes are there?
  3. Once the three cards that constitute the outcome have been randomly chosen, the remaining cards are dealt to the players. Suppose that you are dealt 5 cards. In trying to guess the outcome, what hand of cards would be best?
Answer:
  1. \(6 + 6 + 9 = 21\) cards
  2. \(6 \cdot 6 \cdot 9 = 324\) outcomes
  3. The best hand would be the \(5\) remaining weapons or the \(5\) remaining suspects.

An experiment consists of rolling a standard die, drawing a card from a standard deck, and tossing a standard coin. How many outcomes are there?

Answer:

\(6 \cdot 52 \cdot 2 = 624\)

A standard die is rolled 5 times and the sequence of scores recorded. How many outcomes are there?

Answer:

\(6^5 = 7776\)

In the card game Set, each card has 4 properties: number (one, two, or three), shape (diamond, oval, or squiggle), color (red, blue, or green), and shading (solid, open, or striped). The deck has one card of each (number, shape, color, shading) configuration. A set in the game is a collection of three cards such that, for each property, the cards are either all the same or all different.

  1. How many cards are in a deck?
  2. How many sets are there?
Answer:
  1. \(3^4 = 81\)
  2. \(1080\)
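
The number of sets can be confirmed by brute force. In this sketch each card is encoded, as an assumption of the example, by a tuple of four values in \(\{0, 1, 2\}\), one per property; every unordered triple of cards is then tested against the definition.

```python
from itertools import combinations, product

# Each card is a tuple (number, shape, color, shading) with values 0, 1, 2.
deck = list(product(range(3), repeat=4))

def is_set(cards):
    """True if, for each property, the three cards are all the same or all different."""
    return all(len({card[i] for card in cards}) in (1, 3) for i in range(4))

num_sets = sum(1 for triple in combinations(deck, 3) if is_set(triple))

assert len(deck) == 3 ** 4
assert num_sets == 1080
```

The count also follows from the multiplication rule: any two distinct cards determine a unique third card that completes a set, so there are \(81 \cdot 80 / 6 = 1080\) sets, where the division by \(3! = 6\) accounts for the orderings of the three cards.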

A coin is tossed 10 times and the sequence of scores recorded. How many sequences are there?

Answer:

\(2^{10} = 1024\)

The die-coin experiment consists of rolling a die and then tossing a coin the number of times shown on the die. The sequence of coin results is recorded.

  1. How many outcomes are there?
  2. How many outcomes are there with all heads?
  3. How many outcomes are there with exactly one head?
Answer:
  1. \(\sum_{k=1}^6 2^k = 126\)
  2. \(6\)
  3. \(\sum_{k=1}^6 k = 21\)
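
These counts can be checked by listing the outcomes. The sketch assumes that an outcome is represented by the tuple of coin results, with `"H"` and `"T"` as hypothetical labels for heads and tails; the die score is the length of the tuple.

```python
from itertools import product

# One outcome for each die score n and each sequence of n coin results.
outcomes = [coins for n in range(1, 7) for coins in product("HT", repeat=n)]

all_heads = [o for o in outcomes if all(c == "H" for c in o)]
one_head = [o for o in outcomes if o.count("H") == 1]

assert len(outcomes) == sum(2 ** k for k in range(1, 7))
assert len(all_heads) == 6
assert len(one_head) == sum(range(1, 7))
```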

Run the die-coin experiment 100 times and observe the outcomes.

Consider a deck of cards as a set \(D\) with 52 elements.

  1. How many subsets of \(D\) are there?
  2. How many functions are there from \(D\) into the set \(\{1, 2, 3, 4\}\)?
Answer:
  1. \(2^{52} = 4 \, 503 \, 599 \, 627 \, 370 \, 496\)
  2. \(4^{52} = 20 \, 282 \, 409 \, 603 \, 651 \, 670 \, 423 \, 947 \, 251 \, 286 \, 016\)

Birthdays

Consider a group of 10 persons.

  1. If we record the birth month of each person, how many outcomes are there?
  2. If we record the birthday of each person (ignoring leap day), how many outcomes are there?
Answer:
  1. \( 12^{10} = 61 \, 917 \, 364 \, 224 \)
  2. \(365^{10} = 41 \, 969 \, 002 \, 243 \, 198 \, 805 \, 166 \, 015 \, 625\)

Reliability

In the usual model of structural reliability, a system consists of components, each of which is either working or defective. The system as a whole is also either working or defective, depending on the states of the components and how the components are connected.

A string of lights has 20 bulbs, each of which may be good or defective. How many configurations are there?

Answer:

\(2^{20} = 1 \, 048 \, 576\)

If the components are connected in series, then the system as a whole is working if and only if each component is working. If the components are connected in parallel, then the system as a whole is working if and only if at least one component is working.

A system consists of three subsystems with 6, 5, and 4 components, respectively. Find the number of component states for which the system is working in each of the following cases:

  1. The components in each subsystem are in parallel and the subsystems are in series.
  2. The components in each subsystem are in series and the subsystems are in parallel.
Answer:
  1. \( (2^6 - 1)(2^5 - 1)(2^4 - 1) = 29 \, 295 \)
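  2. \( 2^{15} - (2^6 - 1)(2^5 - 1)(2^4 - 1) = 3473 \). The system fails if and only if each subsystem has at least one defective component.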
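
Since there are only \(2^{15} = 32\,768\) component states, both cases can be counted by direct enumeration. In this sketch, which is an illustration under stated assumptions rather than part of the original exercise, each component state is coded as 1 (working) or 0 (defective), and the helper `split` is a hypothetical name.

```python
from itertools import product

sizes = (6, 5, 4)   # number of components in each subsystem

def split(state):
    """Split a tuple of 15 component states into the three subsystems."""
    parts, start = [], 0
    for size in sizes:
        parts.append(state[start:start + size])
        start += size
    return parts

parallel_then_series = 0   # components in parallel, subsystems in series
series_then_parallel = 0   # components in series, subsystems in parallel
for state in product((0, 1), repeat=sum(sizes)):
    parts = split(state)
    if all(any(p) for p in parts):    # every subsystem has a working component
        parallel_then_series += 1
    if any(all(p) for p in parts):    # some subsystem has all components working
        series_then_parallel += 1

# The two events are complementary after swapping the roles of 0 and 1,
# so the two counts add up to the total number of states.
assert parallel_then_series + series_then_parallel == 2 ** sum(sizes)
```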

Menus

Suppose that a sandwich at a restaurant consists of bread, meat, cheese, and various toppings. There are 4 choices for the bread, 3 choices for the meat, 5 choices for the cheese, and 10 different toppings (each of which may be chosen). How many sandwiches are there?

Answer:

\(4 \cdot 3 \cdot 5 \cdot 2^{10} = 61 \, 440\)

At a wedding dinner, there are three choices for the entrée, four choices for the beverage, and two choices for the dessert.

  1. How many different meals are there?
  2. If there are 50 guests at the wedding and we record the meal requested for each guest, how many possible outcomes are there?
Answer:
  1. \( 3 \cdot 4 \cdot 2 = 24 \)
  2. \( 24^{50} \approx 1.02462 \times 10^{69} \)

Braille

Braille is a tactile writing system used by people who are visually impaired. The system is named for the French educator Louis Braille and uses raised dots in a \( 3 \times 2 \) grid to encode characters. How many meaningful Braille configurations are there?

Answer:

\( 2^6 - 1 = 63 \). Note that the grid with no raised dots is not meaningful.

Figure: The Braille encoding of the number 2 and the letter b

Personality Typing

The Myers-Briggs personality typing is based on four dichotomies: A person is typed as either extroversion (E) or introversion (I), either sensing (S) or intuition (N), either thinking (T) or feeling (F), and either judgement (J) or perception (P).

  1. How many Myers-Briggs personality types are there? List them.
  2. Suppose that we list the personality types of 10 persons. How many possible outcomes are there?
Answer:
  1. 16
  2. \( 16^{10} = 1 \, 099 \, 511 \, 627 \, 776 \)
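
The list of types requested in part 1 is a product set and can be generated mechanically. The sketch uses the conventional one-letter codes, with N for intuition so that it is not confused with the I of introversion; the variable names are hypothetical.

```python
from itertools import product

# One pair of letters per dichotomy, using the conventional codes.
dichotomies = ["EI", "SN", "TF", "JP"]
types = ["".join(letters) for letters in product(*dichotomies)]

assert len(types) == 2 ** 4
print(types)
```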

The Galton Board

The Galton Board, named after Francis Galton, is a triangular array of pegs. Galton, apparently too modest to name the device after himself, called it a quincunx from the Latin word for five twelfths (go figure). The rows are numbered, from the top down, by \((0, 1, \ldots )\). Row \(n\) has \(n + 1\) pegs that are labeled, from left to right by \((0, 1, \ldots, n)\). Thus, a peg can be uniquely identified by an ordered pair \((n, k)\) where \(n\) is the row number and \(k\) is the peg number in that row.

A ball is dropped onto the top peg \((0, 0)\) of the Galton board. In general, when the ball hits peg \((n, k)\), it either bounces to the left to peg \((n + 1, k)\) or to the right to peg \((n + 1, k + 1)\). The sequence of pegs that the ball hits is a path in the Galton board.

There is a one-to-one correspondence between each pair of the following three collections:

  1. Bit strings of length \(n\)
  2. Paths in the Galton board from \((0, 0)\) to any peg in row \(n\).
  3. Subsets of a set with \(n\) elements.

Thus, each of these collections has \(2^n\) elements.
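
The correspondences can be made concrete under one natural convention, assumed here and not necessarily the one used by the app: bit \(i\) is 1 if the ball bounces right on step \(i\), and the subset consists of the positions \(i \in \{1, 2, \ldots, n\}\) where the bit is 1. The function names are hypothetical.

```python
from itertools import product

def path_from_bits(bits):
    """Path in the Galton board: bit 1 means bounce right, bit 0 means bounce left."""
    path = [(0, 0)]
    for b in bits:
        n, k = path[-1]
        path.append((n + 1, k + b))
    return path

def subset_from_bits(bits):
    """Subset of {1, ..., n} consisting of the positions of the 1 bits."""
    return {i for i, b in enumerate(bits, start=1) if b == 1}

def bits_from_subset(subset, n):
    """Inverse of subset_from_bits for a subset of {1, ..., n}."""
    return tuple(1 if i in subset else 0 for i in range(1, n + 1))

# All paths from (0, 0) to row 4, one per bit string of length 4.
paths = [path_from_bits(bits) for bits in product((0, 1), repeat=4)]
paths_to_4_2 = [p for p in paths if p[-1] == (4, 2)]

assert len(paths) == 2 ** 4
assert len(paths_to_4_2) == 6
```

Under this convention, a path ends at peg \((n, k)\) exactly when the bit string has \(k\) ones, or equivalently when the subset has \(k\) elements.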

Open the Galton board app.

  1. Move the ball from \((0, 0)\) to \((10, 6)\) along a path of your choice. Note the corresponding bit string and subset.
  2. Generate the bit string \(0111001010\). Note the corresponding subset and path.
  3. Generate the subset \(\{2, 4, 5, 9, 10\}\). Note the corresponding bit string and path.
  4. Generate all paths from \((0, 0)\) to \((4, 2)\). How many paths are there?
Answer:
  4. 6