Let \(I\) be a countable alphabet of letters, and let \(S\) denote the set of all finite length words using letters from \(I\). The empty word, with no letters, is denoted \(e\). Let \(\cdot\) denote the concatenation operation on elements of \(S\). That is, suppose that \(m, \, n \in \N\) and that \(x_i, \, y_j \in I\) for \(i \in \{1, 2, \ldots, m\}\) and \(j \in \{1, 2, \ldots, n\}\). Then \(x = x_1 \cdots x_m \in S\) and \(y = y_1 \cdots y_n \in S\), and \[x y = x_1 \cdots x_m y_1 \cdots y_n\] The space \((S, \cdot)\) is a discrete, positive semigroup, known as the free semigroup generated by \(I\).
The set of words \(S\) is sometimes denoted \(I^*\), and the elements of \(I\) are the irreducible elements of \((S, \cdot)\). The adjective free is used because there are no algebra rules
imposed on \((S, \cdot)\). Here are two examples that we will use in exercises:
The binary semigroup is the free semigroup on the alphabet \(\{0, 1\}\), and so the simplest non-trivial free semigroup. The nonempty words are simply bit strings of finite length.
The genetic semigroup is the free semigroup on the alphabet \(\{A, C, G, T\}\). In genetics, the letters refer to the nucleotides that are the base building blocks of DNA: adenine, cytosine, guanine, and thymine, respectively. Nonempty words may represent genes.
For \(x \in S\) and \(i \in I\), let \(d_i(x)\) denote the number of times that letter \(i\) occurs in \(x\), and let \(d(x) = \sum_{i \in I} d_i(x)\) denote the length of \(x\).
Note that \(d_i(x) = 0\) for all but finitely many \(i \in I\).
Suppose again that \((S, \cdot)\) is the free semigroup corresponding to the countable alphabet \(I\).
Since \((S, \upa)\) is a rooted tree, all of the results in Section 1 apply. In particular, if \(\#(I) = k \in \N_+\), then \((S, \upa)\) is a regular rooted tree of degree \(k\). Specializing further, when \(k = 2\) and \(I = \{0, \,1\}\), so that we have the binary semigroup described in , then the corresponding cover graph \((S, \upa)\) is the binary tree. Note also that \(d(x)\) is the distance from \(e\) to \(x\) in the cover graph \((S, \upa)\) for \(x \in S\), so our notation is consistent. More generally, if \(x \preceq y\) then \(d(x, y) = d(y) - d(x)\) is the distance from \(x\) to \(y\) in \((S, \upa)\). Like all rooted trees, \((S, \preceq)\) is a lower semi-lattice and for \(x, \, y \in S\), \(x \wedge y\) is the largest common prefix of \(x\) and \(y\). But \((S, \preceq)\) is not an upper semi-lattice unless \(I\) consists of a single letter.
Let \(\N_I = \left\{(n_i: i \in I): n_i \in \N \text{ for } i \in I \text{ and } n_i = 0 \text{ for all but finitely many } i \in I \right\} \). We define the multinomial coefficients on \(\N_I\) by \[C(\bs{n}) = \#\left\{x \in S: d_i(x) = n_i \text{ for } i \in I\right\} = \frac{\left(\sum_{i \in I} n_i \right)!}{\prod_{i \in I} n_i!}, \quad \bs n = (n_i: i \in I) \in \N_I\]
The free semigroup has the property that\([e, x]\) is finite and totally ordered for each \(x \in S\). Moreover, it is the only discrete positive semigroup with this property.
Suppose that \((S, \cdot)\) is a discrete positive semigroup with associated partial order \(\preceq\) and with the property that \([e, x]\) is finite and totally ordered for each \(x \in S\). Then \((S, \cdot)\) is isomorphic to a free semigroup on an alphabet.
Let \(I\) denote the set of irreducible elements of \((S, \cdot)\). If \(x \in S - \{e\}\) then \(i_1 \preceq x\) for some \(i_1 \in I\) and hence \(x = i_1y\) for some \(y \in S\). If \(y \succ e\) then we can repeat the argument to write \(y = i_2z\) for some \(i_2 \in I\) and \(z \in S\). Note that \(x = i_1 i_2 z\) and hence \(i_1 i_2 \preceq x\). Moreover, \(i_1\) and \(i_1 i_2\) are distinct. Since \([e, x]\) is finite, the process must terminate. Thus, we can write \(x\) in the form \(x = i_1 i_2 \cdots i_n\) for some \(n \in \N_+\) and some \(i_1, i_2, \ldots, i_n \in I\). Finally, we show that the factorization is unique. Suppose that \(x = i_1 i_2 \cdots i_n = j_1 j_2 \cdots j_m\) where \(i_1, \ldots, i_n \in I\) and \(j_1, \ldots, j_m \in I\). Then \(i_1 \preceq x\) and \(j_1 \preceq x\). Since the elements of \([e, x]\) are totally ordered, we must have \(i_1 = j_1\). Using left cancellation we have \(i_2 \cdots i_n = j_2 \cdots j_m\). Continuing in this fashion it follows that \(m = n\) and \(i_1 = j_1\), \(i_2 = j_2, \ldots, i_n = j_n\).
Suppose that \((S, \cdot)\) is a discrete positive semigroup with the property that every \(x \in S\) has a unique finite factoring over the set of irreducible elements \(I\). Then \((S, \cdot)\) is isomporphic to the free semigroup on \(I\).
Every \(x \in S\) has a finite factoring over \(I\), so that \(x = i_1 i_2 \cdots i_n\). If this factoring is unique, then clearly \((i_1, i_2, \ldots i_n) \mapsto i_1 i_2 \cdots i_n\) is an isomorphism from the free semigroup \((I^*, \cdot)\) onto \((S, \cdot)\).
The left walk functions and the left generating functions for the graphs \((S, \preceq)\), \((S, \upa)\) and \((S, \Upa)\) are given in Section 1, as is the Möbius kernel for \((S, \preceq)\). Since we have a discrete positive semigroup, we can give the more basic Möbius function:
The Möbius function \(\mu\) of \((S, \cdot)\) is given by \(\mu(e) = 1\), \(\mu(i) = -1\) for \(i \in I\), and \(\mu(x) = 0\) for all other \(x \in S\).
The semigroup dimension is the number of letters.
\(\dim(S, \cdot) = \#(I)\).
Note first that a homomorphism \(\phi\) from \((S, \cdot)\) into \((\R, +)\) can uniquely be specified by defining \(\phi(i)\) for all \(i \in I\) and then defining \[\phi(i_1 i_2 \cdots i_n) = \phi(i_1) + \phi(i_2) + \cdots \phi(i_n)\] Thus, if \(\phi\) is a such a homomorphism and \(\phi(i) = 0\) for all \(i \in I\), then \(\phi(x) = 0\) for all \(x \in S\). Now suppose that \(B \subseteq S\) and \(\#(B) \lt \#(I)\). We will show that there exists a nonzero homomorphism from \((S, \cdot)\) into \((\R, +)\) with \(\phi(x) = 0\) for all \(x \in B\). Let \(I_B\) denote the set of letters contained in the words in \(B\). Suppose first that \(I_B\) if a proper subset of \(I\) (and note that this must be the case if \(I\) is infinite). Define a homomorphism \(\phi\) by \(\phi(i) = 0\) for \(i \in I_B\) and \(\phi(i) = 1\) for \(i \in I - I_B\). Then \(\phi(x) = 0\) for \(x \in B\), but \(\phi\) is not the zero homomorphism. Suppose next that \(I_B = I\). Thus \(I\) is finite, so let \(k = \#(B)\) and \(n = \#(I)\), with \(k \lt n\). Denote the words in \(B\) by \[i_{j 1} i_{j 2} \cdots i_{j m_j}, \quad j = 1, 2, \ldots, k\] The set of linear, homogeneous equations \[\phi(i_{j 1}) + \phi(i_{j 2}) + \cdots + \phi(i_{j m_j}) = 0, \quad j = 1, 2, \ldots, k\] has \(n\) unkowns, namely \(\phi(i)\) for \(i \in I\), but only \(k\) equations. Hence there exists a non-trivial solution. The homomorphism so constructed satisfies \(\phi(x) = 0\) for \(x \in B\).
We can use the strict positive semigroup to provide a simple example of a discrete semigroup in which multiples of counting measure \(\#\) are not the only left invariant measures.
Suppose that \(I\) has at least two elements and consider the strict positive semigroup \((S_+, \cdot)\) associated with the free semigroup \((S, \cdot)\). That is, \(S_+ = S - \{e\}\) while \(\cdot\) is still concatenation. Define the function \(g: S_+ \to (0, \infty)\) by \(g(x) = c_i\) if the terminal letter of \(x\) is \(i \in I\), where \(c_i \in (0, \infty)\) for each \(i \in I\) and \(c_i \ne c_j\) for some \(i, \, j \in I\). Let \(\lambda\) be the positive measure on \(S_+\) with density \(g\) (relative to \(\#\)). That is, \[\lambda(A) = \sum_{x \in A} g(x) = \sum_{i \in I} c_i n_i(A), \quad A \subseteq S_+\] where \(n_i(A)\) is the number of words in \(A\) with terminal letter \(i\) for each \(i \in I\). Then \(\lambda\) is left invariant: If \(x \in S\) and \(A \subseteq S\) then the words in \(x A\) have the same terminal letters as the corresponding words in \(A\), so \(\lambda(x A) = \lambda(A)\). But \(\lambda\) is clearly not a multiple of counting measure.
Suppose that \((S, \cdot)\) is the free semigroup over a countable alphabet \(I\) as described above. The corresponding graph \((S, \preceq)\) is a partial order graph whose covering graph \((S, \upa)\) is a rooted tree. So all of the results in Section 2 apply to the graphs \((S, \preceq)\), \((S, \prec)\), \((S, \upa)\), and \((S, \Upa)\) (the reflexive closure of \((S, \upa)\)). We will review some of these results using the notation of the free semigroup.
Suppose that \(X\) is a random variable in \(S\) with density function \(f\). By definition, the reliability function \(F\) of \(X\) for \((S, \preceq)\) is given by \[F(x) = \P(X \succeq x) = \sum_{x \preceq y} f(y) = \sum_{t \in S} f(x t), \quad x \in S\]
A function \(F: S \to [0, \infty]\) is a reliability function for \((S, \preceq)\) if and only if
In this case, the density function \(f\) is given by \[f(x) = F(x) - \sum_{i \in I} F(x i), \quad x \in S\]
For \(\alpha \in (0, 1)\), the following recursive scheme defines a probability density function \(f\) on \(S\):
Suppose that \(X\) has density function \(f\) with parameter \(\alpha \in (0, 1)\) as constructed in . Then
Suppose again that \(X\) has density function \(f\) with parameter \(\alpha \in (0, 1)\) as constructed in . Then \(N = d(X)\), the number of letters in \(X\), has the geometric distribution on \(\N\) with parameter \(\alpha\).
Since we have a semigroup and not just a rooted tree, we are particularly interested in exponential distributions. Let \(P_I\) denote the collection of probability density functions \(\bs{p} = (p_i: i \in I)\) on \(I\) that have complete support, so that \(p_i \gt 0\) for \(i \in I\) and \(\sum_{i \in I} p_i = 1\). Recall some definitions from Section 2: \(d_i(x)\) is the number of times that letter \(i \in I\) occurs in the word \(x \in S\), and \(d(x) = \sum_{i \in I} d_i(x)\) is the length of \(x\). Also \[\N_I = \left\{\bs n = (n_i: i \in I): n_i \in \N \text{ for } i \in I \text{ and } n_i = 0 \text{ for all but finitely many } i \in I \right\} \] and \(C(\bs n)\) is the multinomial coefficient of \(\bs n \in \N_I\): \[C(\bs{n}) = \#\left\{x \in S: d_i(x) = n_i \text{ for } i \in I\right\} = \frac{\left(\sum_{i \in I} n_i \right)!}{\prod_{i \in I} n_i!}, \quad \bs n = (n_i: i \in I) \in \N_I\]
Random variable \(X\) has an exponential distribution on \((S, \cdot)\) if and only if \(X\) has density function \(f\) of the form \[f(x) = \alpha (1 -\alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}, \quad x \in S\] where \(\alpha \in (0, 1)\) is the rate constant, and where \(\bs p = (p_i: i \in I) \in P_I\).
We apply results from Section 2.5. Let \(F\) denote the reliability function of \(X\) and let \(F(i) = \beta_i \in (0, 1)\) for \(i \in I\). The memoryless property requires that \[F(x) = \prod_{i \in I} \beta_i^{d_i(x)}, \quad x \in S\] The constant rate property requires that \(\sum_{x \in S} F(x) \lt \infty\) (and the reciprocal of this sum is the rate parameter). But from the multinomial theorem \[\sum_{x \in S} F(x) = \sum_{n = 0}^\infty \sum\left\{ C(\bs n) \prod_{i \in I} \beta_i^{n_i}: \bs n \in \N_I, \, \sum_{i \in I} n_i = n \right\} = \sum_{n = 0}^\infty \beta^n\] where \(\beta = \sum_{i \in I} \beta_i\). Hence we must have \(\beta \in (0, 1)\) and the rate constant is \(1 - \beta\). So \(X\) has density function \(f\) given by \[f(x) = (1 - \beta) \prod_{i \in I} \beta_i^{d_i(x)}, \quad x \in S\] Finally, we re-define the parameters: let \(\alpha = 1 - \beta\) and let \(p_i = \beta_i / \beta\) for \(i \in I\). Then \[f(x) = \alpha \prod_{i \in I} [p_i (1 - \alpha)]^{d_i(x)} = \alpha (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}, \quad x \in S\]
In particular, in the free semigroup, every memoryless distribution has constant rate and hence is exponential. The converse is not true. The exponential distribution with parameters \(\alpha \in (0, 1)\) and \((p_i: i \in I) \in P_I\) corresponds to choosing \[f(x i) = (1 - \alpha) p_i f(x), \quad x \in S, \, i \in I\] in . But of course the family of constant rate distributions in this proposition includes many others not of this type. Here is a simple example:
Consider the free binary semigroup, so that \(I = \{0, 1\}\) and the elements of \(S = I^*\) are bit strings. Let \(a \gt 0\), \(b \gt 0\), \(a \neq b\), and \(a + b \lt 1\). Define \(F: S \to (0,\,1]\) by the recursion scheme in as follows: \begin{align*} F(e) &= 1 \\ F(0) &= b, \, F(1) = a\\ F(x 0) &= a F(x), \, F(x 1) = b F(x), \quad x \ne e \end{align*} Then \(F\) is a reliability function relative to \((S, \preceq)\) for a distribution with constant rate \(1 - a - b\) but this distribution is not exponential for \((S, \cdot)\) or even a mixture of exponential distributions.
There is a simple interpretation of the exponential distribution in terms of a random product of independent, identically distributed letters.
Suppose that \(\bs U = (U_1, U_2, \ldots)\) is a sequence of independent copies of \(U\), where \(U\) is a random variable in \(I\) with density function \(\bs p \in P_I\). Let \(N\) be independent of \(\bs U\) and have a geometric distribution on \(\N\) with success parameter \(\alpha \in (0, 1)\). Let \(X\) be the random variable in \(S\) defined by \(X = U_1 U_2 \cdots U_N\). Then \(X\) has the exponential distribution with parameters \(\alpha\) and \(\bs p\).
Let \(x = i_1 i_2 \cdots i_n \in S\) where \(n \in \N\) and \((i_1, i_2, \ldots, i_n) \in I^n\) Then \begin{align*} \P(X = x) & = \P(X = x, N = n) = \P(X = x \mid N = n) \P(N = n) \\ &= \P(U_1 = i_i, U_2 = i_2 \ldots U_n = i_n) \alpha (1 - \alpha)^n = \P(U_1 = i_1) \P(U_2 = i_2) \cdots \P(U_n = i_n) \alpha (1 - \alpha)^n \\ &= \alpha (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)} \end{align*}
It's worth noting again that the number of letters \(N\) has the exponential distribution on \((\N, +)\) with rate \(\alpha\). The exponential distribution on \((S, \cdot)\) is clearly invariant under a permutation of the letters. If \(X = U_1 U_2 \cdots U_N\) has the exponential distribution in and if \((V_1, V_2, \ldots, V_N)\) is a permutation of \((U_1, U_2, \cdots, U_N)\) then \(Y = V_1 V_2 \ldots V_N\) has the same exponential distribution.
Suppose that \(X\) has the exponential distribution on \((S, \cdot)\). with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\).
Suppose again that \(X\) has the exponential distribution on \((S, \cdot)\). with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\). Random variable \(X\) maximizes entropy over all random variables \(Y \in S\) with \[\E[d_i(Y)] = \E[d_i(X)] = \frac{1 - \alpha}{\alpha} p_i, \quad i \in I\]
Suppose again that \(X\) has the exponential distribution on \((S, \cdot)\) with parameters \(\alpha \in (0, 1)\) and \(\bs p \in P_I\). Then \(X\) has a compound Poisson distribution.
Recall from that \(X\) can be decomposed as \[X = U_1 U_2 \cdots U_N\] where \(\bs U = (X_1, X_2, \ldots)\) are independent and identically distributed on the alphabet \(I\), with probability density function \((p_i: i \in I)\), and where \(N\) is independent of \(\bs U\) and has the geometric distribution on \(\N\) with success parameter \(\alpha\). (That is, the exponential distribution on \((\N, +)\) with rate \(\alpha\).) But from Section 4.1, \(N\) has a compound Poisson distribution and can be written in the form \[N = M_1 + M_2 + \cdots + M_K\] where \(\bs M = (M_1, M_2, \ldots)\) are independent and identically distributed on \(\N_+\) with the logarithmic distribution \[\P(M = n) = -\frac{(1 - \alpha)^n}{n \ln \alpha}, \quad n \in \N_+\] and where \(K\) is independent of \(\bs M\) and has the Poisson distribution with parameter \(-\ln \alpha\). It follows that we can take \(\bs X\), \(\bs M\), and \(K\) mutually independent, and thus \[X = V_1 V_2 \cdots V_K\] where \(V_i = U_{M_{i-1} + 1} \cdots U_{M_i}\) for \(i \in \N_+\) (and with \(M_0 = 1\)). Note that \(\bs V = (V_1, V_2, \ldots)\) are independent and identically distributed on \(S\) and thus \(X\) has a compound Poisson distribution.
So from it follows that \(X\) also has an infinitely divisible distribution on \((S, \cdot)\).
From , it's easy to construct the random walk on \((S, \cdot)\) corresponding to the exponential distribution with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\). Let \(\bs U = (U_1, U_2, \ldots)\) be independent variables in \(I\) with common density \(\bs p\) so that \(\P(U_j = i) = p_i\) for \(i \in I\) and \(j \in \N_+\). Let \(\bs J = (J_1, J_2, \ldots)\) be independent variables each with the geometric distribution on \(\N\) with rate \(\alpha\), and with \(\bs J\) independent of \(\bs U\). Thus \(K_n = \sum_{i=1}^n J_i\) has the negative binomial distribution on \(\N\) with parameters \(\alpha\) and \(n\) for \(n \in \N_+\).
Let \(Y_n = U_1 \cdots U_{K_n}\) for \(n \in \N_+\). Then \(\bs Y = (Y_1, Y_2, \ldots)\) is the random walk on \((S, \cdot)\) associated with the exponential distribution. For \(n \in \N_+\), the probability density function \(f_n\) of \(X_n\) is given by \[f_n(x) = \binom{d(x) + n - 1}{n - 1} \alpha^n (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}, \quad x \in S\]
Of course, this also follows from the general theory. From our standard most random
interpretation, observing \(Y_{n+1} = x\) for \(n \in \N_+\) and \(x \in S\) gives no information about the increasing sequence of random prefixes \((Y_1, Y_2, \ldots, Y_n)\) of \(x\).
Suppose that \(\bs Y = (Y_1, Y_2, \ldots)\) is the random walk on \((S, \cdot)\) associated with the exponential distribution with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\). Let \(n \in \N_+\).
Our final discussion concerns quotient spaces as studied in Section 2.8. The most natural way to form a sub-semigroup of \((S, \cdot)\) is to start with a proper subset of letters \(J \subset I\) and then consider the set of finite words \(T = J^*\) with letters in \(J\). Clearly \((T, \cdot)\) is a complete sub-semigroup of \((S, \cdot)\), and both are free semigroups. The quotient space \(S / T\) consists of the empty word \(e\) and words \(x \in S\) with first letter in \(I \setminus J\). The assumptions in Section 2.8 are satisfied, so \(x \in S\) has the unique factoring \(x = y z\) with \(y \in T\) and \(z \in S / T\). Clearly \(y\) is the largest initial prefix of \(x\) with letters in \(J\) and \(z\) is the corresponding suffix. The following result follows easily from the general theory:
Suppose that \(\bs X\) has the exponential distribution on \((S, \cdot)\) with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\), and let \(p_J = \sum_{j \in J} p_j\). Consider the factoring \(X = Y Z\) where \(Y\) takes values in \(T\) and \(Z\) takes values in \(S / T\). Then
The results follow from the general theory in Section 2.8, except for the form of the density functions. We use the construction in Proposition .
The app below simulates the exponential distribution on the binary semigroup (the free semigroup over the alphabet \(\{0, 1\}\) as described in Section 2). The parameter \(\alpha\) is the constant rate and the parameter \(p\) is the probability that each letter is 1. Random variable \(N\) is the length of the random word \(X\) and random variable \(M\) is the number of 1s in \(X\).
The app below simulates the exponential distribution on the genetic semigroup (the free semigroup over the alphabet \(\{A, C, G, T\}\) as described in Section 2). The rate parameter \(\alpha\) can be varied with the scrollbar, and the letter probabilities \(p_A\), \(p_C\), \(p_G\), and \(p_T\) can be changed with the input controls. Click the Set button after changing the probabilities. If the probabilities do not sum to 1 then they will be rescaled so that they do. Random variable \(N\) is the length of the random word \(X\).