$$\newcommand{\P}{\mathbb{P}}$$ $$\newcommand{\E}{\mathbb{E}}$$ $$\newcommand{\R}{\mathbb{R}}$$ $$\newcommand{\N}{\mathbb{N}}$$ $$\newcommand{\Z}{\mathbb{Z}}$$ $$\newcommand{\bs}{\boldsymbol}$$ $$\newcommand{\var}{\text{var}}$$

## 16. Transition Matrices and Generators of Continuous-Time Chains

### Preliminaries

This is the second of the three introductory sections on continuous-time Markov chains. Thus, suppose that $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ is a continuous-time Markov chain defined on an underlying probability space $$(\Omega, \mathscr{F}, \P)$$ and with state space $$(S, \mathscr{S})$$. By the very meaning of Markov chain, the set of states $$S$$ is countable and the $$\sigma$$-algebra $$\mathscr{S}$$ is the collection of all subsets of $$S$$. So every subset of $$S$$ is measurable, as is every function from $$S$$ to another measurable space. Recall that $$\mathscr{S}$$ is also the Borel $$\sigma$$-algebra corresponding to the discrete topology on $$S$$. With this topology, every function from $$S$$ to another topological space is continuous. Counting measure $$\#$$ is the natural measure on $$(S, \mathscr{S})$$, so in the context of the general introduction, integrals over $$S$$ are simply sums. Also, kernels on $$S$$ can be thought of as matrices, with rows and columns indexed by $$S$$. The left and right kernel operations are generalizations of matrix multiplication.

A space of functions on $$S$$ plays an important role. Let $$\mathscr{B}$$ denote the collection of bounded functions $$f: S \to \R$$. With the usual pointwise definitions of addition and scalar multiplication, $$\mathscr{B}$$ is a vector space. The supremum norm on $$\mathscr{B}$$ is given by $\|f\| = \sup\{\left|f(x)\right|: x \in S\}, \quad f \in \mathscr{B}$ Of course, if $$S$$ is finite, $$\mathscr{B}$$ is the set of all real-valued functions on $$S$$, and $$\|f\| = \max\{\left|f(x)\right|: x \in S\}$$ for $$f \in \mathscr{B}$$.

In the last section, we studied $$\bs{X}$$ in terms of when and how the state changes. To review briefly, let $$\tau = \inf\{t \in (0, \infty): X_t \ne X_0\}$$. Assuming that $$\bs{X}$$ is right continuous, the Markov property of $$\bs{X}$$ implies the memoryless property of $$\tau$$, and hence the distribution of $$\tau$$ given $$X_0 = x$$ is exponential with parameter $$\lambda(x) \in [0, \infty)$$ for each $$x \in S$$. The assumption of right continuity rules out the pathological possibility that $$\lambda(x) = \infty$$, which would mean that $$x$$ is an instantaneous state so that $$\P(\tau = 0 \mid X_0 = x) = 1$$. On the other hand, if $$\lambda(x) \in (0, \infty)$$ then $$x$$ is a stable state, so that $$\tau$$ has a proper exponential distribution given $$X_0 = x$$ with $$\P(0 \lt \tau \lt \infty \mid X_0 = x) = 1$$. Finally, if $$\lambda(x) = 0$$ then $$x$$ is an absorbing state, so that $$\P(\tau = \infty \mid X_0 = x) = 1$$. Next we define a sequence of stopping times: First $$\tau_0 = 0$$ and $$\tau_1 = \tau$$. Recursively, if $$\tau_n \lt \infty$$ then $$\tau_{n+1} = \inf\left\{t \gt \tau_n: X_t \ne X_{\tau_n}\right\}$$, while if $$\tau_n = \infty$$ then $$\tau_{n+1} = \infty$$. With $$M = \sup\{n \in \N: \tau_n \lt \infty\}$$ we define $$Y_n = X_{\tau_n}$$ if $$n \in \N$$ with $$n \le M$$ and $$Y_n = Y_M$$ if $$n \in \N$$ with $$n \gt M$$. The sequence $$\bs{Y} = (Y_0, Y_1, \ldots)$$ is a discrete-time Markov chain on $$S$$ with one-step transition matrix $$Q$$ given by $$Q(x, y) = \P(X_\tau = y \mid X_0 = x)$$ if $$x, \, y \in S$$ with $$x$$ stable, and $$Q(x, x) = 1$$ if $$x \in S$$ is absorbing. 
Assuming that $$\bs{X}$$ is regular, which means that $$\tau_n \to \infty$$ as $$n \to \infty$$ with probability 1 (ruling out the explosion event of infinitely many transitions in finite time), the structure of $$\bs{X}$$ is completely determined by the sequence of stopping times $$\bs{\tau} = (\tau_0, \tau_1, \ldots)$$ and the discrete-time jump chain $$\bs{Y} = (Y_0, Y_1, \ldots)$$. Analytically, the distribution of $$\bs{X}$$ is determined by the exponential parameter function $$\lambda$$ and the one-step transition matrix $$Q$$ of the jump chain.
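Since a regular chain is determined by the exponential parameters and the jump chain, it can be simulated directly: hold each state $$x$$ for an independent exponential time with rate $$\lambda(x)$$, then jump according to $$Q$$. The sketch below (Python with numpy; the two-state rates $$\lambda = (1, 2)$$ are arbitrary choices) compares a Monte Carlo estimate of $$\P(X_t = 0 \mid X_0 = 0)$$ with the closed-form value for the two-state chain derived in the exercises at the end of this section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state chain: exponential rates lam[x], jump matrix Q.
# Both states are stable here (lam > 0), so the holding times are proper exponentials.
lam = np.array([1.0, 2.0])               # rate 1 out of state 0, rate 2 out of state 1
Q = np.array([[0.0, 1.0], [1.0, 0.0]])   # from each state, jump to the other

def sample_X_t(x0, t):
    """Run the chain from x0: exponential holding times, then jump via Q."""
    x, clock = x0, 0.0
    while True:
        clock += rng.exponential(1.0 / lam[x])   # numpy uses scale = 1/rate
        if clock > t:
            return x
        x = rng.choice(len(lam), p=Q[x])

n, t = 20000, 1.0
est = sum(sample_X_t(0, t) == 0 for _ in range(n)) / n
# Closed-form value P_t(0, 0) = (2 + e^{-3t})/3 for these rates (see the exercises below).
exact = (2.0 + np.exp(-3.0 * t)) / 3.0
assert abs(est - exact) < 0.02
```

With 20,000 runs the standard error is well under the tolerance used in the assertion, so the agreement is a meaningful (if rough) check of the description above.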

In this section, we will study the Markov chain $$\bs{X}$$ in terms of the transition matrices in continuous time and a fundamentally important matrix known as the generator. Naturally, the connections between the two points of view are particularly interesting.

### The Transition Semigroup

#### Definition and Basic Properties

The first part of our discussion is very similar to the treatment of general Markov processes, except for simplifications caused by the discrete state space. We assume that $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ is a Markov chain on $$S$$.

The transition probability matrix $$P_t$$ of $$\bs{X}$$ corresponding to $$t \in [0, \infty)$$ is $P_t(x, y) = \P(X_t = y \mid X_0 = x), \quad (x, y) \in S^2$ In particular, $$P_0 = I$$, the identity matrix on $$S$$.

Proof:

The mapping $$y \mapsto P_t(x, y)$$ is the PDF of $$X_t$$ given $$X_0 = x$$. Hence $$P_t$$ is a probability matrix. That is, $$P_t(x, y) \ge 0$$ for $$(x, y) \in S^2$$ and $$\sum_{y \in S} P_t(x, y) = 1$$ for $$x \in S$$. Trivially, $$P_0 = I$$ by definition.

Note that since we are assuming that the Markov chain is homogeneous, $P_t(x, y) = \P(X_{s + t} = y \mid X_s = x), \quad (x, y) \in S^2$ for every $$s, \, t \in [0, \infty)$$. The Chapman-Kolmogorov equation given next is essentially yet another restatement of the Markov property. The equation is named for Andrei Kolmogorov and Sydney Chapman.

Suppose that $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ is the collection of transition matrices for the chain $$\bs{X}$$. Then $$P_s P_t = P_{s+t}$$ for $$s, \, t \in [0, \infty)$$. Explicitly, $P_{s+t}(x, z) = \sum_{y \in S} P_s(x, y) P_t(y, z), \quad x, \, z \in S$

Proof:

We condition on $$X_s$$. $P_{s+t}(x, z) = \P(X_{s + t} = z \mid X_0 = x) = \sum_{y \in S} \P(X_{s+t} = z \mid X_s = y, X_0 = x) \P(X_s = y \mid X_0 = x)$ But by the Markov and time homogeneous properties, $\P(X_{s+t} = z \mid X_s = y, X_0 = x) = \P(X_{s+t} = z \mid X_s = y) = P_t(y, z)$ Of course by definition, $$\P(X_s = y \mid X_0 = x) = P_s(x, y)$$. So the first displayed equation above becomes $P_{s+t}(x, z) = \sum_{y \in S} P_s(x, y) P_t(y, z) = P_s P_t(x, z)$

In the language of the general theory, the collection $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ is a semigroup of probability matrices. The semigroup of transition matrices $$\bs{P}$$, along with the initial distribution, determines the finite-dimensional distributions of $$\bs{X}$$.
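As a numerical sanity check of the semigroup property, the following sketch (Python with numpy) uses the closed-form transition matrix of the two-state chain treated in the exercises at the end of this section; the rates $$a = 2$$ and $$b = 1$$ are arbitrary choices.

```python
import numpy as np

def P(t, a=2.0, b=1.0):
    # Closed-form transition matrix of the two-state chain (see the exercises below):
    # transition rate a from state 0 to 1, rate b from 1 to 0.
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

# Chapman-Kolmogorov / semigroup property: P_s P_t = P_{s+t}.
s, t = 0.3, 1.1
assert np.allclose(P(s) @ P(t), P(s + t))

# Each P_t is a probability matrix: nonnegative entries, rows summing to 1.
assert np.all(P(t) >= 0) and np.allclose(P(t).sum(axis=1), 1.0)
assert np.allclose(P(0.0), np.eye(2))   # P_0 = I
```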

Suppose that $$X_0$$ has probability density function $$f$$. If $$(t_1, t_2, \ldots, t_n) \in [0, \infty)^n$$ is a time sequence with $$0 \lt t_1 \lt \cdots \lt t_n$$ and $$(x_0, x_1, \ldots, x_n) \in S^{n+1}$$ is a state sequence, then $\P\left(X_0 = x_0, X_{t_1} = x_1, \ldots, X_{t_n} = x_n\right) = f(x_0) P_{t_1}(x_0, x_1) P_{t_2 - t_1}(x_1, x_2) \cdots P_{t_n - t_{n-1}}(x_{n-1}, x_n)$

Proof:

To simplify the notation, we will just give the cases $$n = 1$$ and $$n = 2$$, which capture the essence of the proof. First suppose $$x, \, y \in S$$ and $$t \in [0, \infty)$$. Then $\P(X_0 = x, X_t = y) = \P(X_0 = x) \P(X_t = y \mid X_0 = x) = f(x) P_t(x, y)$ Next suppose that $$x, \, y, \, z \in S$$ and $$s, \, t \in [0, \infty)$$ with $$s \lt t$$. Then $\P(X_0 = x, X_s = y, X_t = z) = \P(X_t = z \mid X_0 = x, X_s = y) \P(X_0 = x, X_s = y)$ But by the Markov and time homogeneous properties, $$\P(X_t = z \mid X_0 = x, X_s = y) = P_{t - s}(y, z)$$. By the $$n = 1$$ case, $$\P(X_0 = x, X_s = y) = f(x) P_s(x, y)$$. Hence $\P(X_0 = x, X_s = y, X_t = z) = f(x) P_s(x, y) P_{t-s}(y, z)$
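The finite-dimensional distributions can be checked numerically. With the closed-form two-state transition matrix again (arbitrary rates and an arbitrary initial pdf, all hypothetical choices), the joint probabilities below sum to 1, and marginalizing the middle time recovers the two-point distribution, as the Chapman-Kolmogorov equation demands.

```python
import numpy as np

def P(t, a=2.0, b=1.0):
    # Closed-form transition matrix of the two-state chain (see the exercises below).
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

f = np.array([0.5, 0.5])   # an arbitrary initial pdf
t1, t2 = 0.4, 1.0

def joint(x0, x1, x2):
    # P(X_0 = x0, X_{t1} = x1, X_{t2} = x2) = f(x0) P_{t1}(x0, x1) P_{t2-t1}(x1, x2)
    return f[x0] * P(t1)[x0, x1] * P(t2 - t1)[x1, x2]

total = sum(joint(x0, x1, x2) for x0 in range(2) for x1 in range(2) for x2 in range(2))
assert np.isclose(total, 1.0)   # the joint probabilities sum to 1

# Marginalizing the middle time gives the two-point distribution (Chapman-Kolmogorov).
two_point = sum(joint(0, x1, 1) for x1 in range(2))
assert np.isclose(two_point, f[0] * P(t2)[0, 1])
```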

As with any matrix on $$S$$, the transition matrices define left and right operations on functions which are generalizations of matrix multiplication. For a transition matrix, both have natural interpretations.

Suppose that $$f: S \to \R$$, and that either $$f$$ is nonnegative or $$f \in \mathscr{B}$$. Then for $$t \in [0, \infty)$$, $P_t f(x) = \sum_{y \in S} P_t(x, y) f(y) = \E[f(X_t) \mid X_0 = x], \quad x \in S$ The mapping $$f \mapsto P_t f$$ is a bounded, linear operator on $$\mathscr{B}$$ and $$\|P_t\| = 1$$.

Proof:

Since $$P_t(x, \cdot)$$ is the conditional probability density function of $$X_t$$ given $$X_0 = x$$, it follows that $$P_t f(x) = \E[f(X_t) \mid X_0 = x]$$. The statement about $$f \mapsto P_t f$$ follows from general results on probability kernels.

If $$f$$ is nonnegative and $$S$$ is infinite, then it's possible that $$P_t f(x) = \infty$$. In general, the left operation of a positive kernel acts on positive measures on the state space. In the setting here, if $$\mu$$ is a positive (Borel) measure on $$(S, \mathscr{S})$$, then the function $$f: S \to [0, \infty)$$ given by $$f(x) = \mu\{x\}$$ for $$x \in S$$ is the density function of $$\mu$$ with respect to counting measure $$\#$$ on $$(S, \mathscr{S})$$. This simply means that $$\mu(A) = \sum_{x \in A} f(x)$$ for $$A \subseteq S$$. Conversely, given $$f: S \to [0, \infty)$$, the set function $$\mu(A) = \sum_{x \in A} f(x)$$ for $$A \subseteq S$$ defines a positive measure on $$(S, \mathscr{S})$$ with $$f$$ as its density function. So for the left operation of $$P_t$$, it's natural to consider only nonnegative functions.

If $$f: S \to [0, \infty)$$ then $f P_t(y) = \sum_{x \in S} f(x) P_t(x, y), \quad y \in S$ If $$X_0$$ has probability density function $$f$$ then $$X_t$$ has probability density function $$f P_t$$.

Proof:

If $$X_0$$ has PDF $$f$$, then conditioning gives $\P(X_t = y) = \sum_{x \in S} \P(X_t = y \mid X_0 = x) \P(X_0 = x) = \sum_{x \in S} P_t(x, y) f(x) = f P_t(y), \quad y \in S$

More generally, if $$f$$ is the density function of a positive measure $$\mu$$ on $$(S, \mathscr{S})$$ then $$f P_t$$ is the density function of the measure $$\mu P_t$$, defined by $\mu P_t(A) = \sum_{x \in S} \mu\{x\} P_t(x, A) = \sum_{x \in S} f(x) P_t(x, A), \quad A \subseteq S$
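When $$S$$ is finite, both operations are ordinary matrix-vector products. A minimal sketch with the two-state closed form (an arbitrary bounded function $$f$$ and an arbitrary initial pdf $$g$$, both hypothetical choices):

```python
import numpy as np

def P(t, a=2.0, b=1.0):
    # Closed-form transition matrix of the two-state chain (see the exercises below).
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

t = 0.6
f = np.array([3.0, -1.0])   # an arbitrary bounded function on S = {0, 1}

# Right operation: P_t f(x) = E[f(X_t) | X_0 = x], a matrix-vector product.
Ptf = P(t) @ f
assert np.isclose(Ptf[0], P(t)[0, 0] * f[0] + P(t)[0, 1] * f[1])

# Left operation: if X_0 has pdf g, then X_t has pdf g P_t.
g = np.array([0.25, 0.75])
gPt = g @ P(t)
assert np.all(gPt >= 0) and np.isclose(gPt.sum(), 1.0)
```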

A function $$f : S \to [0, \infty)$$ is invariant for the Markov chain $$\bs{X}$$ (or for the transition semigroup $$\bs{P}$$) if $$f P_t = f$$ for every $$t \in [0, \infty)$$.

It follows that if $$X_0$$ has an invariant probability density function $$f$$, then $$X_t$$ has probability density function $$f$$ for every $$t \in [0, \infty)$$, so the variables $$X_t$$ for $$t \in [0, \infty)$$ are identically distributed. Invariant and limiting distributions are fundamentally important for continuous-time Markov chains.

#### Standard Semigroups

Suppose again that $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ is a Markov chain on $$S$$ with transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$. Once again, continuity assumptions need to be imposed on $$\bs{X}$$ in order to rule out strange behavior that would otherwise greatly complicate the theory. In terms of the transition semigroup $$\bs{P}$$, here is the basic assumption:

The transition semigroup $$\bs{P}$$ is standard if $$P_t(x, x) \to 1$$ as $$t \downarrow 0$$ for each $$x \in S$$.

Since $$P_0(x, x) = 1$$ for $$x \in S$$, the standard assumption is clearly a continuity assumption. It actually implies much stronger smoothness properties that we will build up by stages.

If the transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ is standard, then the function $$t \mapsto P_t(x, y)$$ is right continuous for each $$(x, y) \in S^2$$.

Proof:

First note that if $$(x, y) \in S^2$$ with $$x \ne y$$ then $$P_h(x, y) \le 1 - P_h(x, x) \to 0$$ as $$h \downarrow 0$$. Hence $$P_h(x, y) \to I(x, y)$$ as $$h \downarrow 0$$ for all $$(x, y) \in S^2$$. Suppose next that $$t \in (0, \infty)$$ and $$(x, y) \in S^2$$. By the semigroup property, $P_{t+h}(x, y) = P_t P_h(x, y) = \sum_{z \in S} P_t(x, z) P_h(z, y)$ But $$P_h(z, y) \to I(z, y)$$ as $$h \downarrow 0$$ so by the bounded convergence theorem, $$P_{t+h}(x, y) \to P_t(x, y)$$ as $$h \downarrow 0$$.

Our next result connects one of the basic assumptions in the section on transition times and the embedded chain with the standard assumption here.

If the Markov chain $$\bs{X}$$ has no instantaneous states then the transition semigroup $$\bs{P}$$ is standard.

Proof:

Given $$X_0 = x \in S$$ note that $$\tau \gt t$$ implies $$X_t = x$$. Hence $P_t(x, x) = \P(X_t = x \mid X_0 = x) \ge \P(\tau \gt t \mid X_0 = x) = e^{-\lambda(x) t}$ Since $$\bs{X}$$ has no instantaneous states, $$0 \le \lambda(x) \lt \infty$$ so $$e^{-\lambda(x) t} \to 1$$ as $$t \downarrow 0$$.

Recall that the non-existence of instantaneous states is essentially equivalent to the right continuity of $$\bs{X}$$. So we have the nice result that if $$\bs{X}$$ is right continuous, then so is $$\bs{P}$$. For the remainder of our discussion, we assume that $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ is a regular Markov chain on $$S$$ with transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$, exponential parameter function $$\lambda$$ and one-step transition matrix $$Q$$ for the jump chain. Our next result is the fundamental integral equation relating $$\bs{P}$$, $$\lambda$$, and $$Q$$.

For $$t \in [0, \infty)$$, $P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \int_0^t \lambda(x) e^{-\lambda(x) s} Q P_{t - s} (x, y) \, ds, \quad (x, y) \in S^2$

Proof:

If $$x$$ is an absorbing state, then the equation trivially holds, since $$\lambda(x) = 0$$ and $$P_t(x, y) = I(x, y)$$. So suppose that $$x$$ is a stable state, and as above, let $$\tau = \inf\{t \in (0, \infty): X_t \ne X_0\}$$. Given $$X_0 = x$$, $$\tau$$ has a proper exponential distribution with parameter $$\lambda(x) \in (0, \infty)$$. Taking cases, $P_t(x, y) = \P(X_t = y \mid X_0 = x) = \P(X_t = y, \tau \gt t \mid X_0 = x) + \P(X_t = y, \tau \le t \mid X_0 = x)$ The first term on the right is 0 if $$y \ne x$$ and is $$\P(\tau \gt t \mid X_0 = x) = e^{-\lambda(x) t}$$ if $$y = x$$. In short, $\P(X_t = y, \tau \gt t \mid X_0 = x) = I(x, y) e^{-\lambda(x) t}$ For the second term on the right in the displayed equation, we condition on $$\tau$$ and $$Y_1 = X_\tau$$. By a result in the last section on transition times and the embedded chain, the joint PDF of $$(\tau, Y_1)$$ at $$s \in [0, \infty)$$ and $$z \in S$$, given $$X_0 = x$$, is $$\lambda(x) e^{-\lambda(x) s} Q(x, z)$$ (continuous in time, discrete in space). Also, given $$\tau = s \in [0, t]$$ and $$Y_1 = z \in S$$, we can use the strong Markov property to restart the clock at $$s$$ giving $\P(X_t = y \mid X_0 = x, \tau = s, Y_1 = z) = \P(X_{t-s} = y \mid X_0 = z) = P_{t-s}(z, y)$ Putting the pieces together we have $\P(X_t = y, \tau \le t \mid X_0 = x) = \int_0^t \lambda(x) e^{-\lambda(x) s} \sum_{z \in S} Q(x, z) P_{t-s}(z, y) \, ds = \int_0^t \lambda(x) e^{-\lambda(x) s} QP_{t - s} (x, y) \, ds$
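The integral equation can be verified numerically for the two-state chain, whose transition matrix is known in closed form (see the exercises at the end of this section). The quadrature below is a simple composite trapezoid rule, and the rates are arbitrary choices.

```python
import numpy as np

a, b = 2.0, 1.0                          # hypothetical rates for the two-state chain
lam = np.array([a, b])
Q = np.array([[0.0, 1.0], [1.0, 0.0]])   # jump chain: always move to the other state

def P(t):
    # Closed-form transition matrix of the two-state chain (see the exercises below).
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

def rhs(t, m=2000):
    """Right side of the integral equation, by the composite trapezoid rule."""
    s = np.linspace(0.0, t, m + 1)
    vals = [np.diag(lam * np.exp(-lam * si)) @ Q @ P(t - si) for si in s]
    integral = (t / m) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return np.diag(np.exp(-lam * t)) + integral   # I(x,y) e^{-lam(x) t} term + integral

t = 0.7
assert np.allclose(rhs(t), P(t), atol=1e-5)
```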

We can now improve on the continuity result that we got earlier. First recall the leads to relation for the jump chain $$\bs{Y}$$: For $$(x, y) \in S^2$$, $$x$$ leads to $$y$$ if $$Q^n(x, y) \gt 0$$ for some $$n \in \N$$. So by definition, $$x$$ leads to $$x$$ for each $$x \in S$$, and for $$(x, y) \in S^2$$ with $$x \ne y$$, $$x$$ leads to $$y$$ if and only if the discrete-time chain starting in $$x$$ eventually reaches $$y$$ with positive probability.

For $$(x, y) \in S^2$$,

1. $$t \mapsto P_t(x, y)$$ is continuous.
2. If $$x$$ leads to $$y$$ then $$P_t(x, y) \gt 0$$ for every $$t \in (0, \infty)$$.
3. If $$x$$ does not lead to $$y$$ then $$P_t(x, y) = 0$$ for every $$t \in (0, \infty)$$.

For $$t \in [0, \infty)$$, we can use the change of variables $$r = t - s$$ in the fundamental integral equation to get $P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \lambda(x) e^{-\lambda(x) t} \int_0^t e^{\lambda(x) r} Q P_r (x, y) \, dr, \quad (x, y) \in S^2$

Proof:
1. In the displayed equation, $$r \mapsto P_r(x, y)$$ is right continuous for every $$(x, y) \in S^2$$, and hence by the bounded convergence theorem again, so is $$r \mapsto QP_r(x, y)$$. Since the integrand in the displayed equation is bounded and right continuous, the integral is a continuous function of $$t$$. Hence $$t \mapsto P_t(x, y)$$ is continuous for $$(x, y) \in S^2$$.
2. For $$x \in S$$, note that $$P_t(x, x) \ge e^{-\lambda(x) t} \gt 0$$ for $$t \in [0, \infty)$$. If $$x$$ leads to $$y$$ and $$x \ne y$$ then there exists $$n \in \N_+$$ and $$(x_1, x_2, \ldots, x_{n-1}) \in S^{n-1}$$ such that $$Q(x, x_1) \gt 0, \, \ldots, \, Q(x_{n-1}, y) \gt 0$$. Then $P_t(x, y) = \P(X_t = y \mid X_0 = x) \ge \P(Y_1 = x_1, \ldots, Y_{n-1} = x_{n-1}, Y_n = y, \tau_n \le t \lt \tau_{n+1} \mid X_0 = x) \gt 0$
3. This is clear from the definition of the embedded chain $$\bs{Y}$$.

Parts (b) and (c) are known as the Lévy dichotomy, named for Paul Lévy. It's possible to prove the Lévy dichotomy just from the semigroup property of $$\bs{P}$$, but this proof is considerably more complicated. In light of the dichotomy, the leads to relation clearly makes sense for the continuous-time chain $$\bs{X}$$ as well as the discrete-time embedded chain $$\bs{Y}$$.

### The Generator Matrix

#### Definition and Basic Properties

In this discussion, we assume again that $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ is a regular Markov chain on $$S$$ with transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$, exponential parameter function $$\lambda$$ and one-step transition matrix $$Q$$ for the embedded jump chain. The fundamental integral equation above now implies that the transition probability matrix $$P_t$$ is differentiable in $$t$$. The derivative at $$0$$ is particularly important.

The matrix function $$t \mapsto P_t$$ has a (right) derivative at 0: $\frac{P_t - I}{t} \to G \text { as } t \downarrow 0$ where the infinitesimal generator matrix $$G$$ is given by $$G(x, y) = -\lambda(x) I(x, y) + \lambda(x) Q(x, y)$$ for $$(x, y) \in S^2$$.

Proof:

As before the change of variables $$r = t - s$$ in the fundamental integral equation gives $P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \lambda(x) e^{-\lambda(x) t} \int_0^t e^{\lambda(x) r} Q P_r (x, y) \, dr$ The first term is clearly differentiable in $$t$$, and the second term is also differentiable in $$t$$ since we now know that the integrand is a continuous function of $$r$$. The result then follows from standard calculus.

Note that $$\lambda(x) Q(x, x) = 0$$ for every $$x \in S$$, since $$\lambda(x) = 0$$ if $$x$$ is absorbing, while $$Q(x, x) = 0$$ if $$x$$ is stable. So $$G(x, x) = -\lambda(x)$$ for $$x \in S$$, and $$G(x, y) = \lambda(x) Q(x, y)$$ for $$(x, y) \in S^2$$ with $$y \ne x$$. Thus, the generator matrix $$G$$ determines the exponential parameter function $$\lambda$$ and the jump transition matrix $$Q$$, and hence determines the distribution of the Markov chain $$\bs{X}$$.

Given the generator matrix $$G$$ of $$\bs{X}$$,

1. $$\lambda(x) = -G(x, x)$$ for $$x \in S$$
2. $$Q(x, y) = - G(x, y) \big/ G(x, x)$$ if $$x \in S$$ is stable and $$y \in S - \{x\}$$
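The construction of $$G$$ from $$\lambda$$ and $$Q$$, and the recovery of $$\lambda$$ and $$Q$$ from $$G$$, are immediate numerically. A sketch with a hypothetical three-state chain (state 1 absorbing) builds $$G(x, y) = -\lambda(x) I(x, y) + \lambda(x) Q(x, y)$$ and reads the parameters back off of $$G$$:

```python
import numpy as np

# Hypothetical chain on S = {0, 1, 2}, with state 1 absorbing.
lam = np.array([2.0, 0.0, 1.0])
Q = np.array([[0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0],    # Q(x, x) = 1 at an absorbing state
              [1.0, 0.0, 0.0]])

# G = -diag(lam) + diag(lam) Q, entrywise G(x,y) = -lam(x) I(x,y) + lam(x) Q(x,y).
G = -np.diag(lam) + np.diag(lam) @ Q
assert np.all(np.diag(G) <= 0)           # nonpositive diagonal
assert np.allclose(G.sum(axis=1), 0.0)   # zero row sums

# Recovering lam, and Q at a stable state x:
assert np.allclose(-np.diag(G), lam)     # lam(x) = -G(x, x)
x, y = 0, 1
assert np.isclose(-G[x, y] / G[x, x], Q[x, y])   # Q(x, y) = -G(x, y)/G(x, x)
```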

The infinitesimal generator has a nice interpretation in terms of our discussion in the last section. Recall that when the chain first enters a stable state $$x$$, we set independent, exponentially distributed timers on $$(x, y)$$ for each $$y \in S - \{x\}$$. Note that $$G(x, y)$$ is the exponential parameter for the timer on $$(x, y)$$. As soon as an alarm sounds for a particular $$(x, y)$$, the chain moves to state $$y$$ and the process continues.

The generator matrix $$G$$ satisfies the following properties for every $$x \in S$$:

1. $$G(x, x) \le 0$$
2. $$\sum_{y \in S} G(x, y) = 0$$

The matrix function $$t \mapsto P_t$$ is differentiable on $$[0, \infty)$$, and satisfies the Kolmogorov backward equation: $$P^\prime_t = G P_t$$. Explicitly, $P^\prime_t(x, y) = -\lambda(x) P_t(x, y) + \sum_{z \in S} \lambda(x) Q(x, z) P_t(z, y), \quad (x, y) \in S^2$

Proof:

The proof is just like before, and follows from standard calculus and the integral equation $P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \lambda(x) e^{-\lambda(x) t} \int_0^t e^{\lambda(x) r} Q P_r (x, y) \, dr$

The backward equation is named for Andrei Kolmogorov. In continuous time, the transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ can be obtained from the single generator matrix $$G$$ in a way that is reminiscent of the fact that in discrete time, the transition semigroup $$\bs{P} = \{P^n: n \in \N\}$$ can be obtained from the single one-step matrix $$P$$. From a modeling point of view, we often start with the generator matrix $$G$$ and then solve the backward equation, subject to the initial condition $$P_0 = I$$, to obtain the semigroup of transition matrices $$\bs{P}$$.
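As an illustration, the backward equation can be integrated numerically from the initial condition $$P_0 = I$$. The sketch below applies classical fourth-order Runge-Kutta steps to the two-state generator (the rates are arbitrary choices) and compares the result with the closed-form transition matrix from the exercises at the end of this section.

```python
import numpy as np

a, b = 2.0, 1.0
G = np.array([[-a, a], [b, -b]])   # generator of the two-state chain

def backward_solve(t, n=1000):
    """Integrate P' = G P with P_0 = I, using classical RK4 steps."""
    h = t / n
    P = np.eye(2)
    for _ in range(n):
        k1 = G @ P
        k2 = G @ (P + 0.5 * h * k1)
        k3 = G @ (P + 0.5 * h * k2)
        k4 = G @ (P + h * k3)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return P

def P_exact(t):
    # Closed-form transition matrix of the two-state chain (see the exercises below).
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

t = 1.5
assert np.allclose(backward_solve(t), P_exact(t), atol=1e-8)
```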

As with any matrix on $$S$$, the generator matrix $$G$$ defines left and right operations on functions that are analogous to ordinary matrix multiplication. The right operation is defined for functions in $$\mathscr{B}$$.

If $$f \in \mathscr{B}$$ then $$Gf$$ is given by $G f(x) = -\lambda(x) f(x) + \sum_{y \in S} \lambda(x) Q(x, y) f(y), \quad x \in S$

Proof:

By definition, $G f(x) = \sum_{y \in S} G(x, y) f(y) = -\lambda(x) f(x) + \sum_{y \in S - \{x\}} \lambda(x) Q(x, y) f(y)$ In the second term, we can sum over all $$y \in S$$ since $$\lambda(x) = 0$$ if $$x$$ is absorbing and $$Q(x, x) = 0$$ if $$x$$ is stable. Note that $$G f$$ is well defined since $\sum_{y \in S-\{x\}} \lambda(x) Q(x, y) \left|f(y)\right| \le \sum_{y \in S-\{x\}} \lambda(x) Q(x, y) \|f\| = \lambda(x) \|f\|$

But note that $$G f$$ need not be in $$\mathscr{B}$$ unless $$\lambda \in \mathscr{B}$$. Without this additional assumption, $$G$$ is a linear operator from the vector space $$\mathscr{B}$$ of bounded functions from $$S$$ to $$\R$$ into the vector space of all functions from $$S$$ to $$\R$$. We will return to this point in our next discussion.

#### Uniform Transition Semigroups

We can obtain stronger results for the generator matrix if we impose stronger continuity assumptions on $$\bs{P}$$.

The transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ is uniform if $$P_t(x, x) \to 1$$ as $$t \downarrow 0$$ uniformly in $$x \in S$$.

If $$\bs{P}$$ is uniform, then the operator function $$t \mapsto P_t$$ is continuous on the vector space $$\mathscr{B}$$.

Proof:

The statement means that for $$f \in \mathscr{B}$$, the function $$t \mapsto P_t f$$ is continuous with respect to the supremum norm on $$\mathscr{B}$$. For $$f \in \mathscr{B}$$ and $$h \gt 0$$, $\left|P_h f(x) - f(x)\right| = \left|\sum_{y \in S} P_h(x, y) [f(y) - f(x)]\right| \le 2 \|f\| [1 - P_h(x, x)], \quad x \in S$ so the uniform assumption gives $$\|P_h f - f\| \to 0$$ as $$h \downarrow 0$$. Since $$\|P_t\| = 1$$, $$\|P_{t+h} f - P_t f\| = \|P_t (P_h f - f)\| \le \|P_h f - f\|$$ and similarly $$\|P_t f - P_{t-h} f\| \le \|P_h f - f\|$$, so $$t \mapsto P_t f$$ is continuous.

As usual, we want to look at this new assumption from different points of view.

The following are equivalent:

1. The transition semigroup $$\bs{P}$$ is uniform.
2. The exponential parameter function $$\lambda$$ is bounded.
3. The generator matrix $$G$$ defines a bounded linear operator on $$\mathscr{B}$$.

Proof:

From our remarks above we know that $$\lambda \in \mathscr{B}$$ if and only if the generator matrix $$G$$ defines a bounded linear operator on $$\mathscr{B}$$, so (b) and (c) are equivalent. For the equivalence of (a) and (b), we prove that (b) implies (a). If $$\lambda \in \mathscr{B}$$ then $P_t(x, x) = \P(X_t = x \mid X_0 = x) \ge \P(\tau \gt t \mid X_0 = x) = \exp[-\lambda(x) t] \ge \exp(-\|\lambda\|t)$ The last term converges to 1 as $$t \downarrow 0$$, uniformly in $$x$$.

So when the equivalent conditions are satisfied, the Markov chain $$\bs X = \{X_t: t \in [0, \infty)\}$$ is also said to be uniform. As we will see in a later section, a uniform, continuous-time Markov chain can be constructed from a discrete-time Markov chain and an independent Poisson process. For a uniform transition semigroup, we have a companion to the backward equation.

Suppose that $$\bs{P}$$ is a uniform transition semigroup. Then $$t \mapsto P_t$$ satisfies the Kolmogorov forward equation $$P^\prime_t = P_t G$$. Explicitly, $P^\prime_t(x,y) = -\lambda(y) P_t(x, y) + \sum_{z \in S} P_t(x, z) \lambda(z) Q(z, y), \quad (x, y) \in S^2$

The backward equation holds in more generality than the forward equation, since we only need the transition semigroup $$\bs{P}$$ to be standard rather than uniform. It would seem that we need stronger conditions on $$\lambda$$ for the forward equation to hold, for otherwise it's not even obvious that $$\sum_{z \in S} P_t(x, z) \lambda(z) Q(z, y)$$ is finite for $$(x, y) \in S^2$$. On the other hand, the forward equation is sometimes easier to solve than the backward equation, and the assumption that $$\lambda$$ is bounded is met in many applications (and of course holds automatically if $$S$$ is finite).

As a simple corollary, the transition matrices and the generator matrix commute for a uniform semigroup: $$P_t G = G P_t$$ for $$t \in [0, \infty)$$. The forward and backward equations formally look like the differential equations for the exponential function, and the analogy is exact in terms of the matrix exponential.

Suppose again that $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ is a uniform transition semigroup with generator $$G$$. Then $P_t = e^{t G} = \sum_{n=0}^\infty \frac{t^n}{n!} G^n, \quad t \in [0, \infty)$

Proof:

First $$e^{t G}$$ is well defined as a bounded linear operator on $$\mathscr{B}$$ for $$t \in [0, \infty)$$ (and hence also simply as a matrix), since $$G$$ is a bounded linear operator on $$\mathscr{B}$$. Trivially $$e^{0 G} = I$$, and by basic properties of the matrix exponential, $\frac{d}{dt} e^{t G} = G e^{t G}, \quad t \in (0, \infty)$ It follows that $$P_t = e^{t G}$$ for $$t \in [0, \infty)$$.

We can characterize the generators of uniform transition semigroups. We just need the minimal conditions that the diagonal entries are nonpositive and the row sums are 0.

Suppose that $$G$$ is a matrix on $$S$$ with $$\|G\| \lt \infty$$. Then $$G$$ is the generator of a uniform transition semigroup $$\bs{P} = \{P_t: t \in [0, \infty)\}$$ if and only if for every $$x \in S$$,

1. $$G(x, x) \le 0$$
2. $$\sum_{y \in S} G(x, y) = 0$$

Proof:

We know of course that if $$G$$ is the generator of a transition semigroup, then conditions (a) and (b) hold. For the converse, we can use the previous result. Let $P_t = e^{t G} = \sum_{n=0}^\infty \frac{t^n}{n!} G^n, \quad t \in [0, \infty)$ which makes sense since $$G$$ is bounded in norm. For the positivity, let $$c = \sup\{-G(x, x): x \in S\}$$, which is finite since $$\|G\| \lt \infty$$. By (a) and (b), the matrix $$G + c I$$ has nonnegative entries, and since $$c I$$ commutes with $$G$$, we have $$P_t = e^{-c t} e^{t (G + c I)}$$. Every term in the exponential series of $$e^{t (G + c I)}$$ has nonnegative entries, so $$P_t(x, y) \ge 0$$ for $$(x, y) \in S^2$$. By part (b), $$\sum_{y \in S} G^n(x, y) = 0$$ for every $$x \in S$$ and $$n \in \N_+$$, and hence $$\sum_{y \in S} P_t(x, y) = \sum_{y \in S} I(x, y) = 1$$ for $$x \in S$$. Finally, the semigroup property is a consequence of the law of exponents, which holds since $$s G$$ and $$t G$$ commute: $P_s P_t = e^{s G} e^{t G} = e^{(s+t) G} = P_{s+t}$
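The construction in the proof is easy to test numerically: any matrix with finite norm, nonpositive diagonal, and zero row sums should exponentiate to probability matrices. The sketch below uses numpy only (the matrix exponential is a simple scaling-and-squaring of a truncated Taylor series, and the generator entries are arbitrary choices), checking nonnegativity, row sums, and the semigroup property.

```python
import numpy as np

def expm(A, k=30, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    B = A / 2**k
    S, T = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        T = T @ B / n
        S = S + T
    for _ in range(k):
        S = S @ S
    return S

# A hypothetical generator: nonpositive diagonal, zero row sums.
G = np.array([[-3.0, 1.0, 2.0],
              [0.5, -0.5, 0.0],
              [1.0, 1.0, -2.0]])

for t in [0.1, 1.0, 5.0]:
    P = expm(t * G)
    assert np.all(P >= -1e-10)               # nonnegative entries (up to roundoff)
    assert np.allclose(P.sum(axis=1), 1.0)   # rows sum to 1

# Semigroup property: e^{sG} e^{tG} = e^{(s+t)G}.
assert np.allclose(expm(0.5 * G) @ expm(1.5 * G), expm(2.0 * G))
```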

### Examples and Exercises

#### The Two-State Chain

Let $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ be the Markov chain on the set of states $$S = \{0, 1\}$$, with transition rate $$a \in [0, \infty)$$ from 0 to 1 and transition rate $$b \in [0, \infty)$$ from 1 to 0. This two-state Markov chain was studied in the previous section. To avoid the trivial case with both states absorbing, we will assume that $$a + b \gt 0$$.

The generator matrix is $G = \left[\begin{matrix} -a & a \\ b & -b\end{matrix}\right]$

Show that for $$t \in [0, \infty)$$, $P_t = \frac{1}{a + b} \left[\begin{matrix} b & a \\ b & a \end{matrix} \right] - \frac{1}{a + b} e^{-(a + b)t} \left[\begin{matrix} -a & a \\ b & -b\end{matrix}\right]$

1. By solving the Kolmogorov backward equation.
2. By solving the Kolmogorov forward equation.
3. By computing $$P_t = e^{t G}$$.
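A quick numerical check of the claimed formula, differentiating by central finite differences and comparing with $$G P_t$$ (backward) and $$P_t G$$ (forward), assuming numpy:

```python
import numpy as np

a, b = 2.0, 1.0                    # arbitrary rates with a + b > 0
G = np.array([[-a, a], [b, -b]])   # generator of the two-state chain

def P(t):
    # The claimed transition matrix.
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

t, h = 0.8, 1e-6
dP = (P(t + h) - P(t - h)) / (2 * h)            # central finite difference
assert np.allclose(dP, G @ P(t), atol=1e-6)     # backward equation P' = G P
assert np.allclose(dP, P(t) @ G, atol=1e-6)     # forward equation P' = P G
assert np.allclose(P(0.0), np.eye(2))           # initial condition P_0 = I
```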

You probably noticed that the forward equation is easier to solve because there is less coupling of terms than in the backward equation.

Define the probability density function $$f$$ on $$S$$ by $$f(0) = \frac{b}{a + b}$$, $$f(1) = \frac{a}{a + b}$$. Show that

1. $$P_t \to \frac{1}{a + b} \left[\begin{matrix} b & a \\ b & a \end{matrix} \right]$$ as $$t \to \infty$$, the matrix with $$f$$ in both rows.
2. $$f P_t = f$$ for all $$t \in [0, \infty)$$, so that $$f$$ is invariant for $$\bs{P}$$.
3. $$f G = 0$$.
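These claims are easy to confirm numerically (numpy, with arbitrary rates):

```python
import numpy as np

a, b = 2.0, 1.0
G = np.array([[-a, a], [b, -b]])
f = np.array([b, a]) / (a + b)     # the claimed invariant pdf

def P(t):
    # Closed-form transition matrix of the two-state chain (from the exercise above).
    return (np.array([[b, a], [b, a]])
            - np.exp(-(a + b) * t) * np.array([[-a, a], [b, -b]])) / (a + b)

assert np.allclose(f @ G, 0.0)                   # f G = 0
for t in [0.5, 2.0, 10.0]:
    assert np.allclose(f @ P(t), f)              # f P_t = f, so f is invariant

limit = np.array([[b, a], [b, a]]) / (a + b)     # the matrix with f in both rows
assert np.allclose(P(50.0), limit)               # P_t -> limit as t -> infinity
```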

#### Computational Exercises

Consider the Markov chain $$\bs{X} = \{X_t: t \in [0, \infty)\}$$ on $$S = \{0, 1, 2\}$$ with exponential parameter function $$\lambda = (4, 1, 3)$$ and embedded transition matrix $Q = \left[\begin{matrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ \frac{1}{3} & \frac{2}{3} & 0\end{matrix}\right]$

1. Draw the state graph and classify the states.
2. Find the generator matrix $$G$$.
3. Find the transition matrix $$P_t$$ for $$t \in [0, \infty)$$.
4. Find $$\lim_{t \to \infty} P_t$$.
Answer:

1. The edge set is $$E = \{(0, 1), (0, 2), (1, 0), (2, 0), (2, 1)\}$$. All states are stable.
2. The generator matrix is $G = \left[\begin{matrix} -4 & 2 & 2 \\ 1 & -1 & 0 \\ 1 & 2 & -3 \end{matrix}\right]$
3. For $$t \in [0, \infty)$$, $P_t = \frac{1}{15} \left[\begin{matrix} 3 + 12 e^{-5 t} & 10 - 10 e^{-3 t} & 2 - 12 e^{-5 t} + 10 e^{-3 t} \\ 3 - 3 e^{-5 t} & 10 + 5 e^{-3 t} & 2 + 3 e^{-5t} - 5 e^{-3 t} \\ 3 - 3 e^{-5 t} & 10 - 10 e^{-3 t} & 2 + 3 e^{-5 t} + 10 e^{-3 t} \end{matrix}\right]$
4. $P_t \to \frac{1}{15} \left[\begin{matrix} 3 & 10 & 2 \\ 3 & 10 & 2 \\ 3 & 10 & 2 \end{matrix}\right]$
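The stated answer can be checked against $$P_t = e^{t G}$$ computed numerically; the matrix exponential below is a simple scaling-and-squaring of a truncated Taylor series (numpy assumed).

```python
import numpy as np

def expm(A, k=30, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    B = A / 2**k
    S, T = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        T = T @ B / n
        S = S + T
    for _ in range(k):
        S = S @ S
    return S

G = np.array([[-4.0, 2.0, 2.0],
              [1.0, -1.0, 0.0],
              [1.0, 2.0, -3.0]])

def P_claimed(t):
    e5, e3 = np.exp(-5.0 * t), np.exp(-3.0 * t)
    return np.array([[3 + 12 * e5, 10 - 10 * e3, 2 - 12 * e5 + 10 * e3],
                     [3 - 3 * e5, 10 + 5 * e3, 2 + 3 * e5 - 5 * e3],
                     [3 - 3 * e5, 10 - 10 * e3, 2 + 3 * e5 + 10 * e3]]) / 15

for t in [0.0, 0.3, 1.0, 4.0]:
    assert np.allclose(expm(t * G), P_claimed(t))     # the claimed P_t equals e^{tG}
assert np.allclose(P_claimed(20.0), np.array([[3, 10, 2]] * 3) / 15)   # the limit
```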

#### Special Models

Read the discussion of generator and transition matrices for chains subordinate to the Poisson process.

Read the discussion of the infinitesimal generator for continuous-time birth-death chains.

Read the discussion of the infinitesimal generator for continuous-time queuing chains.

Read the discussion of the infinitesimal generator for continuous-time branching chains.