An Accelerated Two-level Multigrid Method for Markov Chains

Based on the quadratic extrapolation method and its generalization, this paper presents an accelerated two-level multigrid method for speeding up the numerical computation of the stationary probability vector of an irreducible Markov chain. It shows in detail how to combine these vector extrapolation methods with the two-level multigrid method on the coarse level. Numerical results on two Markov chain problems illustrate the effectiveness of the proposed method in terms of reducing both iteration counts and computing time.

Chun Wen*, School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China


Introduction
The use of Markov chains is of interest in a wide range of applications, for example web ranking and information retrieval [1][2][3], queueing systems [4][5][6][7], stochastic automata networks [8,9], manufacturing systems and inventory control [10], and communication systems [11,12]. In order to analyze their performance measures, it is required to find the stationary probability distribution π by solving the linear system

π Q = 0,  π ≥ 0,  ||π||_1 = 1,  (1)

where Q is the generator matrix of the chain. For a finite irreducible and aperiodic Markov chain, there exists a unique stationary probability distribution π whose elements are strictly greater than zero; see, e.g., [13,14]. Hence, for simplicity, we rewrite (1) as the homogeneous linear system

A x = 0,  (2)

where A and x are the transposes of the generator matrix Q and the stationary probability distribution π, respectively. Here the coefficient matrix A has zero column sums, positive diagonal entries and nonpositive off-diagonal entries.
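As a small illustration of the homogeneous system A x = 0, the following sketch uses a hypothetical 3-state chain (not taken from the paper): the matrix A has zero column sums, positive diagonal entries and nonpositive off-diagonal entries, and the stationary vector is recovered as its normalized null-space vector.

```python
import numpy as np

# Hypothetical 3-state example (illustrative only): A has zero column
# sums, positive diagonal entries and nonpositive off-diagonal entries.
A = np.array([[ 0.5, -0.2, -0.1],
              [-0.3,  0.4, -0.4],
              [-0.2, -0.2,  0.5]])

# The chain is irreducible, so A has a one-dimensional null space;
# the last right singular vector spans it.
_, _, Vt = np.linalg.svd(A)
x = np.abs(Vt[-1])            # stationary direction (sign-fixed)
x /= np.linalg.norm(x, 1)     # normalize to one in the one-norm
```

For this particular matrix the exact stationary vector is (12, 23, 14)/49.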
Recently, a large amount of work has been devoted to solving the linear system (2), for instance matrix splitting iterative methods [13,15,16,17], Krylov subspace methods [18][19][20][21] and various preconditioning techniques [6,9,17]. Moreover, based on the aggregation of Markov states, multigrid methods have been studied in the literature [22][23][24][25][26][27]. However, as the size of the Markov chain becomes large, the cost of multigrid methods is likely to increase. It is therefore natural to consider strategies for improving their efficiency.
In this paper, our concern is the two-level multigrid method. Starting from its basic framework [28,29], an accelerated two-level multigrid method is proposed for speeding up the numerical computation of the stationary probability vector of an irreducible Markov chain, by applying as accelerators the quadratic extrapolation method discussed by Kamvar, Haveliwala, Manning and Golub [2] and its generalization presented by Sidi [30]. We show in detail how to efficiently combine the two-level multigrid method with these vector extrapolation methods on the coarse level. The new method is denoted the two-level-extrapolation (TLE) method. Note that the idea of improving iterative methods by combining them with vector extrapolation methods is not new [2,30,31,32]. As a matter of fact, the main algorithmic contribution of the TLE method is that the computation of the coarse-level solution is improved. Numerical experiments on two Markov chain problems are used to illustrate the efficiency and stability of the proposed method.
The rest of this paper is organized as follows. In Section 2, we briefly review and analyze the two-level multigrid method. In Section 3, the accelerated two-level multigrid method for Markov chains is proposed. In Section 4, numerical experiments are provided. Finally, conclusions are made in Section 5.

Two-level Multigrid Method for Markov Chains
In this section, the two-level multigrid method is briefly introduced to solve the stationary probability distribution of Markov chains.
For computing numerical solutions of the linear system A x = b with right-hand side b, certain multigrid methods have been presented in [24,28,29]. It is easy to see that the linear system (2) is a special case of A x = b with b = 0. Without loss of generality, let P be the full-rank prolongation matrix of size n × n_c, and let R be the restriction operator of size n_c × n, where n_c is the size of the coarse-level matrix A_c. The operators P and R are created by an automatic coarsening process described below. Then, starting from the description of multigrid methods for the linear system A x = b [24,28,29], the two-level multigrid method for Markov chains is given in Algorithm 1, where the matrix A is known from the linear system (2), G is an aggregation matrix generated by Algorithm 2, and iter denotes the number of cycles until the one-norm residual ||Ax||_1 reaches the prescribed tolerance ϵ. Note that the approximate solution x is normalized only at the end of Algorithm 1 rather than in each iteration; doing so is advantageous for efficient computer implementation and also saves computing cost.
At steps 2 and 7 of Algorithm 1, the weighted Jacobi method with weight ω, a variant of the Jacobi method, is employed for the pre- and post-smoothing processes. Let the coefficient matrix A of (2) be split into

A = D − L − U,  (3)

where D is the diagonal part of A, and L and U are the negated strictly lower- and upper-triangular parts of A, respectively. Then the weighted Jacobi relaxation method can be written as

x^(k+1) = (1 − ω) x^(k) + ω D^(−1) (L + U) x^(k),  (4)

with weight ω ∈ (0, 1).
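The weighted Jacobi smoother described above can be sketched as follows; this is a minimal dense-matrix illustration, and the weight ω = 0.7 is our own illustrative choice, since the paper does not state a specific value.

```python
import numpy as np

def weighted_jacobi(A, x, b=None, omega=0.7, sweeps=1):
    """Weighted Jacobi sweeps for A x = b with the splitting A = D - L - U.

    omega = 0.7 is an illustrative choice, not the paper's setting."""
    D = np.diag(A)                     # diagonal of A (positive entries)
    N = np.diag(D) - A                 # L + U, the negated off-diagonal part
    if b is None:                      # homogeneous case A x = 0
        b = np.zeros_like(x)
    for _ in range(sweeps):
        x = (1.0 - omega) * x + omega * (N @ x + b) / D
    return x

# smoothing the hypothetical 3-state Markov matrix used earlier
A = np.array([[ 0.5, -0.2, -0.1],
              [-0.3,  0.4, -0.4],
              [-0.2, -0.2,  0.5]])
x = weighted_jacobi(A, np.full(3, 1 / 3), sweeps=100)
x /= np.linalg.norm(x, 1)
```

For this small irreducible chain the smoother alone already drives the one-norm residual toward zero; on large problems it only damps certain error components, which is why the coarse-level correction is needed.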
At step 3 of Algorithm 1, it is of vital importance how the aggregation matrix G is built from x and A; that is, we need to decide which nodes should be aggregated into a block and which nodes should be split between their neighbors. In Algorithm 2 we adopt the strength-based aggregation procedure proposed by De Sterck et al. as our aggregation method, since it is able to capture an algebraically smooth error that varies slowly in a local neighborhood by scaling the original problem matrix A [24,25].

Note that the computation of the strength matrix S is based on the problem matrix scaled by the current iterate, i.e., on A diag(x_k) rather than on the original coefficient matrix A (for details see [24]), where diag(x_k) denotes the diagonal matrix formed from the current iterate x_k. Following the definition of the strength matrix S in [24], and writing Ā = A diag(x_k) with entries ā_ij, node i is regarded as strongly connected to node j ≠ i if

−ā_ij ≥ θ max_{k≠i} (−ā_ik),

where θ is a strength-of-connection parameter. In Algorithm 2, the letter J denotes the number of aggregates, and the aggregation matrix G is the n × J binary matrix with g_ij = 1 if node i belongs to aggregate j and g_ij = 0 otherwise.
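A hedged sketch of this strength-based aggregation idea follows; the greedy seeding pass is our own simplification of the procedure of De Sterck et al., not a reproduction of Algorithm 2, and the function name is illustrative.

```python
import numpy as np

def aggregation_matrix(A, x, theta=0.8):
    """Sketch of strength-based aggregation (in the spirit of [24,25]).

    Node i is strongly connected to j != i on the scaled matrix
    Abar = A diag(x) when -abar[i,j] >= theta * max_{k!=i}(-abar[i,k]).
    A greedy pass groups each unassigned seed with its strong,
    still-unassigned neighbours; G[i,j] = 1 iff node i is in aggregate j."""
    n = A.shape[0]
    Abar = A * x                        # column scaling: A @ diag(x)
    off = -Abar.copy()
    np.fill_diagonal(off, -np.inf)      # exclude the diagonal from the max
    thresh = theta * off.max(axis=1)
    strong = off >= thresh[:, None]     # boolean strength matrix S

    assign = -np.ones(n, dtype=int)
    J = 0
    for i in range(n):
        if assign[i] < 0:               # node i seeds a new aggregate
            members = [i] + [j for j in range(n)
                             if assign[j] < 0 and strong[i, j] and j != i]
            for j in members:
                assign[j] = J
            J += 1

    G = np.zeros((n, J))
    G[np.arange(n), assign] = 1.0       # one aggregate per node
    return G

A = np.array([[ 0.5, -0.2, -0.1],
              [-0.3,  0.4, -0.4],
              [-0.2, -0.2,  0.5]])
G = aggregation_matrix(A, np.full(3, 1 / 3))
```

Each row of G sums to one (every node belongs to exactly one aggregate), which is the property used later when the coarse-level operator is formed.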
On the coarse level, the homogeneous system A_c x_c = 0 is treated by the Gauss-Seidel method

x_c^(k+1) = (D_c − L_c)^(−1) U_c x_c^(k),  (5)

where A_c = D_c − L_c − U_c is the splitting (3) applied to A_c. From this splitting, the iteration (5) is equivalent to the power method applied to the iteration matrix H_GS = (D_c − L_c)^(−1) U_c [13]. Starting from the initial approximation x̄_c obtained at step 4 of Algorithm 1, the Gauss-Seidel method in (5) modifies this approximation so that it comes closer and closer to the true solution at each iteration. However, this procedure has a major disadvantage: it often requires a very long time to converge to the desired solution. To overcome this problem, it is natural to consider improving the coarse-level computation by some acceleration strategy.
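The power-method form of the Gauss-Seidel iteration in (5) can be sketched as follows; the function name and the 3-state test matrix are illustrative, and the iterate is renormalized each sweep purely for numerical convenience.

```python
import numpy as np

def gauss_seidel_markov(A, x, sweeps):
    """Gauss-Seidel sweeps for A x = 0 written as x <- (D - L)^{-1} U x,
    i.e. the power method applied to H_GS = (D - L)^{-1} U."""
    DL = np.tril(A)                    # D - L: the lower triangle of A
    U = DL - A                         # negated strict upper triangle of A
    for _ in range(sweeps):
        x = np.linalg.solve(DL, U @ x)
        x /= np.linalg.norm(x, 1)      # keep the iterate normalized
    return x

# hypothetical 3-state chain with zero column sums (stationary vector
# is (12, 23, 14)/49)
A = np.array([[ 0.5, -0.2, -0.1],
              [-0.3,  0.4, -0.4],
              [-0.2, -0.2,  0.5]])
x = gauss_seidel_markov(A, np.full(3, 1 / 3), 50)
```

On this tiny example the subdominant eigenvalue of H_GS is small, so convergence is fast; the slow convergence the paper refers to shows up on large, poorly separated chains.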

Accelerated Two-level Multigrid Method
In this section, we first give a short introduction to the generalization of the quadratic extrapolation method proposed by Sidi [30], and then show how to combine this vector extrapolation method with the two-level multigrid method on the coarse level to accelerate the numerical computation of the stationary probability distribution of Markov chains. Various kinds of vector extrapolation methods have been surveyed in SIAM Review [33]: for example, the polynomial-type vector extrapolation methods, which include the minimal polynomial extrapolation (MPE) of Cabay and Jackson [34], and the epsilon vector extrapolation methods, which comprise the scalar and vector epsilon algorithms of Wynn [35,36] and the topological epsilon algorithm of Brezinski [37].
It should be noted that the starting point of vector extrapolation algorithms is to accelerate the convergence of a sequence {x_j} generated by a fixed-point iterative method of the form

x_{j+1} = F(x_j),  j = 0, 1, ...,  F: R^N → R^N,

where x_0 is an initial vector. In recent years, applications of vector extrapolation methods to the computation of the stationary probability distribution of Markov chains have been reported in [2,30,31,32].
Numerical simulations have also illustrated that the polynomial-type methods are in general more economical than the epsilon vector extrapolation methods with respect to computing time and storage requirements. Therefore, in this paper, we apply the polynomial-type vector extrapolation methods, i.e., the quadratic extrapolation method and its generalization, as our accelerators.
In fact, using vector extrapolation methods as accelerators is common. For instance, Kamvar et al. considered the quadratic extrapolation method to speed up the computation of the dominant eigenvector in the PageRank problem [2]. Based on Ritz values, Wu and Wei discussed its close connection with the Arnoldi method [32]. Moreover, Sidi observed that it is closely related to the MPE of Cabay and Jackson [34] and thus proposed a generalization of the quadratic extrapolation (GQE) method along with an implementation based on MPE [30]. Following [30], the GQE algorithm is given in Algorithm 3.
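Following our reading of [2], the quadratic extrapolation step can be sketched as below; the 3-state column-stochastic matrix M is a hypothetical example (with only three states, the two non-dominant error components are annihilated exactly, up to rounding).

```python
import numpy as np

def quadratic_extrapolation(x0, x1, x2, x3):
    """Sketch of quadratic extrapolation in the spirit of [2].

    Given four successive iterates of x <- M x, it annihilates the two
    next-largest eigenvector components of the error and returns an
    improved estimate of the dominant eigenvector."""
    y1, y2, y3 = x1 - x0, x2 - x0, x3 - x0
    # fix gamma3 = 1 and solve [y1 y2] [g1, g2]^T = -y3 in least squares
    g, *_ = np.linalg.lstsq(np.column_stack([y1, y2]), -y3, rcond=None)
    g1, g2, g3 = g[0], g[1], 1.0
    b0, b1, b2 = g1 + g2 + g3, g2 + g3, g3
    return b0 * x1 + b1 * x2 + b2 * x3

# hypothetical column-stochastic example; its stationary vector is uniform
M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
x = np.array([1.0, 0.0, 0.0])
iterates = [x]
for _ in range(3):
    x = M @ x
    iterates.append(x)
x_star = quadratic_extrapolation(*iterates)
x_star /= np.linalg.norm(x_star, 1)
```

Here x_star recovers the stationary vector (1/3, 1/3, 1/3) from just four power iterates, whereas plain power iteration would still carry an error of order |λ2|^3.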
It is clear that the case k = 2 corresponds to the quadratic extrapolation method proposed in [2]. One feature of Algorithm 3 is the QR factorization at step 2: the matrix of iterate differences U_k is factored as U_k = Q_k R_k, where Q_k has orthonormal columns and R_k is an upper triangular matrix with positive diagonal elements [30,38]. This factorization is computed by the modified Gram-Schmidt (MGS) algorithm, which is given as follows.
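A minimal MGS sketch is given below; it is a generic implementation rather than a transcription of the paper's listing.

```python
import numpy as np

def mgs_qr(U):
    """Modified Gram-Schmidt QR of a full-column-rank matrix U:
    U = Q R with orthonormal columns in Q and an upper triangular R
    whose diagonal entries are positive."""
    m, k = U.shape
    Q = U.astype(float).copy()
    R = np.zeros((k, k))
    for j in range(k):
        R[j, j] = np.linalg.norm(Q[:, j])   # length of the current column
        Q[:, j] /= R[j, j]
        # immediately orthogonalize all remaining columns against q_j
        for i in range(j + 1, k):
            R[j, i] = Q[:, j] @ Q[:, i]
            Q[:, i] -= R[j, i] * Q[:, j]
    return Q, R

U = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = mgs_qr(U)
```

MGS is preferred over classical Gram-Schmidt here because orthogonalizing against each new q_j immediately is numerically more stable for the nearly linearly dependent difference vectors that arise in extrapolation.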
Our work is motivated by the studies in [2,24,30,32]. Since using iterative methods such as the Gauss-Seidel method given in (5) to calculate the coarse-level solution may require a very long time to converge, our main contribution is to apply the vector extrapolation method (Algorithm 3) to modify the two-level multigrid method (Algorithm 1) on the coarse level, so that the computation of the stationary probability distribution of Markov chains converges faster. The proposed method is denoted the two-level-extrapolation (TLE) method and is given in Algorithm 4.

3. Obtain the approximate solution x of (2) by steps 6-8 of Algorithm 1 and check convergence.
Comparing Algorithm 1 with Algorithm 4, the main difference between our proposed method and the standard two-level multigrid method lies in the treatment of the coarse-level equation A_c x_c = 0: in Algorithm 4 it is handled by the GQE method [30] instead of by m Gauss-Seidel sweeps. Hence, in each cycle of Algorithm 4, the dominant cost of solving the coarse-level equation comes from generating the k + 1 coarse-level iterates and forming the extrapolated vector in Algorithm 3. Generally speaking, since k is taken smaller than m in numerical experiments, the total cost of the TLE method should be less than that of the TL method. Numerical experiments illustrating this point are given in the next section.
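Putting the pieces together, here is a hedged sketch of one two-level cycle in the spirit of Algorithms 1 and 4. All names and parameter choices are illustrative: for brevity the aggregation G is fixed in advance, and the coarse-level equation is solved exactly via an SVD null-space computation, standing in for the m Gauss-Seidel sweeps of TL or the GQE accelerator of TLE.

```python
import numpy as np

def weighted_jacobi(A, x, omega=0.7, sweeps=2):
    """Pre-/post-smoother: weighted Jacobi for A x = 0 (omega illustrative)."""
    D = np.diag(A)
    N = np.diag(D) - A                  # L + U
    for _ in range(sweeps):
        x = (1 - omega) * x + omega * (N @ x) / D
    return x

def two_level_cycle(A, x, G):
    x = weighted_jacobi(A, x)                     # pre-smoothing
    Ac = G.T @ A @ np.diag(x) @ G                 # coarse-level operator
    _, _, Vt = np.linalg.svd(Ac)                  # exact coarse solve
    xc = np.abs(Vt[-1])                           # null vector of Ac
    x = x * (G @ xc)                              # multiplicative correction
    x = weighted_jacobi(A, x)                     # post-smoothing
    return x / np.linalg.norm(x, 1)               # normalize the iterate

# hypothetical 3-state chain; aggregates {0, 1} and {2}
A = np.array([[ 0.5, -0.2, -0.1],
              [-0.3,  0.4, -0.4],
              [-0.2, -0.2,  0.5]])
G = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
x = np.full(3, 1 / 3)
for _ in range(30):
    x = two_level_cycle(A, x, G)
```

At the exact solution the coarse null vector is constant, so the correction leaves x unchanged up to scaling; this fixed-point property is what makes the multiplicative (diag(x)-scaled) coarse correction consistent.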

Numerical Experiments
In this section, we report numerical results obtained with a Matlab 7.0.1 implementation on a Windows XP machine with a 2.93 GHz 64-bit processor and 2 GB of memory. The main goal is to examine the accelerated two-level multigrid method and show its efficiency in improving the numerical solution of the stationary probability distribution of Markov chains. In Algorithm 1, the number of Gauss-Seidel sweeps used to solve the coarse-level equation A_c x_c = 0 is set to m = 10 and m = 20; for simplicity, the corresponding standard two-level multigrid methods are denoted TL(10) and TL(20), respectively. In Algorithm 3, we set k = 2, 3, 4, 5, 6, 7, and the corresponding methods in Algorithm 4 are denoted TLE(2), TLE(3), TLE(4), TLE(5), TLE(6) and TLE(7), respectively. Two Markov chain problems studied in [9,25] are considered in our experiments. Some particular parameter settings are taken from [24,25]. As mentioned in the previous sections, the weighted Jacobi method is used for pre- and post-smoothing in Algorithm 1; the weight ω is kept fixed in all our experiments, since this value works well for all tests we have considered, even though the best choice is likely to be problem-dependent. The strength-of-connection parameter is chosen as θ = 0.8, and the initial guess is generated by randomly sampling the uniform (0, 1) distribution and normalizing to one in the one-norm. All iterations are terminated when ||Ax||_1 ≤ ϵ = 10^(−6), with x the current approximate solution, or when the computing time (referred to as CPU) exceeds 600 seconds. Finally, numerical results in terms of iteration counts (referred to as IT) and CPU are reported in tables, while convergence histories are shown in figures with the number of iterations on the horizontal axis and Relres (defined as log_10 of the updated residual 1-norm, i.e., log_10 ||Ax||_1) on the vertical axis.

Example one: Uniform 2D lattice
This test problem is a 2D lattice with uniform weights [25]; it is similar to an isotropic elliptic PDE problem. We let the 2D lattice be square and use h to denote the number of nodes in every row or column, e.g., h = 20, 40, 60; the number of rows of the coefficient matrix A in the linear system (2) is then n = h^2. Table 1 provides the IT and CPU of the TL and TLE methods for Example one. By comparison, we observe that the IT and CPU of our accelerated two-level multigrid method are less than those of the standard two-level multigrid method. In particular, TLE(7) gives the best results. Taking n = 400 as an example, the iteration counts of TL(10) and TL(20) are reduced by about 91% and 87%, respectively, when compared with TLE(7) (Figure 1).
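The coefficient matrix for this example can be sketched as the graph Laplacian of an h × h grid with unit edge weights; this construction is our assumption (consistent with, but not copied from, [25]) and yields a symmetric A with zero column sums, positive diagonal and nonpositive off-diagonals.

```python
import numpy as np

def lattice_matrix(h):
    """Graph Laplacian of an h x h grid with unit weights: a sketch of
    the uniform 2D lattice matrix A of size n = h^2."""
    n = h * h
    A = np.zeros((n, n))
    for i in range(h):
        for j in range(h):
            p = i * h + j                  # row-major node index
            for q in (p - h, p + h):       # neighbours above and below
                if 0 <= q < n:
                    A[p, q] = -1.0
            if j > 0:                      # left neighbour
                A[p, p - 1] = -1.0
            if j < h - 1:                  # right neighbour
                A[p, p + 1] = -1.0
            A[p, p] = -A[p].sum()          # diagonal = node degree
    return A

A = lattice_matrix(20)                     # h = 20 gives n = 400, as in Table 1
```

Since A is symmetric with zero row sums, the stationary vector of this lattice chain is uniform, which makes the example a convenient sanity check for multigrid codes.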
To obtain an intuitive comparison, Figure 1 plots the convergence histories of the TL(10), TL(20), TLE(3), TLE(5) and TLE(7) methods for Example one with n = 400. It is not difficult to see that the accelerated two-level multigrid method converges faster.

Example two: Two-queue overflow networks
This test problem is the two-queue overflow network, with customer arrival rate λ_i and service rate μ_i for the servers of queue i. Suppose queue i has s_i servers and a finite waiting space, with l_i denoting the number of possible states of queue i; then the number of rows of the matrix A in the linear system (2) is given by n = l_1 l_2. We let (l_1, l_2) = (16, 16), (32, 32) and (64, 32) in the test. The queueing discipline is first-come-first-served. Specifically, we allow overflow of customers from queue 2 to queue 1 when queue 2 is full and there is still waiting space in queue 1. The description of the two-queue overflow network and the form of its generator matrix have been presented in several papers; see, e.g., [9]. Numerical results of the TL and TLE methods for this test problem are given in Table 2. Again, we see that the IT and CPU of our accelerated two-level multigrid method are less than those of the standard two-level multigrid method. Moreover, the higher the order of the vector extrapolation method used, the fewer iterations and the less computing time are needed. In particular, TLE(7) supplies the best results: for instance, when n = 256, TLE(7) needs only about 8%-11% of the iteration steps and computing time of the TL method. Therefore, we can say that our proposed method efficiently speeds up the convergence of the standard two-level multigrid method.
To further compare their numerical behavior from an intuitive point of view, Figure 2 shows the convergence histories of the TL(10), TL(20), TLE(3), TLE(5) and TLE(7) methods for Example two with n = 256. These curves illustrate once again that the accelerated two-level multigrid method outperforms the standard two-level multigrid method.

Conclusions
In this paper, an accelerated two-level multigrid method based on the quadratic extrapolation method and its generalization has been proposed for improving the numerical calculation of the stationary probability distribution of an irreducible Markov chain. The main algorithm is given in Algorithm 4, which shows in detail how to combine Algorithm 1 with Algorithm 3 on the coarse level. Numerical results in Tables 1 and 2 indicate that the TLE method is superior to the TL method in terms of decreasing IT and CPU, while Figures 1 and 2 illustrate the fast convergence of the accelerated two-level multigrid method.