
^{1}Mingqi Wu, Merck Research Laboratories, North Wales, PA

^{2}Faming Liang, Department of Statistics, Texas A&M University, College Station, Texas

- *Corresponding Author:
- Faming Liang

Department of Statistics

Texas A&M University, College Station, Texas

**E-mail:** [email protected]

**Received date:** June 15, 2011; **Accepted date:** November 01, 2011; **Published date:** November 04, 2011

**Citation:** Wu M, Liang F (2011) Population SAMC vs SAMC: Convergence and Applications to Gene Selection Problems. J Biomet Biostat S1:002. doi:10.4172/2155-6180.S1-002

**Copyright:** © 2011 Wu M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Visit for more related articles at** Journal of Biometrics & Biostatistics

The Bayesian model selection approach has been increasingly adopted for the analysis of large datasets. However, the reversible jump MCMC (RJMCMC) algorithm, which is perhaps the most popular MCMC algorithm for Bayesian model selection, is prone to becoming trapped in local modes when the model space is complex. The stochastic approximation Monte Carlo (SAMC) algorithm essentially overcomes the local-trap problem suffered by conventional MCMC algorithms by introducing a self-adjusting mechanism based on past samples. In this paper, we propose a population SAMC (Pop-SAMC) algorithm, which works on a population of SAMC chains and can make use of crossover operators from genetic algorithms to further improve its efficiency. Under mild conditions, we show the convergence of this algorithm. Compared to the single-chain SAMC algorithm, Pop-SAMC provides a more efficient self-adjusting mechanism and thus converges faster. The effectiveness of Pop-SAMC for Bayesian model selection problems is examined through a change-point identification problem and a gene selection problem. The numerical results indicate that Pop-SAMC significantly outperforms both the single-chain SAMC and RJMCMC.

Given data, how to select an optimal model, according to some criterion, from a set of possible models is an important topic for statisticians. Under the Bayesian framework, the success of choosing the right model relies on how accurately one can estimate the posterior probability of each of the possible models. Although the reversible jump MCMC (RJMCMC) algorithm [1] works well for many Bayesian model selection problems in which the model space is simple, it is prone to becoming trapped in locally optimal models when the model space is complex, i.e., consists of a multitude of models separated by high energy barriers.

To overcome the local-trap problem, [2] proposed a stochastic approximation Monte Carlo (SAMC) algorithm. The basic idea of SAMC can be described as follows. Suppose that we are interested in sampling from a distribution,

f(x) = cψ(x), x ∈ X, (1)

where X is the sample space and c is an unknown constant. Let E_{1},...,E_{k} denote a partition of X, and let w_{i} = ∫_{Ei}ψ (x)dx for i=1,…,k. SAMC seeks to draw samples from the trial distribution

f_{w}(x) ∝ Σ^{k}_{i=1} (π_{i}ψ(x)/w_{i}) I(x∈E_{i}), (2)

where the π_{i}’s are pre-specified constants such that π_{i} > 0 for all i, which define the desired sampling frequency for each of the subregions. If w_{1},…,w_{k} were known, sampling from f_{w}(x) would result in a “random walk” in the space of subregions (by regarding each subregion as a point), with subregion E_{i} being sampled with a frequency proportional to π_{i}. Hence, the local-trap problem can essentially be overcome, provided that the sample space is partitioned appropriately. The way to partition the sample space is problem dependent. For example, if one’s goal is to maximize the target distribution, one can partition the sample space according to the density function; if one’s goal is model selection, one may partition the sample space according to the index of the models.

The success of SAMC depends on whether the w_{i}’s can be well estimated. SAMC provides a systematic way to estimate the w_{i}’s in an online manner. Let θ_{ti} denote the working estimate of log(w_{i}/π_{i}) obtained at iteration t, and let θ_{t} = (θ_{t1},…,θ_{tk}) ∈ Θ, where Θ denotes a compact set. Let {γ_{t}} be a positive, non-increasing sequence satisfying

Σ^{∞}_{t=1} γ_{t} = ∞, Σ^{∞}_{t=1} γ^{ζ}_{t} < ∞, (3)

for any ζ>1. For example, one may set

γ_{t} = T_{0}/max(T_{0}, t), t = 1,2,…, (4)

for some value T_{0}>1. Under the above setting, one iteration of SAMC consists of the following steps:

**The SAMC algorithm**

**Metropolis-Hastings (MH) sampling:** Simulate a sample x_{t+1} by a single MH update with the invariant distribution

f_{θ_{t}}(x) ∝ Σ^{k}_{i=1} (ψ(x)/e^{θ_{ti}}) I(x∈E_{i}). (5)

**Weight updating:** Set

θ* = θ_{t} + γ_{t+1}(e_{t} − π), (6)

where e_{t} = (e_{t1},…,e_{tk}) with e_{ti} = I(x_{t+1}∈E_{i}), π = (π_{1},…,π_{k}), and I(·) is the indicator function. If θ* ∈ Θ, set θ_{t+1} = θ*; otherwise, set θ_{t+1} = θ* + C*, where C* = (c*,…,c*) can be an arbitrary vector which satisfies the condition θ* + C* ∈ Θ. Note that f_{θ}(x) is invariant to this location transformation of θ*.

A remarkable feature of SAMC is its self-adjusting mechanism, which operates based on past samples. This mechanism penalizes the over-visited subregions and rewards the under-visited subregions, and thus enables the system to escape from local traps quickly. Mathematically, if subregion E_{i} is visited at iteration t+1, θ_{t+1,i} is updated to a larger value, θ_{t+1,i} ← θ_{ti} + γ_{t+1}(1−π_{i}), so that this subregion has a smaller probability of being visited at the next iteration. On the other hand, for the subregions E_{j} (j≠i) not visited at this iteration, θ_{t+1,j} decreases to a smaller value, θ_{t+1,j} ← θ_{tj} − γ_{t+1}π_{j}, so that the chance of visiting these subregions increases at the next iteration.
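The MH sampling and weight-updating steps above can be sketched in a few lines of code. This is a minimal sketch, not the authors' implementation: the bimodal target `psi`, the energy-based partition with bandwidth 0.5, and the tuning constants (`t0`, `step`, the number of subregions `k`) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(x):
    # unnormalized target density: two well-separated Gaussian modes (toy choice)
    return 0.5 * np.exp(-0.5 * (x + 4.0) ** 2) + 0.5 * np.exp(-0.5 * (x - 4.0) ** 2)

def subregion(x, k=10, du=0.5):
    # partition by energy U(x) = -log(psi(x)) with bandwidth du (illustrative)
    u = -np.log(psi(x) + 1e-300)
    return min(int(u // du), k - 1)

def samc(n_iter=20000, k=10, t0=1000.0, step=2.0):
    theta = np.zeros(k)                  # working estimates of log(w_i / pi_i)
    pi = np.full(k, 1.0 / k)             # uniform desired sampling distribution
    x = 0.0
    for t in range(1, n_iter + 1):
        gamma = t0 / max(t0, t)          # gain factor sequence (4)
        # MH step targeting f_theta(x), proportional to psi(x) / exp(theta[J(x)])
        y = x + rng.normal(scale=step)
        log_r = (np.log(psi(y) + 1e-300) - theta[subregion(y, k)]) - \
                (np.log(psi(x) + 1e-300) - theta[subregion(x, k)])
        if np.log(rng.uniform()) < log_r:
            x = y
        # weight update: theta <- theta + gamma * (e_t - pi), as in (6)
        e = np.zeros(k)
        e[subregion(x, k)] = 1.0
        theta += gamma * (e - pi)
    return theta

theta = samc()
```

Since each update adds γ_{t+1}(e_{t} − π), whose components sum to zero, the components of θ_{t} always sum to (numerically) zero; only their relative values matter for f_{θ}.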

Although SAMC has been quite effective in sample space exploration, the convergence of θ_{t} is usually slow, because SAMC is run as a single chain. At each iteration, one and only one component of e_{t} is non-zero, so the information gained for θ_{t} is minimal and the adjustment process is slow. As a result, a large variation of θ_{t} can be observed even after many iterations, especially when the number of subregions is large.

Inspired by the successes of population-based MCMC algorithms, e.g., adaptive direction sampling [3], conjugate gradient Monte Carlo [4], parallel tempering [5,6], and evolutionary Monte Carlo [7,8], we propose a population SAMC (Pop-SAMC) algorithm to accelerate the convergence of SAMC. The new algorithm works on a population of SAMC chains. The benefits are two-fold. Firstly, it provides a more efficient self-adjusting mechanism. Intuitively, when we have a population of SAMC chains running in parallel, the information gained for θ_{t} at each iteration is increased, which leads to a more accurate adjustment of θ_{t}. Consequently, this improves the convergence of θ_{t}. Secondly, running a population of chains in parallel enables the incorporation of crossover operators from the genetic algorithm [9] into the simulation. With this operator, information distributed across the population can be shared among the chains, which further improves the efficiency of the new algorithm.

In this paper, we illustrate the use of Pop-SAMC for Bayesian model selection problems using a change-point identification example and a gene selection example. The numerical results show that the new method performs significantly better than both the single-chain SAMC and RJMCMC in estimating the probabilities of competing models. A rigorous proof of the convergence of Pop-SAMC is provided in the Appendix.

The remainder of this paper is organized as follows. In Section 2, we describe the Pop-SAMC algorithm and study its convergence theory. In Section 3, we illustrate Pop-SAMC using a multimodal example. In Section 4, we show the superiority of Pop-SAMC for Bayesian model selection problems by studying a change-point identification example and a gene selection example, along with comparisons with SAMC and RJMCMC. In Section 5, we conclude this paper with a brief discussion.

**Population SAMC**

Consider the distribution defined in (1). Suppose that the sample space χ has been partitioned into k disjoint subregions, denoted by E_{1},…,E_{k}. The same gain factor sequence {γ_{t}}, as defined in (3) and (4) for the single-chain SAMC, will be used for Pop-SAMC.

Pop-SAMC works on a population of SAMC chains in parallel. At each iteration, a set of samples, called a population, is generated. Let x^{t} = (x^{t}_{1},…,x^{t}_{N}) denote the population generated at iteration t, where N is the population size. One iteration of the Pop-SAMC algorithm consists of the following two steps:

**The Pop-SAMC algorithm**

**MH sampling:** For each chain i = 1,…,N, simulate a sample x^{t}_{i} by a single MH update with the invariant distribution defined in (5). A new population of samples x^{t} is thereby obtained.

**Weight updating:** Set

θ* = θ_{t} + γ_{t+1}(ē_{t} − π), (7)

where ē_{t} = (ē_{t1},…,ē_{tk}) with ē_{ti} = (1/N) Σ^{N}_{j=1} I(x^{t}_{j}∈E_{i}), the frequency with which the current population falls in subregion E_{i}. If θ* ∈ Θ, set θ_{t+1} = θ*; otherwise, set θ_{t+1} = θ* + C*, where C* = (c*,…,c*) can be an arbitrary vector which satisfies the condition θ* + C* ∈ Θ.

Pop-SAMC is a generalized version of SAMC, with multiple samples, independent conditional on the current population, being generated at each iteration. This enables a frequency estimator ē_{t} for the visiting probability vector P_{t}. Compared with the indicator vector e_{t} used in the single-chain SAMC, ē_{t} provides a more accurate estimator of P_{t}. This is the key reason why Pop-SAMC can outperform SAMC in terms of the convergence of θ_{t}.
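The weight update (7) is simple to implement. Below is a minimal sketch of the Pop-SAMC update in which the subregion labels and all constants are made-up toy values, not data from the paper.

```python
import numpy as np

def pop_samc_weight_update(theta, region_labels, pi, gamma):
    # theta* = theta + gamma * (e_bar - pi), where e_bar[i] is the fraction
    # of the current population falling in subregion E_i
    e_bar = np.bincount(region_labels, minlength=len(theta)) / len(region_labels)
    return theta + gamma * (e_bar - pi)

# toy illustration: m = 4 subregions, a population of N = 10 chains
theta = np.zeros(4)
pi = np.full(4, 0.25)
labels = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])  # hypothetical subregion labels
theta_new = pop_samc_weight_update(theta, labels, pi, gamma=0.1)
```

With N = 10 chains, each component of ē_{t} can take eleven values rather than the two (0 or 1) available to the single-chain indicator vector, which is the source of the more accurate adjustment.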

**Convergence**

Regarding the convergence of the algorithm, we note that for the empty subregions, the corresponding components of θ_{t} trivially converge to −∞ as t → ∞. Therefore, without loss of generality, we show in the Appendix only the convergence of the algorithm for the case that all subregions are non-empty. Extending the proof to the general case is trivial, since replacing (7) by (8) (given below) will not change the process of Pop-SAMC simulation.

θ* = θ_{t} + γ_{t+1}(ē_{t} − π − V), (8)

where V = (v,…,v)′ is an m-vector with v = Σ_{j: E_{j}=∅} π_{j}/(m−k_{0}), and k_{0} is the number of empty subregions.

In our proof, we assume that Θ is a compact set; that is, there exists a constant vector C_{t} for each t such that C_{t}+θ_{t} ∈ Θ. This assumption is made only for mathematical simplicity. Extension of our results to the case Θ = ℝ^{m} is trivial with the technique of varying truncations studied in [10,11]. Interested readers can refer to [12] for the details, where the convergence of SAMC is studied with Θ = ℝ^{m}. In the simulations of this paper, we set Θ to be a huge compact set; as a practical matter, this is equivalent to setting Θ = ℝ^{m}.

Under the above assumptions, we have the following theorem concerning the convergence of the Pop-SAMC algorithm, whose proof can be found in Appendix.

**Theorem.** Let E_{1},…,E_{m} be a partition of a compact sample space χ and let ψ(x) be a non-negative function defined on χ with 0 < ∫_{E_{i}}ψ(x)dx < ∞ for all non-empty E_{i}’s. Let π = (π_{1},…,π_{m}) be an m-vector with 0 < π_{i} < 1 and Σ^{m}_{i=1}π_{i} = 1. Let {γ_{t}} be a non-increasing, positive sequence satisfying (3). If Θ is compact and the drift condition [condition (A2) given in the Appendix] is satisfied, then, as t → ∞, we have, almost surely,

θ_{ti} → C + log(∫_{E_{i}}ψ(x)dx) − log(π_{i}+v) if E_{i} ≠ ∅, and θ_{ti} → −∞ if E_{i} = ∅, (9)

with v = Σ_{j: E_{j}=∅} π_{j}/(m−k_{0}),

where C is an unknown constant, and k_{0} is the number of empty subregions. The constant C can be determined by imposing a constraint, e.g., that Σ^{m}_{i=1} e^{θ_{ti}} is equal to a known number.

The drift condition assumption is classical; it implies the existence of the stationary distribution f_{θ}(x) for any θ ∈ Θ. To have the drift condition satisfied, we assume that χ is compact and f(x) is bounded away from 0 and ∞ on χ. This assumption holds for many Bayesian model selection problems, for example, the Bayesian change-point identification and the Bayesian regression variable selection problems considered in this paper. For both problems, after integrating out the model parameters, the sample space is reduced to a finite set of models. For continuum systems, one may restrict χ to the region {x: ψ(x) ≥ ψ_{min}}, where ψ_{min} is sufficiently small such that the excluded region is not of interest. Otherwise, one may put conditions on the tail of f(x) as prescribed by [11]. For the proposal distribution q(x,y) used in the MH sampling, we assume that it satisfies the local positive condition: there exist ε_{1} > 0 and ε_{2} > 0 such that q(x,y) ≥ ε_{2} whenever |x−y| ≤ ε_{1}. This condition is quite standard and has been widely used in the study of MCMC convergence; see [13].

**Crossover**

Another attractive feature of Pop-SAMC is that, with parallel chains running independently, information can be exchanged or shared between different chains to further improve the algorithm’s efficiency. Borrowing information across chains can be done through crossover operators originating in the genetic algorithm [9]. One pioneering work in this direction is the evolutionary Monte Carlo (EMC) algorithm [7,8]. Motivated by the successes of EMC, we incorporate crossover into Pop-SAMC; below we discuss only the simplest case for illustration purposes.

Suppose we have a population x = (x_{1},…,x_{N}) at iteration t, where each x_{i} is a d-dimensional vector, the so-called individual or chromosome. Let p_{c} denote the crossover rate, and let N_{c} = [p_{c}N]_{e} be the number of chromosomes in the current population that will undergo crossover, where [z]_{e} denotes the maximum even number smaller than z. At each iteration, the crossover operator works as follows:

**Selection:** Randomly select N_{c} chromosomes from the current population x, and randomly allocate them to form N_{c}/2 pairs.

**Crossover:** For each pair (x_{i},x_{j}), an integer crossover point c is first decided by drawing uniformly from {1,…,d}; then two new chromosomes (y_{i},y_{j}) are obtained by swapping the components of the two parental chromosomes to the right of the crossover point.

Following the MH rule, the new chromosomes are accepted into the new population y with probability min(1,r_{c}), where

r_{c} = [f_{θ_{t}}(y_{i})f_{θ_{t}}(y_{j})/f_{θ_{t}}(x_{i})f_{θ_{t}}(x_{j})] · [T((x_{i},x_{j})|(y_{i},y_{j}))/T((y_{i},y_{j})|(x_{i},x_{j}))], (10)

where P((x_{i},x_{j})|x) is the probability of selecting (x_{i},x_{j}) from the population x and T((y_{i},y_{j})|(x_{i},x_{j})) is the probability of generating (y_{i},y_{j}) from the parental chromosomes (x_{i},x_{j}). Since the parental chromosomes are chosen randomly from the population, and by the symmetry of the crossover operator, it is easy to show that T((y_{i},y_{j})|(x_{i},x_{j})) = T((x_{i},x_{j})|(y_{i},y_{j})). Thus, equation (10) reduces to the likelihood ratio between the new chromosomes and the old ones:

r_{c} = f_{θ_{t}}(y_{i})f_{θ_{t}}(y_{j}) / [f_{θ_{t}}(x_{i})f_{θ_{t}}(x_{j})]. (11)

In Pop-SAMC, the crossover operation can be included in the MH sampling step: N_{c} chromosomes are updated using the crossover operator, and the remaining (N−N_{c}) chromosomes are each updated with a single MH step according to the invariant distribution defined in (5).

The rationale behind the effectiveness of crossover operators can be explained as follows. At each iteration, some samples obtained in one chain may be better than others in terms of likelihood values. If such a chain happens to be selected for the crossover operation, then, by exchanging parts of its chromosome with other individuals, the overall quality of the whole population can potentially be improved.
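The selection and crossover steps above can be sketched as follows. This is an illustrative sketch, not the authors' code: the standard-normal log-density and population are hypothetical, the even count `(int(p_c * n) // 2) * 2` approximates [z]_{e}, and the crossover point is drawn from {1,…,d−1} so that every swap is non-trivial.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossover_sweep(pop, log_f, p_c=0.1):
    # One crossover sweep: pick N_c chromosomes, pair them, swap the tails
    # at a uniform crossover point, and accept each pair of offspring
    # together via the Metropolis likelihood ratio, as in (11).
    n, d = pop.shape
    n_c = (int(p_c * n) // 2) * 2          # even count, approximating [p_c * N]_e
    idx = rng.permutation(n)[:n_c]
    for a, b in idx.reshape(-1, 2):
        c = int(rng.integers(1, d))        # crossover point in {1, ..., d-1}
        y1 = np.concatenate([pop[a, :c], pop[b, c:]])
        y2 = np.concatenate([pop[b, :c], pop[a, c:]])
        log_r = log_f(y1) + log_f(y2) - log_f(pop[a]) - log_f(pop[b])
        if np.log(rng.uniform()) < log_r:  # accept both offspring together
            pop[a], pop[b] = y1, y2
    return pop

# usage with a hypothetical standard-normal log-density
log_f = lambda x: -0.5 * float(x @ x)
pop = rng.normal(size=(20, 5))
pop = crossover_sweep(pop, log_f, p_c=0.2)
```

Note that the two offspring are accepted or rejected jointly; accepting them separately would break the detailed-balance argument behind (10).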

To illustrate the performance of Pop-SAMC in estimation of the sample space partitioning weight, we study a multimodal example [8], whose density function is given by

f(x) = Σ^{20}_{i=1} (α_{i}/(2πσ^{2})) exp(−(x−μ_{i})′(x−μ_{i})/(2σ^{2})), (12)

where each component has an equal variance σ^{2} = 0.02 and is assigned an equal weight α_{1} = … = α_{20} = 0.05. See [8] for the values of the mean vectors μ_{1},…,μ_{20}.

Some of the components are far from the others, more than 30 standard deviations away in distance. This poses a great challenge to the algorithms being tested.

Let U(x) = −log ψ(x); in terms of physics, U(x) is called the energy function of the distribution. Let the sample space be partitioned according to the energy function, with an equal energy bandwidth Δu = 0.5, into 50 subregions E_{1},…,E_{50}, and let the desired sampling distribution be uniform, i.e., π_{1} = … = π_{50} = 1/50. Both Pop-SAMC and SAMC provide a self-adjusting mechanism for learning the partition weights w_{i} = ∫_{E_{i}}ψ(x)dx, for i = 1,…,50, in an online manner. With the uniform desired sampling distribution, the partition weights reduce to the probability that a sample is drawn from each subregion, i.e., P(E_{i}) = w_{i}/Σ^{50}_{j=1}w_{j}. The true value of P(E_{i}) was calculated with a total of 20 × 10^{8} samples drawn equally from each of the twenty components of f(x). In order to have a fair comparison, we ran both SAMC and Pop-SAMC with the same number of energy evaluations, using the same proposal distribution and the same gain factor sequence.
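Once θ_{t} has converged, converting it into estimates of P(E_{i}) uses the identity e^{θ_{ti}} ∝ w_{i}/π_{i}. A minimal sketch (the θ values below are hypothetical, not results from the paper):

```python
import numpy as np

def region_probabilities(theta, pi):
    # exp(theta_i) estimates w_i / pi_i up to a common constant, so
    # P(E_i) = pi_i * exp(theta_i) / sum_j pi_j * exp(theta_j)
    theta = np.asarray(theta, dtype=float)
    w = np.asarray(pi) * np.exp(theta - theta.max())  # subtract max for stability
    return w / w.sum()

# hypothetical converged weights for 4 subregions under a uniform pi
p = region_probabilities([1.0, 2.0, 0.5, -1.0], pi=np.full(4, 0.25))
```

Subtracting the maximum before exponentiating exploits the location invariance of f_{θ} and avoids overflow when the components of θ are large.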

Pop-SAMC was run 100 times independently with the settings T_{0} = 50, N = 10, K = 50, and 10^{5} iterations. SAMC was applied to this example 100 times independently with the same settings except for the following parameters: T_{0} = 100 and 10^{6} iterations. The computational results, along with the true values of P(E_{i}) for i = 2,…,9, are summarized in **Table 1**; the results for the other subregions, which have zero or tiny probability, are not listed. The results show that Pop-SAMC makes a significant improvement in accuracy over SAMC in terms of the standard deviations of the estimates of P(E_{i}). On average, the standard error of the Pop-SAMC estimates is only about 1/10 of that of the SAMC estimates. These results were achieved under the same number of energy evaluations for each of the algorithms, which makes the comparison fair. In the table, we also report the CPU time cost by a single run of each method to further clarify the fairness of the comparison; Pop-SAMC even costs less CPU time than SAMC. In summary, Pop-SAMC converges much faster than SAMC for estimating the weights of the subregions.

| Estimates | True Prob. (%) | Pop-SAMC | SAMC |
|---|---|---|---|
| P(E_{2}) | 23.87 | 23.85(0.05) | 23.65(0.85) |
| P(E_{3}) | 30.27 | 30.25(0.06) | 30.31(0.92) |
| P(E_{4}) | 18.56 | 18.59(0.04) | 18.13(0.46) |
| P(E_{5}) | 11.24 | 11.21(0.02) | 11.30(0.47) |
| P(E_{6}) | 6.63 | 6.64(0.02) | 6.27(0.12) |
| P(E_{7}) | 3.84 | 3.85(0.01) | 3.63(0.07) |
| P(E_{8}) | 2.26 | 2.26(0.01) | 2.15(0.04) |
| P(E_{9}) | 1.34 | 1.34(0.00) | 1.27(0.02) |
| CPU(s) | --- | 1.81 | 2.36 |

**Table 1:** Comparison of SAMC and Pop-SAMC for the multimodal example. The number in parentheses is the standard deviation of the corresponding estimate. CPU: the CPU time (in seconds) cost by a single run of the corresponding algorithm on an Intel Core 2 Duo 3.0 GHz computer.

A Bayesian approach to model selection proceeds as follows. Suppose that we have a posterior distribution P(m|y) ∝ P(m)f(y|m), where y denotes the data, m is the model index, which belongs to a set of competing models M (m ∈ M), P(m) is the prior probability of model m, and f(y|m) is the marginal likelihood, obtained by integrating out the model parameters. Different methods have been developed to estimate the posterior probabilities of all potential models, and each method is judged by the accuracy of its estimates.

SAMC has been compared with RJMCMC on Bayesian model selection problems [2]. The results show that SAMC outperforms RJMCMC when the model space is complex. However, when the model space is simple, e.g., when it contains only several models with comparable probabilities, SAMC may not be better than RJMCMC. On the other hand, compared with SAMC, Pop-SAMC has shown its superiority in estimating the sample space partition weights. For model selection problems, the sample space is usually partitioned according to the model index, assuming the MH chain can mix reasonably well within the sample space of each model. The weight of each partition is thus proportional to the posterior probability P(m|y) of the corresponding model. Accordingly, Pop-SAMC is expected to be more efficient than SAMC in estimating the model probabilities. In this section, we show the superiority of Pop-SAMC for Bayesian model selection problems by studying two typical examples. The results show that Pop-SAMC makes a significant improvement over SAMC, and that it can also work better than RJMCMC even when the model space is simple.

**A change-point identification example**

The change-point identification problem can be described as follows. Suppose we have a sequence of independent observations y = (y_{1},y_{2},…,y_{n})′ which can be partitioned into blocks, such that the sequence follows the same distribution within each block. Our goal is to identify the unknown number and locations of the boundaries between blocks, called change-points. For simplicity, we assume that the observations within block b are drawn independently from a normal distribution N(μ_{b},σ^{2}_{b}), where b is the index of the blocks. After a change-point, both the mean and the variance may shift.

In the literature, this problem has been studied by several authors using simulation-based methods, e.g., the Gibbs sampler [14], reversible jump MCMC [1], jump diffusion [15], and evolutionary Monte Carlo [7]. In this paper, we follow the approach of [7]: a latent vector is introduced to indicate the change-point positions. Let Z = (Z_{1},…,Z_{n−1}) be a latent binary vector associated with the observation indices except for the last one, where Z_{i} = 1 indicates a change-point at position i, and Z_{i} = 0 otherwise. Let Z^{(k)} denote a model with k change-points, with the unknown change-points denoted by C_{1},…,C_{k}. For convenience, we let C_{0} = 0 and C_{k+1} = n, so that 0 = C_{0} < C_{1} < … < C_{k} < C_{k+1} = n.

Under the above setting, we have y_{i} ~ N(μ_{b},σ^{2}_{b}) for C_{b−1} < i ≤ C_{b} and b = 1,2,…,k+1. For model Z^{(k)}, the parameter vector is θ^{(k)} = (μ_{1},…,μ_{k+1},σ^{2}_{1},…,σ^{2}_{k+1}). Let χ_{k} denote the model space with k change-points, so that Z^{(k)} ∈ χ_{k} and χ = ∪_{k}χ_{k}. The log-likelihood of model Z^{(k)} is then

log f(y|Z^{(k)},θ^{(k)}) = Σ^{k+1}_{b=1} Σ^{C_{b}}_{i=C_{b−1}+1} [ −(1/2)log(2πσ^{2}_{b}) − (y_{i}−μ_{b})²/(2σ^{2}_{b}) ]. (14)

To conduct a Bayesian analysis for the model, we specify the following prior distribution for the model parameters:

σ^{2}_{b} ~ IG(α,β), b = 1,…,k+1, (15)

where IG(α,β) denotes an inverse Gamma distribution with hyperparameters α and β, and an improper uniform prior is put on each μ_{b}. In addition, we assume that the number of change-points k follows a truncated Poisson distribution,

P(k) = (λ^{k}/k!) / Σ^{n−1}_{j=0}(λ^{j}/j!), k = 0,1,…,n−1, (16)

where λ is a hyperparameter and n−1 is the largest number of change-points allowed by this model. Conditional on the number of change-points k, we put an equal prior probability on all possible configurations of Z^{(k)}. Assuming that all the priors are independent, the log-prior density is

log P(Z^{(k)},θ^{(k)}) = k log λ − log(k!) − log C(n−1,k) + Σ^{k+1}_{b=1}[ α log β − log Γ(α) − (α+1) log σ^{2}_{b} − β/σ^{2}_{b} ] + c_{0}, (17)

where C(n−1,k) denotes the binomial coefficient and c_{0} = −log(Σ^{n−1}_{j=0} λ^{j}/j!) is the log normalizing constant of the truncated Poisson distribution.

Combining the likelihood (14) and the prior distributions (17), integrating out μ_{b} and σ^{2}_{b} for b = 1,…,k+1, and taking the logarithm, we get the following log-posterior density function

log P(Z^{(k)}|y) = c + k log λ − log(k!) − log C(n−1,k) + (k+1)[α log β − log Γ(α)] + Σ^{k+1}_{b=1}[ log Γ(α+(n_{b}−1)/2) − (α+(n_{b}−1)/2) log(β+S_{b}/2) − ((n_{b}−1)/2) log(2π) − (1/2) log n_{b} ], (18)

where c is a constant, n_{b} = C_{b}−C_{b−1} is the size of block b, ȳ_{b} is the block mean, and S_{b} = Σ^{C_{b}}_{i=C_{b−1}+1}(y_{i}−ȳ_{b})².
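A hedged sketch of evaluating this log-posterior for a given change-point configuration follows, assuming the standard flat-mean/inverse-gamma marginalization described above and dropping constants that do not depend on the configuration; the toy data are hypothetical, and this is illustrative rather than the authors' implementation.

```python
import math

def block_log_marginal(y_block, alpha=0.5, beta=0.5):
    # log marginal likelihood of one block: flat prior on the block mean,
    # IG(alpha, beta) prior on the block variance, both integrated out
    n = len(y_block)
    ybar = sum(y_block) / n
    s = sum((v - ybar) ** 2 for v in y_block)
    return (math.lgamma(alpha + (n - 1) / 2) - math.lgamma(alpha)
            + alpha * math.log(beta)
            - (alpha + (n - 1) / 2) * math.log(beta + s / 2)
            - ((n - 1) / 2) * math.log(2 * math.pi)
            - 0.5 * math.log(n))

def log_posterior(y, changepoints, lam=1.0, alpha=0.5, beta=0.5):
    # unnormalized log-posterior of a change-point configuration:
    # truncated-Poisson prior on k, uniform prior over configurations,
    # plus the product of block marginals (constant terms dropped)
    n, k = len(y), len(changepoints)
    bounds = [0] + sorted(changepoints) + [n]
    lp = k * math.log(lam) - math.lgamma(k + 1)                     # Poisson part
    lp -= math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)  # log C(n-1, k)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        lp += block_log_marginal(y[lo:hi], alpha, beta)
    return lp

# toy data: a split at the obvious break should score higher
y = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
better = log_posterior(y, [3])
worse = log_posterior(y, [])
```

Evaluating the posterior this way, block by block, is what allows the MH moves (adding, deleting, or shifting a change-point) to be scored cheaply.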

Samples generated from the above posterior distribution can be used to estimate P(χ_{k}|y). For Pop-SAMC, if we partition the sample space according to the number of change-points and adopt a uniform desired sampling distribution, it follows from (9) that e^{θ_{ti}}/e^{θ_{tj}} forms a consistent estimator for the posterior probability ratio P(χ_{i}|y)/P(χ_{j}|y). Without loss of generality, we restrict our consideration to the models with k_{min} ≤ k ≤ k_{max}, where k_{min} and k_{max} can be determined easily with a short pilot run of the above algorithm; the probability of the models outside this range is essentially zero.

For the change-point identification problem, the details of the sampling step of Pop-SAMC are designed similarly to those described in [16]; the weight updating step follows (7).

In this example, the simulated dataset consists of 1,000 observations; the time plot is shown in **Figure 1**. For this example, we set the hyperparameters α = β = 0.5, which forms a vague prior for σ^{2}_{b}, and set λ = 1. After a short pilot run, we set k_{min} = 7 and k_{max} = 14.

Pop-SAMC was run on this example 50 times independently with the following settings: N = 20, T_{0} = 10, and 5×10^{4} iterations. The results are summarized in **Figure 1** and **Table 2**. **Figure 1** compares the true eight-change-point pattern with its MAP (maximum a posteriori) estimate, which are (120, 210, 460, 530, 615, 710, 800, 950) and (120, 211, 460, 531, 610, 710, 801, 939), respectively. The two patterns match very well except for the last point. A detailed exploration of the simulated dataset gives strong support to the MAP estimate: the last ten observations of the second-to-last block have a larger mean value than expected, and thus they have been grouped into the last block. The MAP estimate also achieves a larger log-posterior value than the true pattern (5305.57 > 5300.24).

| k | Pop-SAMC 10%Cr prob(%) | SD | Pop-SAMC prob(%) | SD | SAMC prob(%) | SD | RJMCMC prob(%) | SD |
|---|---|---|---|---|---|---|---|---|
| 7 | 0.1029 | 0.0014 | 0.1009 | 0.0018 | 0.0949 | 0.0026 | 0.0998 | 0.0052 |
| 8 | 55.5077 | 0.2272 | 55.6082 | 0.2698 | 54.5699 | 0.6833 | 55.0832 | 0.3261 |
| 9 | 33.3677 | 0.1364 | 33.2264 | 0.1693 | 33.5970 | 0.4432 | 33.5365 | 0.1794 |
| 10 | 9.2642 | 0.0873 | 9.3098 | 0.1010 | 9.8146 | 0.2910 | 9.4942 | 0.1548 |
| 11 | 1.5646 | 0.0253 | 1.5633 | 0.0233 | 1.7117 | 0.0778 | 1.5884 | 0.0547 |
| 12 | 0.1767 | 0.0037 | 0.1756 | 0.0031 | 0.1943 | 0.0113 | 0.1813 | 0.0108 |
| 13 | 0.0150 | 0.0004 | 0.0149 | 0.0003 | 0.0165 | 0.0011 | 0.0153 | 0.0013 |
| 14 | 0.0010 | 0.0000 | 0.0010 | 0.0000 | 0.0012 | 0.0001 | 0.0012 | 0.0002 |
| CPU(s) | 16.1 | | 16.2 | | 16.2 | | 15.2 | |

**Table 2:** Comparison of the estimated posterior distribution P(χ_{k}|y) for the change-point identification example. SD: the estimate of the standard deviation. CPU: the CPU time (in seconds) cost by a single run of the corresponding algorithm on an Intel Core 2 Duo 3.0 GHz computer.

For comparison, SAMC and RJMCMC were also applied to this example; each algorithm was run 50 times independently. The results are summarized in **Table 2**. SAMC employed the same settings as Pop-SAMC except for two parameters: T_{0} = 100 and 10^{6} iterations. RJMCMC employed the same transition proposals as those used by Pop-SAMC and SAMC and performed 10^{6} iterations in each run. Under these settings, in a single run each of the three algorithms performs exactly the same number of energy evaluations with the same transition proposals. Therefore, the comparisons made in **Table 2** are fair to each of the algorithms. The CPU time cost by each run is about the same for all methods.

The comparison shows that Pop-SAMC works best among the three methods, achieving the smallest standard error in estimating the posterior probability P(χ_{k}|y). As expected, RJMCMC works better than SAMC in this example, because the model space here is quite simple, containing only one mode with models of comparable probabilities. As previously mentioned, under such a situation SAMC may not be better than RJMCMC; Pop-SAMC, however, is. Although Pop-SAMC is essentially an importance sampling method, as is SAMC, its improved self-adjusting mechanism makes it much more efficient than SAMC. Remarkably, this improvement in its ability to learn from past samples enables Pop-SAMC to beat RJMCMC even on problems where RJMCMC succeeds.

It is worth pointing out that both Pop-SAMC and SAMC beat RJMCMC on the low-probability models, e.g., k = 7, 13, 14, even though SAMC is worse than RJMCMC overall. The reason is the following: RJMCMC does not have the self-adjusting ability; it samples each model with a frequency proportional to its probability. In contrast, due to their self-adjusting mechanism, Pop-SAMC and SAMC sample equally from each model space, so they work as well for the low-probability models as for the high-probability ones.

Finally, in order to further increase the efficiency of Pop-SAMC and make full use of the information in the population, we incorporated the crossover operator into the simulation with a 10% crossover rate, keeping all other settings intact. The algorithm was run 50 times independently, and the results are also included in **Table 2** for comparison. As expected, Pop-SAMC works more efficiently with the crossover operator.

To further assess the performance of Pop-SAMC on Bayesian model selection problems, we consider a linear regression variable selection example in which the number of observations n is much smaller than the number of potential predictors p.

The linear regression model with a fixed number of potential predictors {x_{1},x_{2},…,x_{p}} takes the form

y = Xβ + ε, ε ~ N(0, σ^{2}I_{n}), (19)

where y = (y_{1},…,y_{n})′ is the response vector, X is an n×(p+1) design matrix, and β is a (p+1)-vector of regression coefficients. The problem of interest is to find a subset model M_{k} of the form

y = X_{k}β_{k} + ε, (20)

which is best under some criterion, where X_{k} consists of the k selected predictors and β_{k} is the vector of regression coefficients of the subset model. For model M_{k}, the likelihood function is

f(y|M_{k},β_{k},σ^{2}) = (2πσ^{2})^{−n/2} exp{ −(y−X_{k}β_{k})′(y−X_{k}β_{k})/(2σ^{2}) }. (21)

The prior distributions for the parameters are assigned as follows. We first assume that σ^{2} and β_{k} are subject to the g-priors [17],

π(σ^{2}) ∝ 1/σ^{2}, β_{k}|σ^{2},M_{k} ~ N(0, gσ^{2}(X′_{k}X_{k})^{−1}), (22)

where g is a hyperparameter. We further assume that all p predictors are linearly independent and that each has the same prior probability q of being included in the model. Therefore, the prior probability imposed on the model M_{k} is

P(M_{k}|q) = q^{k}(1−q)^{p−k}, (23)

with q being subject to the uniform distribution Unif [0,1].

Collecting the likelihood and prior distributions, we get the posterior distribution,

P(M_{k},β_{k},σ^{2},q|y) ∝ f(y|M_{k},β_{k},σ^{2}) π(β_{k}|σ^{2},M_{k}) π(σ^{2}) P(M_{k}|q) π(q). (24)

Integrating out σ^{2}, β_{k}, and q from (24) and taking the logarithm, we get the log-posterior of model M_{k} (up to an additive constant),

log P(M_{k}|y) = c − ((k+1)/2) log(1+g) − (n/2) log[ y′y − (g/(1+g)) y′X_{k}(X′_{k}X_{k})^{−1}X′_{k}y ] + log B(k+1, p−k+1), (25)

where c is a constant and B(·,·) denotes the Beta function arising from integrating out q.

Here g is specified by the user and reflects prior knowledge on the model space. Typically, a large g concentrates the prior on parsimonious models with a few large coefficients, while a small g tends to concentrate the prior on saturated models with small coefficients [18]. The evaluation of the posterior distribution involves inverting the matrix X′_{k}X_{k}, which can be calculated in a recursive manner using matrix inversion in block form; this saves computational cost tremendously.
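The recursive update mentioned above can be sketched with the block (Schur complement) inverse formula: given (X′_{k}X_{k})^{−1}, appending one predictor column updates the inverse without a full O(k³) re-inversion. The toy data below are hypothetical.

```python
import numpy as np

def add_column_inverse(Ainv, X, v):
    # Given Ainv = (X'X)^{-1}, return the inverse of the Gram matrix after
    # appending column v to X, via the block (Schur complement) formula.
    b = X.T @ v                   # cross products with the existing columns
    d = float(v @ v)
    u = Ainv @ b
    s = d - b @ u                 # Schur complement
    k = Ainv.shape[0]
    out = np.empty((k + 1, k + 1))
    out[:k, :k] = Ainv + np.outer(u, u) / s
    out[:k, k] = -u / s
    out[k, :k] = -u / s
    out[k, k] = 1.0 / s
    return out

# toy check against direct inversion (hypothetical random data)
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
v = rng.normal(size=50)
new_inv = add_column_inverse(np.linalg.inv(X.T @ X), X, v)
```

An analogous rank-one formula handles deleting a column, so each add/delete proposal in the sampler costs O(k²) rather than O(k³).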

**Simulation study**

The small-n-large-p example is modified from some examples studied in [19,20]. The dataset is generated as follows. For i = 1,…,600, define

The response variable is defined as

(27)

where ε denotes the vector of random errors, which is independent of the predictor variables.

For this example, we set the hyperparameter g to the so-called benchmark prior value recommended by [19]. In applying Pop-SAMC to this example, we follow the same fashion as in the change-point identification example: we first partition the sample space according to the model index k, with χ_{k} denoting the model space with k selected variables and χ = ∪_{k}χ_{k}. It follows from (9) that e^{θ_{ti}}/e^{θ_{tj}} forms a consistent estimator for the posterior probability ratio P(χ_{i}|y)/P(χ_{j}|y). We restrict the model space to k_{min} ≤ k ≤ k_{max}; after a pilot run, we set k_{min} = 10 and k_{max} = 40.

Again, we applied four algorithms to this example and compared their efficiency. Each algorithm was run 20 times independently. First, Pop-SAMC was run with the following settings: N = 20, T_{0} = 400, and 3×10^{5} iterations. Then we included the crossover operator in Pop-SAMC; the modified algorithm was run with a 10% crossover rate while keeping all other settings intact. Finally, SAMC and RJMCMC were applied to this example. SAMC employed the same settings as Pop-SAMC except for one parameter: 6×10^{6} iterations. RJMCMC also performed 6×10^{6} iterations in each run, with the first 50,000 iterations discarded as burn-in. All four algorithms used the same transition proposal, and in a single run each performs exactly the same number of energy evaluations; therefore, the comparisons are fair to each of the algorithms. The computational results are summarized in **Table 3**; the subregions with zero probability under all algorithms are not listed.

| k | Pop-SAMC 10%Cr prob(%) | SD | Pop-SAMC prob(%) | SD | SAMC prob(%) | SD | RJMCMC prob(%) | SD |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2125 | 0.3689 |
| 11 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0012 | 0.0035 | 0.1345 | 0.2223 |
| 12 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0052 | 0.0135 | 0.1515 | 0.1636 |
| 13 | 0.0001 | 0.0002 | 0.0001 | 0.0002 | 2.0299 | 6.5153 | 45.6895 | 8.4369 |
| 14 | 0.0001 | 0.0002 | 0.0001 | 0.0001 | 1.4000 | 4.4203 | 38.0850 | 6.8908 |
| 15 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.2382 | 0.7520 | 7.3470 | 1.2674 |
| 16 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0245 | 0.0712 | 1.0135 | 0.2736 |
| 17 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0082 | 0.0128 | 1.5905 | 5.0314 |
| 18 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0135 | 0.0260 | 1.1135 | 1.5058 |
| 19 | 0.0000 | 0.0000 | 0.0000 | 0.0002 | 0.0059 | 0.0099 | 0.9490 | 1.0440 |
| 20 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0021 | 0.0040 | 0.3740 | 0.5632 |
| 21 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0005 | 0.0011 | 0.0930 | 0.1583 |
| 22 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0003 | 0.0190 | 0.0361 |
| 23 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0001 | 0.0035 | 0.0114 |
| 24 | 0.0001 | 0.0002 | 0.0001 | 0.0003 | 0.0080 | 0.0177 | 0.0005 | 0.0022 |
| 25 | 0.0000 | 0.0001 | 0.0001 | 0.0001 | 0.0029 | 0.0068 | 0.0005 | 0.0022 |
| 26 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 0.0008 | 0.0016 | 0.0000 | 0.0000 |
| 27 | 83.5210 | 1.0919 | 84.1339 | 1.4828 | 81.0277 | 10.0109 | 2.6935 | 12.0457 |
| 28 | 14.9215 | 0.9560 | 14.3358 | 1.2640 | 13.7389 | 2.2726 | 0.4760 | 2.1287 |
| 29 | 1.4415 | 0.1639 | 1.4189 | 0.2248 | 1.3769 | 0.2886 | 0.0460 | 0.2057 |
| 30 | 0.1080 | 0.0211 | 0.1037 | 0.0188 | 0.1077 | 0.0272 | 0.0040 | 0.0179 |
| 31 | 0.0071 | 0.0018 | 0.0066 | 0.0014 | 0.0071 | 0.0022 | 0.0000 | 0.0000 |
| 32 | 0.0004 | 0.0002 | 0.0004 | 0.0001 | 0.0004 | 0.0002 | 0.0000 | 0.0000 |

**Table 3:** Comparison of the estimated posterior distribution *P(χ_{k}| y )* for the simulated large linear regression example. SD denotes the estimated standard deviation.

The comparison shows that, in terms of efficiency for this example, the four algorithms are ordered as Pop-SAMC with 10% crossover > Pop-SAMC > SAMC > RJMCMC. This order is consistent with that in the change-point identification example, except that SAMC beats RJMCMC this time. In fact, RJMCMC failed in this example: it chose the model M_{13} instead of the true model M_{27}. The reason is that the model space contains two well-separated modes. RJMCMC lacks a self-adjusting ability and therefore easily gets trapped in a local mode, whereas the other three algorithms all have a self-adjusting mechanism that enables them to escape local traps and explore the whole sample space quickly. The efficiency improvement of the Pop-SAMC-based algorithms over SAMC is also significant, especially at the true mode and in the low-probability part of the model space, e.g. k = 11 to 26.

We further examined the estimates of the marginal inclusion probabilities produced by Pop-SAMC and RJMCMC in a single run, which are shown in (**Figure 2**). From this plot we can see that all 27 variables selected by Pop-SAMC belong to the true variable set, which ranges from variable 31 to 60. Due to the correlations within this set, variables 32, 33 and 34 were not selected. The variables selected by RJMCMC also belong to the true variable set, but RJMCMC found only 13 of the 27.
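Given the sampled models, a marginal inclusion probability is simply the (possibly importance-weighted) fraction of samples in which a variable appears. A minimal sketch, with a hypothetical function name and toy samples:

```python
import numpy as np

def marginal_inclusion(models, weights=None):
    """Estimate marginal inclusion probabilities P(variable j in model | y)
    from a matrix of sampled binary inclusion vectors (one row per sample).
    SAMC-type samples can carry importance weights; plain RJMCMC samples
    are equally weighted."""
    models = np.asarray(models, dtype=float)
    if weights is None:
        weights = np.ones(models.shape[0])
    weights = np.asarray(weights, dtype=float)
    # weighted column means = inclusion frequencies per variable
    return weights @ models / weights.sum()

# toy: three sampled models over four candidate variables
samples = [[1, 0, 1, 0],
           [1, 1, 1, 0],
           [1, 0, 0, 0]]
m = marginal_inclusion(samples)   # [1, 1/3, 2/3, 0]
print(m)
```

For equally weighted samples this is just the column mean of the 0/1 model matrix; the weighted form covers the importance-sampling estimates produced by SAMC-type chains.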

**A Real data analysis**

The dataset we studied here was generated by [21]. As described in [22], the experiment concerns the genetic basis for differences between two inbred mouse populations (B6 and BTBR). Based on their in-house selective phenotyping algorithm, 60 (B6×BTBR) mice (29 males and 31 females) were selected. A total of 60 arrays were used to monitor the expression levels of 22,690 genes. Some physiological phenotypes were also measured by quantitative real-time RT-PCR, e.g. the numbers of stearoyl-CoA desaturase 1 (SCD1). The raw data are available in GEO (http://www.ncbi.nlm.nih.gov/geo; accession number GSE3330).

We treat the phenotypic value (SCD1) as the dependent variable and the expression levels of the genes as predictors. The value of SCD1 was first adjusted to remove possible gender effects, and its correlation with each gene was then calculated. We ordered the genes by correlation from high to low and took the first 1000 genes as the potential predictors.
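This screening step can be sketched as follows; whether the ranking uses signed or absolute correlation is not stated in the text, so the sketch assumes absolute correlation, and the function name and toy data are our own:

```python
import numpy as np

def screen_genes(expr, y, top=1000):
    """Rank genes by absolute Pearson correlation with the
    (gender-adjusted) phenotype y and return the indices of the
    `top` highest-ranked genes.  expr: (n_samples, n_genes) matrix."""
    expr = np.asarray(expr, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = expr - expr.mean(axis=0)          # center each gene
    yc = y - y.mean()                      # center the phenotype
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = xc.T @ yc / denom                  # Pearson correlations
    order = np.argsort(-np.abs(r))         # strongest correlation first
    return order[:top]

# toy check: gene 0 is a copy of y, so it must rank first
rng = np.random.default_rng(0)
y = rng.normal(size=60)
expr = np.column_stack([y, rng.normal(size=(60, 4))])
top2 = screen_genes(expr, y, top=2)
```

With 22,690 genes and n = 60 arrays, this reduces the candidate set to p = 1000 before the Bayesian variable selection step.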

Three algorithms were applied to this dataset: Pop-SAMC, SAMC and RJMCMC. Each algorithm was run 20 times independently. Following the procedure in the simulation study, the sample space was partitioned according to the model index, and after a few pilot runs we restricted the model space to 1 ≤ k ≤ 10. In the pilot runs we also found that, with such a small number of observations (n = 60), setting g = p^{2} penalized complex models so strongly that the mode was pushed toward k = 1. To allow more potential models to be considered, we relaxed the penalty in the prior and set g = p = 1000. The parameters for each algorithm were set as follows. For Pop-SAMC, N = 10, T_{0} = 20, and Iterations = 6 × 10^{4}. SAMC employs the same setting as Pop-SAMC except that T_{0} = 50 and Iterations = 6 × 10^{5}. RJMCMC also performs 6 × 10^{5} iterations, with the first 20,000 iterations discarded as burn-in. As in the simulation study, the same transition proposal is used for all the algorithms, and in a single run each performs exactly the same number of energy evaluations, so the comparison is fair. The computational results are summarized in (**Table 4**).

| k | Pop-SAMC prob(%) | SD | SAMC prob(%) | SD | RJMCMC prob(%) | SD |
|---|---|---|---|---|---|---|
| 1 | 0.2792 | 0.1049 | 0.1991 | 0.0948 | 0.1810 | 0.0907 |
| 2 | 16.0970 | 1.9719 | 14.6317 | 4.8154 | 13.6790 | 2.5625 |
| 3 | 20.2100 | 3.2758 | 18.2059 | 4.2074 | 18.0265 | 3.3172 |
| 4 | 35.5714 | 4.0382 | 36.0330 | 8.9530 | 39.0120 | 5.0084 |
| 5 | 16.9414 | 1.1698 | 17.0727 | 2.1761 | 17.8005 | 1.7256 |
| 6 | 6.7641 | 0.8092 | 7.0985 | 1.9672 | 6.5390 | 0.9450 |
| 7 | 2.5384 | 0.4956 | 3.3053 | 1.9966 | 2.5420 | 1.4040 |
| 8 | 0.9915 | 0.2542 | 1.9386 | 2.0487 | 1.3045 | 2.6034 |
| 9 | 0.4478 | 0.1443 | 1.1381 | 1.4715 | 0.7115 | 2.1479 |
| 10 | 0.1591 | 0.0853 | 0.3770 | 0.4761 | 0.2045 | 0.7313 |

**Table 4:** Comparison of the estimated posterior distribution *P(χ_{k}| y )* for the real large linear regression example. SD denotes the estimated standard deviation.

The results are clear: Pop-SAMC works best among the three methods, achieving the smallest standard error in estimating the posterior probabilities of the potential models. Since the model space for this dataset contains only one mode, SAMC, as an importance sampling algorithm, cannot beat RJMCMC overall; it wins only in the low-probability part of the model space. With its improved self-adjusting mechanism, however, Pop-SAMC outperforms RJMCMC over the whole model space.

**Discussion**

In this paper, we have proposed a Pop-SAMC algorithm, which works on a population of SAMC chains. Compared with single-chain SAMC, the generalized algorithm converges much faster. The superior efficiency of the new algorithm is demonstrated on Bayesian model selection problems: our numerical results show that Pop-SAMC significantly outperforms both single-chain SAMC and RJMCMC.

The success of Pop-SAMC rests on two features. The first is the population effect: since the algorithm works with a population of SAMC chains, at each iteration it integrates the information from the samples of all chains and thus provides a more accurate estimate of P_{t}, making the self-adjusting mechanism more efficient. The second is that running a population of chains in parallel enables the incorporation of advanced operators, such as the crossover, snooker and gradient operators. With these population-based operators, the information distributed across the population at each iteration can be shared globally, which may further increase the algorithm's efficiency. In addition, each chain of Pop-SAMC keeps the self-learning feature of SAMC, which operates based on past samples.
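The population effect can be made concrete with a sketch of one self-adjusting step. In SAMC-type algorithms the log-weight of each subregion is nudged by a gain factor times the difference between the observed and desired visiting frequencies; in Pop-SAMC the observed frequency is pooled over all N chains, giving a less noisy estimate than a single chain. This is a simplified sketch of that update, not the paper's exact recursion:

```python
import numpy as np

def pop_samc_weight_update(theta, chain_regions, pi, gamma):
    """One self-adjusting step of Pop-SAMC (sketch): move the log-weight
    theta_i of each subregion by gamma * (observed visit frequency -
    desired frequency pi_i), where the visit frequency is pooled over
    all N chains of the population."""
    theta = np.asarray(theta, dtype=float)
    N = len(chain_regions)
    # fraction of the N chains currently in each subregion
    visit_freq = np.bincount(chain_regions, minlength=theta.size) / N
    return theta + gamma * (visit_freq - np.asarray(pi))

# toy example: 4 subregions, 10 chains, uniform desired frequencies
theta = np.zeros(4)
regions = np.array([0, 0, 0, 0, 0, 1, 1, 2, 3, 3])  # current subregion per chain
theta = pop_samc_weight_update(theta, regions,
                               pi=np.full(4, 0.25), gamma=0.5)
print(theta.round(3))   # -> [ 0.125 -0.025 -0.075 -0.025]
```

Over-visited subregions have their weights raised (making them less attractive at the next step) and under-visited ones lowered, which is the mechanism that lets the chains escape local modes.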

There remains much room for improvement and a broad field of applications for the new algorithm. For instance: how can the multiple-try method be combined with each chain of Pop-SAMC to increase the acceptance probability in high-dimensional problems? Is there any principled guidance for choosing the population size and crossover rate, instead of the trial-and-error scheme used in this paper? Regarding applications, a promising research direction is to design a Pop-SAMC-based particle filter.

The population setting of the algorithm provides a good basis for developing advanced particle filters; moreover, Pop-SAMC keeps the two attractive features of SAMC: (i) superiority in sample space exploration and (ii) the ability to generate weight-bounded importance samples. By exploiting these features, there is a good chance of overcoming the notorious weight-degeneracy problem of traditional particle filters. The authors are actively working in this area.

Liang's research was supported in part by grants DMS-1007457 and CMMI-0926803 from the National Science Foundation and by award KUS-C1-016-04 from King Abdullah University of Science and Technology (KAUST). The authors thank the editor, the associate editor, and the referee for their comments, which have led to significant improvement of this paper.

- Green P (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711-732.
- Liang F, Liu C, Carroll RJ (2007) Stochastic approximation in Monte Carlo computation. J Am Statist Ass 102: 305-320.
- Gilks WR (1994) Adaptive direction sampling. The Statistician 43: 179-189.
- Liu JS, Liang F, Wong WH (2000) The use of multiple-try method and local optimization in Metropolis sampling. J Am Statist Ass 95: 121-134.
- Geyer CJ (1991) Markov chain Monte Carlo maximum likelihood. in Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, (ed EM Keramigas), pp 153-163.
- Hukushima K, Nemoto K (1996) Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Jpn 65: 1604-1608.
- Liang F, Wong WH (2000) Evolutionary Monte Carlo sampling: applications to Cp model sampling and change-point problem. Statistica Sinica 10: 317-342.
- Liang F, Wong WH (2001) Real-parameter evolutionary sampling with applications in Bayesian Mixture Models. J Am Statist Ass 96: 653-666.
- Holland JH (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press Ann Arbor.
- Chen HF (2002) Stochastic approximation and its applications, Boston: Kluwer Academic Publishers.
- Andrieu C, Moulines E, Priouret P (2005) Stability of stochastic approximation under verifiable conditions. SIAM J Control and Optimization 44: 283-312.
- Liang F, Liu C, Carroll, RJ (2010) Advanced Markov chain Monte Carlo: Learning from Past Samples. Wiley, ISBN: 978-0-470-74826-8.
- Roberts GO, Tweedie RL (1996) Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83: 95-110.
- Barry D, Hartigan JA (1993) A Bayesian analysis for change point problems. J Am Statist Ass 88: 421.
- Phillips DB, Smith AFM (1996) Bayesian model comparison via jump diffusions. in Markov Chain Monte Carlo in Practice (WR Gilks, S Richardson, DJ Spiegelhalter, eds.) 215-239 London: Chapman & Hall.
- Liang F (2009) Improving SAMC using smoothing methods: theory and applications to Bayesian model selection problems. The Annals of Statistics 37: 2626-2654.
- Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. in Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, eds. PK Goel, A Zellner, Amsterdam: North-Holland/Elsevier, pp 233-243.
- George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87: 731-747.
- Fernández C, Ley E, Steel MFJ (2001) Benchmark priors for Bayesian model averaging. J of Econometrics 100: 381-427.
- Cai A, Tsay RS, Chen R (2007) Variable selection in linear regression with many predictors. Technical Report, University of Illinois at Chicago.
- Lan H, Chen M, Flowers JB, Yandell BS, Stapleton DS, et al. (2006) Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2: e6.
- Zhang D, Lin Y, Zhang M. (2009) Penalized orthogonal-components regression for large p small n data. Electron J of Statist 3: 781-796.
