Prior Elicitation in Bayesian Quantile Regression for Longitudinal Data

In this paper, we introduce Bayesian quantile regression for longitudinal data in terms of informative priors and Gibbs sampling. We develop methods for eliciting a prior distribution that incorporates historical data gathered from similar previous studies. The methods can be used either with no prior data or with complete prior data. The advantage of the methods is that the prior distribution changes automatically when we change the quantile. We propose Gibbs sampling methods which are computationally efficient and easy to implement. The methods are illustrated with both simulated and real data.

Citation: Alhamzawi R, Yu K, Pan J (2011) Prior Elicitation in Bayesian Quantile Regression for Longitudinal Data. J Biomet Biostat 2:115. doi:10.4172/2155-6180.1000115


Introduction
Quantile regression models have been widely used in a variety of applications [1,2]. They have attracted much interest in recent years because of their flexibility for modelling data with heterogeneous conditional distributions. In addition, a set of quantiles of the response variable (such as the first quartile, the median and the third quartile) may depend on the explanatory variables very differently from the center. Thus, a set of quantiles may give a more complete picture of the relation between the explanatory variables and the response variable than mean regression. Furthermore, quantile regression makes very minimal assumptions about the error distribution, so its estimators may be more robust than those of mean regression when the error is non-normal. Consequently, quantile regression has emerged as a useful supplement to standard mean regression models.
One of the serious challenges in quantile regression lies in the analysis of longitudinal data, in which repeated measurements are made on the same subject over time, as well as in the specification of quantile-dependent prior distributions. The literature on quantile regression for longitudinal data is sparse; we refer to [3-6].
This paper considers Bayesian quantile regression with random effects. We develop methods for eliciting a prior distribution that incorporates historical data gathered from similar previous studies. The methods can be used either with no prior data or with complete prior data. The advantage of the methods is that the prior distribution, as well as its precision, changes automatically when we change the quantile. In addition, we propose a Gibbs sampler for Bayesian quantile regression with random effects which is computationally efficient and easy to implement compared with the expectation-maximization algorithm proposed by [4] and the Bayesian MCMC method proposed by [6].
The rest of this article is organized as follows. Section 2 introduces the asymmetric Laplace distribution as a scale mixture of normal distributions, elicits the power prior distribution, and describes a Gibbs sampler (GS) for Bayesian quantile regression. In Section 3, we illustrate the Gibbs sampler on simulated data and compare our results with Bayesian MCMC and the EM algorithm. Section 4 analyzes an age-related macular degeneration data set. We conclude with a brief discussion in Section 5.

Random effects model for longitudinal data
Suppose there are N subjects under study, and let y_ij denote the j-th measurement on the i-th subject, for i = 1,...,N and j = 1,...,n_i. We start with the following latent regression model:

y_{ij} = x_{ij}'\beta + z_{ij}'b_i + \varepsilon_{ij}, \quad i = 1,\ldots,N, \; j = 1,\ldots,n_i, \qquad (1)

where x_{ij}' and z_{ij}' are rows of the X_i and Z_i matrices, X_i is n_i \times (k+1) and Z_i is n_i \times q, \beta and b_i are the (k+1)- and q-dimensional unknown parameters and random effects, respectively, and \varepsilon_{ij} is the error term. We define the linear mixed p-th quantile function of the response y_ij as

Q_{y_{ij}}(p \mid x_{ij}, b_i) = x_{ij}'\beta + z_{ij}'b_i, \quad i = 1,\ldots,N, \; j = 1,\ldots,n_i, \qquad (2)

where Q_{y_{ij}}(\cdot) is the inverse of the cumulative distribution function of y_ij given the vector of unknown subject-specific random effects b_i, whose density f(b_i | \Lambda) is indexed by a symmetric nonsingular matrix \Lambda. We assume that, conditionally on b_i, the y_ij for j = 1,...,n_i and i = 1,...,N are independently distributed according to the asymmetric Laplace distribution

f_p(y_{ij} \mid b_i, \beta, \sigma) = \frac{p(1-p)}{\sigma} \exp\left\{-\rho_p\left(\frac{y_{ij} - x_{ij}'\beta - z_{ij}'b_i}{\sigma}\right)\right\},

where \sigma > 0 is a scale parameter and \rho_p(u) = u(p - I(u < 0)) is the check function. The parameter p determines the skewness of the distribution, and the p-th quantile of this distribution is zero.
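As a concrete illustration, the check function and the asymmetric Laplace density above can be sketched in a few lines of Python (the function names are ours, not from the paper):

```python
import math

def check_loss(u, p):
    """Check function rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (1.0 if u < 0 else 0.0))

def ald_density(y, mu, sigma, p):
    """Asymmetric Laplace density with location mu, scale sigma and skewness p;
    the p-th quantile of this distribution is mu."""
    return (p * (1 - p) / sigma) * math.exp(-check_loss((y - mu) / sigma, p))
```

For p = 0.5 this reduces to the ordinary (symmetric) Laplace density, and the density integrates to one for any p in (0, 1).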
Let y_i = (y_{i1},\ldots,y_{in_i})', y = (y_1',\ldots,y_N')' and b = (b_1',\ldots,b_N')'. The complete-data density of (y, b), for i = 1,2,...,N, is then given by

f_p(y, b \mid \beta, \sigma, \Lambda) = f_p(y \mid b, \beta, \sigma)\, f(b \mid \Lambda). \qquad (3)

There are several attractive properties of (3). First, it takes within-subject correlation into account while allowing each individual to have a unique correlation structure. Second, it provides a degree of shrinkage of the subject-specific regression lines toward the population line. Finally, a nice property of the asymmetric Laplace distribution is that it can be represented as a scale mixture of normal distributions [7-10].
Our interest lies in the likelihood function of y given \beta, \sigma, p and \Lambda.
To induce the correlation structure on the responses, we integrate out the random effects:

f_p(y \mid \beta, \sigma, \Lambda) = \int_{R^{Nq}} f_p(y, b \mid \beta, \sigma, \Lambda)\, db = \prod_{i=1}^{N} \int_{R^{q}} \left(\prod_{j=1}^{n_i} f_p(y_{ij} \mid b_i, \beta, \sigma)\right) f(b_i \mid \Lambda)\, db_i, \qquad (4)

where R^{Nq} and R^{q} denote the Nq- and q-dimensional Euclidean spaces, respectively.
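For intuition, the subject-level integral in (4) can be approximated by plain Monte Carlo. The sketch below assumes a scalar random intercept b_i ~ N(0, lam_sd^2) as an illustrative special case of f(b_i | Λ); the function names are ours:

```python
import math
import random

def check_loss(u, p):
    """Check function rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (1.0 if u < 0 else 0.0))

def ald_pdf(resid, sigma, p):
    """Asymmetric Laplace density of a residual."""
    return (p * (1 - p) / sigma) * math.exp(-check_loss(resid / sigma, p))

def mc_likelihood(y_i, x_i, beta, sigma, p, lam_sd, n_draws=5000, seed=0):
    """Monte Carlo approximation of the integral in (4) for one subject,
    with a scalar random intercept b_i ~ N(0, lam_sd^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        b = rng.gauss(0.0, lam_sd)          # draw from f(b_i | Lambda)
        prod = 1.0
        for yij, xij in zip(y_i, x_i):      # product over j = 1..n_i
            resid = yij - sum(c * v for c, v in zip(beta, xij)) - b
            prod *= ald_pdf(resid, sigma, p)
        total += prod
    return total / n_draws
```

As a sanity check, when lam_sd is near zero the approximation collapses to the product of asymmetric Laplace densities evaluated at b_i = 0.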
Model (2.5) is similar to the \ell_2-penalized check function proposed by [6], which extends the random-intercept model proposed by [4] to a very general case.

Mixture representation
This section introduces the Gibbs sampler as a procedure to estimate the parameters of interest. We adopt a full Bayesian approach to quantile regression for longitudinal data. Consider the linear mixed quantile function (1) of the response y_ij, where the error term \varepsilon_{ij} has an asymmetric Laplace distribution with p-th quantile equal to zero.
Recently, [7-10] proved that the asymmetric Laplace distribution can be viewed as a mixture of an exponential and a scaled normal distribution. This is formalized in the following lemma.
Lemma. Suppose \xi_{ij} is a standard normal random variable, t_{ij} is a standard exponential random variable, and \varepsilon_{ij} is a random variable following the asymmetric Laplace distribution with density \frac{p(1-p)}{\sigma}\exp\{-\rho_p(\varepsilon/\sigma)\}. Then \varepsilon_{ij} can be represented as

\varepsilon_{ij} \overset{d}{=} \sigma\theta t_{ij} + \sigma\tau\sqrt{t_{ij}}\,\xi_{ij}, \quad \theta = \frac{1-2p}{p(1-p)}, \quad \tau^2 = \frac{2}{p(1-p)}.

In the sampler, the latent variable v_{ij} = \sigma t_{ij} is used in place of t_{ij} to avoid the scale parameter \sigma appearing in the conditional mean of y_ij [10].
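The lemma gives a direct way to simulate asymmetric Laplace errors, which can be sanity-checked empirically: the proportion of draws below zero should be close to p, since the p-th quantile of the distribution is zero. A minimal sketch (function name is ours):

```python
import math
import random

def ald_draw(p, sigma=1.0, rng=random):
    """Draw from ALD(0, sigma, p) via the exponential/normal scale mixture
    of the lemma: eps = sigma * (theta*t + tau*sqrt(t)*xi)."""
    theta = (1 - 2 * p) / (p * (1 - p))
    tau = math.sqrt(2 / (p * (1 - p)))
    t = rng.expovariate(1.0)      # standard exponential
    xi = rng.gauss(0.0, 1.0)      # standard normal
    return sigma * (theta * t + tau * math.sqrt(t) * xi)
```

Drawing a large sample at, say, p = 0.25 and computing the fraction of negative draws recovers 0.25 up to Monte Carlo error.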

Power prior distributions and Gibbs sampler
In this section, we address a quantile-dependent prior in Bayesian quantile regression for longitudinal data. Since [11], Bayesian inference for quantile regression has attracted a lot of attention in the literature, including [4-7,9,10,12-21]. However, almost all these models set priors independent of the quantile; that is, the prior is the same for modelling different quantiles [27]. This approach may result in inflexibility in quantile modelling. For example, a 95% quantile regression model should have different parameter values from the median model, and thus the priors used for modelling these quantiles should be different [22]. In this paper, we address a quantile-dependent prior for the quantile mixed model. Our idea is to set priors based on historical data. The power prior of [23] is one of several methods to incorporate historical data into the analysis of current data. We adapt this prior distribution for use in quantile regression.
Suppose there exists one historical data set from a previous study. Let y_{0ij} be the j-th measurement on the i-th subject in the historical study, given a vector of unknown subject-specific effects b_{0i}, let x_{0ij}' and z_{0ij}' be rows of the X_{0i} and Z_{0i} matrices, and let 0 \le a_0 \le 1. There are several attractive properties of (7). First, the prior distribution (7) depends on the quantile. Second, it provides an easy way to construct a Gibbs sampler via the mixture representation. We assume a_0 is a known parameter. The power parameter a_0 represents how much data from the previous study should be used in the current study; its main role is to control the influence of data gathered from previous studies similar to the current study. Such control is important when the sample size of the current data is quite different from the sample size of the historical data [23]. Chen et al. [24] choose a_0 as random to allow flexibility in expressing uncertainty about the power parameter via a prior mean and variance determined by the investigator. Recently, [25,26] recognized that treating a_0 as random is inappropriate.
Neuenschwander et al. [25] showed that elicitation of specific values of a_0 can be done via expert opinion or via a meta-analytic approach. Neelon et al. [26] recommend choosing a_0 based on expert opinion about the commensurability of the current and historical studies. The authors assign the power parameter a range of fixed values as part of a sensitivity analysis. If, for example, investigators are concerned about the similarity between the studies, they may choose a_0 \le 0.50. On the other hand, if there is strong belief in the similarity between the studies, they may choose a_0 > 0.50. They also recommend conducting reference analyses in which a_0 is set to 0 as a lower bound and 1 as an upper bound. In this paper, following [26], we take a range of values for a_0 between 0 and 1 based on expert opinion about the commensurability of the current and historical studies.
In this paper, we take the power prior to be of the form

\pi(\beta, \sigma, \Lambda \mid D_0, a_0) \propto f_p(y_0 \mid \beta, \sigma, \Lambda)^{a_0}\, \pi_0(\beta, \sigma, \Lambda), \qquad (8)

where D = (N, y, X, Z) represents the current study, D_0 represents the historical study, and \pi_0 denotes the initial prior. Incorporating historical information into the analysis of new information through a prior distribution provides a natural framework for updating information across studies [26]. The most common way of combining historical information with new information is through hierarchical modeling. There are times, however, when investigators want to control the influence of the historical information on the new information. In this paper, we use the power prior distribution because it introduces a power parameter that explicitly controls the amount of weight assigned to the historical data. Such control is important when the sample size of the current data is quite different from that of the historical data, or when there is heterogeneity between the two studies [23]. In addition, this prior has an attractive property in Bayesian quantile regression, as it depends on the quantile level [27].
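Under a power prior of this kind, the log posterior is simply the current log-likelihood plus a_0 times the historical log-likelihood plus the initial log prior. A minimal sketch for the fixed-effects part only (no random effects; the helper names are ours, not the paper's):

```python
import math

def check_loss(u, p):
    """Check function rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (1.0 if u < 0 else 0.0))

def ald_loglik(y, x, beta, sigma, p):
    """Log-likelihood of independent ALD errors for responses y and design rows x."""
    ll = 0.0
    for yi, xi in zip(y, x):
        resid = yi - sum(b * v for b, v in zip(beta, xi))
        ll += math.log(p * (1 - p) / sigma) - check_loss(resid / sigma, p)
    return ll

def power_prior_logpost(beta, sigma, p, current, historical, a0, log_initial_prior):
    """Log posterior under the power prior: current log-likelihood
    + a0 * historical log-likelihood + initial log prior."""
    y, x = current
    y0, x0 = historical
    return (ald_loglik(y, x, beta, sigma, p)
            + a0 * ald_loglik(y0, x0, beta, sigma, p)
            + log_initial_prior(beta, sigma))
```

Setting a0 = 0 excludes the historical data entirely and recovers the initial-prior analysis, while a0 = 1 pools the two studies with equal weight.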
The Gibbs sampler [28] is a very popular method for constructing a Markov chain in Bayesian inference; it generates a sequence of samples from the full conditional distributions. We use Gibbs sampling in Bayesian quantile regression to estimate the parameters of interest from our mixture representation. Thus, the full conditional posterior distributions of all unknown parameters are needed, and each of these distributions can be obtained by regarding all other parameters in (9) as known. The efficient Gibbs sampler works as follows: 1. Fix the values of p and a_0, so that the p-th quantile and the weight of the historical data are specified. The remaining steps draw each parameter in turn from its full conditional distribution; the latent mixing variables follow a Generalized Inverse Gaussian distribution GIG(r, f^2, d^2), whose probability density function is

f(x \mid r, f^2, d^2) = \frac{(d/f)^{r}}{2K_r(fd)}\, x^{r-1} \exp\left\{-\frac{1}{2}\left(\frac{f^2}{x} + d^2 x\right)\right\}, \quad x > 0,

where K_r(\cdot) is the modified Bessel function of the third kind.

Some extensions
Using the power prior distribution depends on the availability of historical data. In the previous part we elicited the power prior distribution from a single historical data set; this prior can be easily generalized to multiple historical data sets by raising each historical likelihood to its own power, where y_{0ijk} and b_{0ik} denote the j-th measurement on the i-th subject and the vector of unknown subject-specific random effects for the k-th study. On the other hand, sometimes historical data are not available; in this case we set a_0 = 0, the historical data are excluded altogether, and the prior (8) reduces to the initial prior. Finally, when historical data are not available and the current data are independent, the Bayesian quantile estimates obtained using the prior distribution (8) are close to the estimates of [9], and there is R code (MCMCquantreg) to obtain these estimates.

Simulation study
In this section, we compare the performance of the proposed Bayesian inference with the EM algorithm used by [4] and the Bayesian MCMC method proposed by [6]. We use the simulation random-intercept model of [4]. Thus, a data set of N = 100 subjects, in which each subject had 23 scheduled longitudinal measurements, was generated from the model, with corresponding intercept and slope parameters for the historical study. Like Geraci and Bottai (2007), we simulated the error term from three different distributions: the standard normal, Student's t distribution with three degrees of freedom, and the chi-square distribution with three degrees of freedom. We used the initial prior N(0, 10^6) for each regression parameter and Gamma(10^{-3}, 10^{-3}) for the scale and precision parameters. We estimated the parameters of interest using our Gibbs sampler, simulating 1000 replications for each error distribution. We ran the Gibbs sampler for 5000 iterations with an initial burn-in of 1000 iterations. We conducted sensitivity analyses with respect to three different choices of a_0 (a_0 = 0, 0.50 and 0.95). We compared the parameter estimates in terms of average relative bias and relative efficiency across the different algorithms: the Gibbs sampler (GS), the EM algorithm and Bayesian MCMC, with the relative bias averaged over the simulations and the relative efficiency of EM and Bayesian MCMC computed relative to GS. The estimates of the average relative bias and relative efficiency for the different error distributions and quantile models are reported in Table 1.
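The simulation design above can be sketched as follows. The covariate distribution and the random-intercept scale are illustrative assumptions of ours, since the paper's exact design values are not reproduced here:

```python
import math
import random

def simulate_panel(n_subjects, n_times, beta0, beta1, error="normal", seed=0):
    """Simulate a random-intercept longitudinal data set
    y_ij = beta0 + beta1 * x_ij + b_i + eps_ij (a sketch of the design;
    Uniform(0,1) covariate and N(0,1) intercepts are illustrative choices)."""
    rng = random.Random(seed)
    data = []
    for i in range(n_subjects):
        b_i = rng.gauss(0.0, 1.0)        # subject-specific random intercept
        for j in range(n_times):
            x = rng.uniform(0.0, 1.0)
            if error == "normal":
                eps = rng.gauss(0.0, 1.0)
            elif error == "t3":          # Student's t with 3 df: Z / sqrt(V/3)
                eps = rng.gauss(0.0, 1.0) / math.sqrt(
                    sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) / 3)
            else:                        # chi-square with 3 df
                eps = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
            data.append((i, x, beta0 + beta1 * x + b_i + eps))
    return data
```

With n_subjects = 100 and n_times = 23 this yields the 2300 observations per replication used in the study design.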
Clearly, the relative biases of the three approaches are more or less the same, although the GS yields positive biases more often than the EM and Bayesian MCMC, which sometimes give negative biases. In general, however, the absolute bias of our algorithm is very small compared with that of EM and Bayesian MCMC. In addition, the relative efficiency shows that our algorithm is more efficient than EM and Bayesian MCMC. Moreover, we see that as the weight of the historical data increases, the bias becomes smaller and the efficiency increases. Table 2 summarizes the posterior means of β_0 and β_1 when the error is normal, at two different quantiles (25% and 50%), with respect to five different choices of a_0 (a_0 = 0, 0.25, 0.50, 0.75 and 1). We see that as the weight for the historical data increases, the posterior means of β(0.25) and β(0.50) increase and the 95% credible intervals get narrower.

Analysis of age-related macular degeneration data
We use quantile regression methods to analyze the Age-Related Macular Degeneration (ARMD) data previously analyzed by [30]. A total of 203 patients were randomly selected from three cities (centers) in the United Kingdom to measure the effects of teletherapy on the loss of vision associated with the progress of age-related macular degeneration. The sample consists of 70 patients from London, 84 from Belfast and 49 from Southampton. Of these, 101 patients were randomly assigned to a treatment medication group and 102 to a control group. The response variable, the change in Distance Visual Acuity (DVA), was measured for each patient four times over a two-year period, at the 3rd, 6th, 12th and 24th months [30]. The data set contains 84 males and 115 females. For the purposes of illustration, we use the female data as the historical data, from which we construct our prior, and the male data as the current data. Figure 1 displays the change in distance visual acuity for each center, by study visit and group (treatment or control), for males and females. Since the plots indicated extensive between-group heterogeneity, both at baseline and over visits, we used the linear mixed quantile model (1) to show how the distance visual acuity is affected by five covariates and an intercept term: the actual time of each patient's visits at the 3rd, 6th, 12th and 24th months, the age of the patient, centre (Belfast, Southampton and London), treatment (treatment or control group), and the visit × group interaction (readers may refer to [30] for details of this experiment). Table 3 summarizes the relative bias and relative efficiency for Bayesian MCMC and the Gibbs sampler with several values of a_0, including a_0 = 0. Again, the relative biases of the two approaches are more or less the same, although the GS yields positive biases more often than Bayesian MCMC, which sometimes gives negative biases.
Most noticeably, when p = 0.50 the absolute bias generated by GS for all parameters is less than that generated by Bayesian MCMC, and the gap in mean squared error between GS and Bayesian MCMC decreases compared with the first and third quartiles. When estimating the first quartile, the loss of efficiency of Bayesian MCMC relative to GS was 118% for the intercept and more than this for the slopes. The better performance of the Gibbs sampler relative to Bayesian MCMC may be due to the fact that Bayesian MCMC assumes an improper (uniform) prior distribution for all regression and scale parameters. This becomes clear with the informative prior: as the weight of the historical data increases, the bias becomes smaller and the efficiency increases.

Discussion
In this paper, we have introduced quantile regression for longitudinal data using the asymmetric Laplace distribution from a Bayesian point of view. The methods for eliciting prior distributions can be used either with no prior data or with complete prior data, and have been outlined for all unknown parameters. We compared our Gibbs sampler with Bayesian MCMC and the EM algorithm using the average relative bias and the estimated relative efficiency. We found that the bias of our algorithm is very small compared with that of Bayesian MCMC and the EM algorithm. In addition, the relative efficiency showed that our algorithm is more efficient than Bayesian MCMC and the EM algorithm. Finally, we have shown that the power prior behaves as expected under different weights of the power parameter. Incorporating historical information gathered from similar previous studies into the analysis of the current study through a prior distribution provides a natural framework for updating information and can yield better results. In particular, the power prior can be used to incorporate historical data when updating information across studies.

Table 3: Estimated bias and relative efficiency using the Gibbs sampler (GS) and Bayesian MCMC for the real data.