Bayesian Analysis Using Power Priors with Application to Pediatric Quality of Care


Introduction
Investigators conducting new research often have access to data from previous studies, and in such cases it is not only scientifically reasonable but also statistically advantageous to incorporate this information into the current analysis. Consider, for example, the common scenario in which a funding agency finances research incrementally, first requiring a small pilot or feasibility study before funding a more elaborate trial. In such cases, it can be beneficial to incorporate the pilot data into the subsequent analysis to increase the power to detect treatment effects. One strategy for synthesizing results across studies is through a Bayesian modeling approach. Because Bayesian methods can incorporate historical information through a prior distribution, they provide a natural framework for updating information across studies.
There are several advantages to a Bayesian analysis with informative priors elicited from historical data. First, informative priors can yield estimates consistent with accepted views of effects being studied [1]. For example, one might wish to restrict a model parameter to a reasonable range (e.g., a non-negative effect for smoking) to ensure a result consistent with established findings. Second, informative priors can improve the precision of estimates and increase the ability to detect treatment effects, even if the priors are only used to inform ancillary parameters. And finally, informative priors can be used in settings where diffuse priors may lead to computational difficulties or even improper posterior distributions [2].
While historical priors provide a useful analytic tool, there are times when investigators will want to limit the impact of historical data on the current analysis. For example, they may question the compatibility of the current and historical data, and may therefore wish to "downweight" the historical information to reduce its impact. Such downweighting may also be required by a regulatory agency, such as the US Food and Drug Administration (FDA), that wishes to limit the impact of historical information on the analysis of a new treatment.
Using an illustrative example, this paper explores common Bayesian approaches to incorporating historical information into regression models when there is uncertainty about the similarity between the current and historical studies. Our example involves a pair of studies designed to improve the delivery of care in pediatric clinics. New mothers were surveyed during baseline and follow-up periods and asked to rate their quality of care in a variety of areas; their responses were then aggregated into a binary measure representing high- or low-quality care. The aim was to evaluate the intervention effect comprehensively across both studies. However, while the two questionnaires overlapped substantially, they were not identical measures. To express our uncertainty about the compatibility of the two studies, we used the power prior methodology developed by Ibrahim and Chen [2] to mitigate the impact of the historical data in the event that the two studies differed in unobservable ways. The power prior introduces a parameter that explicitly controls the amount of weight assigned to the historical data. The approach has been applied to a variety of settings, including analyses involving cure rate models [3], environmental quality assessments [4], sample size determination [5], clinical trials [6], health-care evaluation [7] and cross-design meta-analysis [8,9].
In addition, we explore whether the data can determine how much weight to assign to the historical data, or whether this impact factor should be user-specified. We also propose an extension of the power prior to exchangeable hierarchical models. Here, we allow the covariate effects to differ across studies but assume that they share a common prior distribution. We then use the power prior to further attenuate the impact of the historical data in the event that the exchangeability assumption is itself too strong.
The remainder of this paper is organized as follows: Section 2 reviews common specifications of the power prior, explores strategies of assigning weights to the historical data, and describes the proposed extension to exchangeable hierarchical models; Section 3 provides background for the case study and presents analyses using the conventional and exchangeable power priors; and the final section summarizes the methods and offers guidelines for their use in practice.

The Power Prior
The conditional power prior: Let $D$ and $D_0$ denote the data from the current and historical studies, respectively. Ibrahim and Chen [2] define the power prior distribution for a set of parameters $\theta$ as

$$\pi(\theta \mid D_0, a_0) = \frac{1}{g}\, L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta), \tag{2.1}$$

where $L(D_0 \mid \theta)$ denotes the likelihood for the historical study, $a_0$ is (for now) a fixed, known constant between 0 and 1, $g$ is the normalizing constant, and $\pi_0(\theta)$ is the initial prior assigned to $\theta$ before observing $D_0$. In many cases, $\pi_0(\theta)$ will be taken to be diffuse to reflect a lack of knowledge about $\theta$ prior to observing the historical data. We refer to prior (2.1) as the conditional power prior, since it is formed by conditioning on both $D_0$ and the fixed parameter $a_0$. The parameter $a_0$ governs the impact of the historical data on the current analysis, ranging from no influence when $a_0 = 0$ to parity when $a_0 = 1$, in which case (2.1) is just the usual posterior update of $\theta$ based on $D_0$. We note also that the power prior will be proper as long as the normalizing constant is finite.
Given the conditional power prior, the posterior for $\theta$ is

$$\pi(\theta \mid D, D_0, a_0) \propto L(D \mid \theta)\, L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta),$$

where $L(D \mid \theta)$ denotes the current-data likelihood. Note that, at this stage of the analysis, $a_0$ and $D_0$ are treated as hyperparameters that impact $\theta$ only through the power prior in (2.1). When $a_0 = 1$, the posterior for $\theta$ based on both $D$ and $D_0$ can be expressed as

$$\pi(\theta \mid D, D_0, a_0 = 1) \propto L(D \mid \theta)\, L(D_0 \mid \theta)\, \pi_0(\theta),$$

and hence assigning $a_0 = 1$ is tantamount to pooling the current and historical data and basing posterior inference on the aggregated data. Conversely, when $a_0 = 0$, the historical data are excluded altogether, and the prior for $\theta$ reduces to the initial prior, $\pi_0(\theta)$. And, when $0 < a_0 < 1$, the log-likelihood contributions of the historical subjects are downweighted by $100 \times (1 - a_0)\%$. The parameter $a_0$ can also be viewed as a precision parameter that inversely affects the heaviness of the tails of the prior for $\theta$ (smaller $a_0$ values lead to heavier tails).
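As a minimal sketch of how $a_0$ interpolates between excluding and pooling the historical data, consider a normal mean with known variance $\sigma^2$ and a flat initial prior (an illustrative special case, not the regression model used later). Here the posterior is available in closed form: the historical sample simply contributes $a_0 n_0$ effective observations. The function name and the numerical inputs below are hypothetical.

```python
def power_prior_posterior(ybar, n, ybar0, n0, a0, sigma=1.0):
    """Posterior mean and variance of a normal mean under the conditional power prior.

    Assumes a known variance sigma**2 and a flat initial prior, so that the
    historical sample contributes a0 * n0 "effective" observations.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    n_eff = n + a0 * n0                      # effective total sample size
    mean = (n * ybar + a0 * n0 * ybar0) / n_eff
    var = sigma ** 2 / n_eff
    return mean, var


# Hypothetical sample means and sizes, for illustration only.
for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_posterior(ybar=0.5, n=100, ybar0=0.0, n0=100, a0=a0))
```

With $a_0 = 0$ the result depends on the current data alone, and with $a_0 = 1$ it coincides with the pooled analysis; intermediate values shrink the posterior mean toward the historical mean and reduce the posterior variance.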
For generalized linear models, Ibrahim and Chen [2] recommend constructing the conditional power prior as

$$\pi(\beta \mid D_0, b_0, \phi, a_0) \propto \left[\prod_{i} \prod_{j} P(y_{0ij} \mid x_{0ij}'\beta + z_{0ij}'b_{0i}, \phi)\right]^{a_0} \pi_0(\beta), \tag{2.3}$$

where $P(y_{0ij} \mid \cdot)$ denotes the density function for $y_{0ij}$ (j=1,…, n 0 ; i=1,…, n); $\beta$ is a $p \times 1$ vector of fixed effect coefficients; $b_{0i} \sim N_q(0, \Sigma)$ is a vector of random effects for the $i$th historical cluster; $x_{0ij}$ and $z_{0ij}$ are fixed and random effect covariate vectors; and $\phi$ is a dispersion parameter (perhaps known). Here, the power prior is placed on the historical likelihood given the random effects, rather than on the marginal likelihood formed by first integrating out the subject-specific effects. As Ibrahim and Chen [2] point out, conditioning on the random effects has several interpretive and computational advantages, including efficient implementation within a Markov chain Monte Carlo (MCMC) algorithm. We refer readers to section 4 of their paper for further details. Prior specification is completed by assuming an initial prior, $\pi_0(\beta)$, for $\beta$, as well as prior distributions for the dispersion parameter $\phi$ (if unknown) and the random-effect covariance matrix $\Sigma$.
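To make the weighting concrete, the following sketch evaluates the unnormalized log posterior under a conditional power prior for a logistic regression. It is a simplification under stated assumptions: random effects are omitted (fixed effects only), the initial prior is taken to be an independent $N(0, 10^3)$ on each coefficient, and the simulated data and function names are hypothetical. It is not the WinBUGS implementation used in the analyses.

```python
import numpy as np


def bernoulli_loglik(beta, X, y):
    """Log-likelihood of a fixed-effects logistic regression."""
    eta = X @ beta
    # log p(y | eta) = y * eta - log(1 + exp(eta)), computed stably
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def power_prior_log_posterior(beta, X, y, X0, y0, a0, prior_var=1e3):
    """Current log-lik + a0 * historical log-lik + log initial prior (up to a constant)."""
    log_prior = -0.5 * float(np.sum(beta ** 2)) / prior_var  # N(0, prior_var * I) kernel
    return (bernoulli_loglik(beta, X, y)
            + a0 * bernoulli_loglik(beta, X0, y0)
            + log_prior)


# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])    # current design matrix
X0 = np.column_stack([np.ones(30), rng.normal(size=30)])   # historical design matrix
y = rng.binomial(1, 0.5, size=50)
y0 = rng.binomial(1, 0.5, size=30)
beta = np.array([0.1, -0.2])

for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_log_posterior(beta, X, y, X0, y0, a0))
```

Because $a_0$ multiplies the historical log-likelihood, the value at $a_0 = 0.5$ lies exactly halfway between the current-data-only and pooled log posteriors. A function of this form could be passed to a general-purpose sampler or optimizer.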

Joint power priors
The Ibrahim-Chen joint power prior: Within a Bayesian framework, it is natural to express uncertainty about the value of $a_0$ (and hence about the relevance of the historical data) by placing a prior distribution on $a_0$ and allowing the data to help determine its most likely value. Doing so defines a joint power prior, $\pi(\theta, a_0 \mid D_0)$. Investigators have typically specified this joint power prior in one of two ways. Under the first approach, proposed in various articles by Ibrahim and Chen [e.g., 2,10,11], the joint prior is specified as

$$\pi(\theta, a_0 \mid D_0) \propto L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta)\, \pi(a_0), \tag{2.4}$$

where $\pi(a_0)$ denotes a prior for $a_0$, such as a $Be(\delta, \lambda)$ prior, with support on [0,1]. We refer to prior (2.4) as the Ibrahim-Chen (IC) joint power prior. Chen et al. [10] derive sufficient conditions for its propriety, noting that for normal models, prior (2.4) is always proper, while for a binomial model with a logit link, $\delta$ must be greater than the rank of the design matrix to ensure propriety.
The corresponding joint posterior is given by

$$\pi(\theta, a_0 \mid D, D_0) \propto L(D \mid \theta)\, L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta)\, \pi(a_0).$$

However, while this posterior is well defined, our experience suggests that the mass of the posterior distribution coalesces near zero as the prior variance of $a_0$ increases, effectively excluding $D_0$ under diffuse priors. In Appendix A, we show that the posterior mode occurs at either $a_0 = 0$ or $a_0 = 1$ under a U[0,1] prior. (In most practical situations, the mode occurs at 0.) This result is evident in previous applications of the power prior, although, with the exception of Duan et al. [4], Duan and Ye [12] and more recently Neuenschwander et al. [13], it has not been explicitly noted in the literature. In fact, this tendency for the posterior mass to concentrate near zero holds even when $D = D_0$, since $a_0$ is not affected by the commensurability between the current and historical data under the IC power prior.
To illustrate this last point, we conducted a small simulation study to examine how the posterior mean of $a_0$, $E(a_0 \mid \bar{y}, \bar{y}_0)$, changes as $\pi(a_0)$ becomes more diffuse. We considered a simple normal model with unknown mean $\theta$ and known variance $\sigma^2$; that is, $\bar{y} \sim N(\theta, \sigma^2/n)$ and $\bar{y}_0 \sim N(\theta, \sigma^2/n_0)$, where $\bar{y}$ and $\bar{y}_0$, the current and historical sample means, are sufficient statistics for the data. We assumed a beta prior for $a_0$ with mean 0.50 and a diffuse initial prior for $\theta$, $\pi_0(\theta) \propto 1$. Under these conditions, the IC joint power prior for $\theta$ and $a_0$ takes the form

$$\pi(\theta, a_0 \mid \bar{y}_0) \propto N(\bar{y}_0 \mid \theta, \sigma^2/n_0)^{a_0}\, Be(a_0 \mid \delta, \lambda),$$

with $\delta$ equal to $\lambda$ to ensure a prior mean of 0.50 for $a_0$. To explore the impact of sample size on the marginal posterior mean $E(a_0 \mid \bar{y}, \bar{y}_0)$, we simulated 1, 2, 10 and 100 observations from a N(0,1) distribution and used these same observations for both $D$ and $D_0$ (i.e., the two samples were identical). Below, we report the results based on a single simulated dataset from each of the four sample size settings. (As an informal check, we simulated datasets under different random-number generating seeds, and the results were similar in each case.) To examine the impact of increasing historical sample size, we inflated $D_0$ by factors of 1, 2 and 5 by replicating the current data an appropriate number of times.
Since the joint posterior $\pi(\theta, a_0 \mid \bar{y}, \bar{y}_0)$ does not have a closed form, we used the Bayesian software package WinBUGS [14] to conduct the simulations. We assigned $a_0$ a beta prior with fixed mean 0.50 and varying prior standard deviations ($\sigma_{a_0}$) that reflected increasing degrees of diffuseness. Specifically, we ran the model under 10 different choices for $\sigma_{a_0}$, ranging from 0 to 0.50, the upper bound for a beta random variable with mean 0.50. We ran three initially dispersed MCMC chains for 30,000 iterations each, discarding the first 10,000 as a burn-in. MCMC diagnostics, such as trace plots and Gelman-Rubin statistics [15], indicated rapid convergence and efficient mixing of the chains. [...] $E(a_0 \mid \bar{y}, \bar{y}_0)$ to increase as $\sigma_{a_0}$ increased, regardless of the value of $n$. Only a highly informative prior distribution bounds the posterior mean of $a_0$ away from 0. For most applications, this entails choosing $\sigma_{a_0}$ small enough to avoid overly discounting $D_0$. From a practical standpoint, this is tantamount to assigning $a_0$ a fixed value.
The normalized power prior: To avoid potentially excessive attenuation of the historical data, Duan and Ye [12] defined a normalized power prior (NPP), which is formed by first normalizing the conditional power prior (2.1) and then multiplying this normalized density by an independent proper prior for $a_0$:

$$\pi(\theta, a_0 \mid D_0) = \frac{L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta)}{\int L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta)\, d\theta}\; \pi(a_0), \qquad a_0 \in A, \tag{2.7}$$

where $A$ is a set of values of $a_0$ such that the integral in the denominator is finite for all $a_0 \in A$. In general, any proper distribution truncated to $A$ can be used for $\pi(a_0)$. If the denominator in (2.7) is finite for all $a_0 \in [0, 1]$, then a natural choice for $\pi(a_0)$ is $Be(\delta, \lambda)$. For fixed $a_0$, the NPP reduces to the conditional power prior given by expression (2.1).
While the NPP does not follow directly from Bayes' theorem, there is an intuitive rationale behind its construction: a) after observing $D_0$, update $\theta$ using Bayes' theorem (ignoring $a_0$ for the moment); b) to downweight $D_0$, introduce $a_0$ and form the conditional power prior (2.1); c) to express uncertainty about $a_0$, define a new joint power prior by first normalizing (2.1) with respect to $\theta$ and multiplying this expression by an independent initial prior for $a_0$; and finally, d) treat (2.7) as a new prior distribution for analysis of the current data, and update $(\theta, a_0)$ jointly using Bayes' theorem. Under the NPP, the joint posterior for $(\theta, a_0)$ is expressed as

$$\pi(\theta, a_0 \mid D, D_0) \propto L(D \mid \theta)\, \frac{L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta)}{\int L(D_0 \mid \theta)^{a_0}\, \pi_0(\theta)\, d\theta}\; \pi(a_0).$$

As Duan and Ye [12] note, unlike the IC prior, the NPP obeys the likelihood principle, since multiplying $L(D_0 \mid \theta)$ by a factor $K$ does not change the posterior. The NPP also has the virtue of "calibrating" the current and historical data so that more weight is given to $D_0$ when the two samples are similar. As the studies diverge, $D_0$ is increasingly attenuated. Likewise, as $n_0$ increases, $D_0$ is again downweighted, as evidence accumulates to suggest that the underlying populations differ. While this is an intuitively appealing feature (one would expect $D_0$ to be attenuated as it diverges from $D$), our experience indicates that unless the two studies are nearly identical, this attenuation tends to be quite excessive, obviating the need for the historical data for most realistically complex models.
Although the rate of convergence to zero is less precipitous than under the IC power prior, it is substantial enough to question the general applicability of the NPP.
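To illustrate this point, we return to the normal model above. Assuming, as above, a beta prior for $a_0$ and a diffuse initial prior for $\theta$, the joint NPP is given by

$$\pi_{NPP}(\theta, a_0 \mid \bar{y}_0) = \frac{N(\bar{y}_0 \mid \theta, \sigma^2/n_0)^{a_0}\, Be(a_0 \mid \delta, \lambda)}{\int N(\bar{y}_0 \mid \theta, \sigma^2/n_0)^{a_0}\, d\theta} = N\!\left(\theta \,\Big|\, \bar{y}_0, \frac{\sigma^2}{a_0 n_0}\right) Be(a_0 \mid \delta, \lambda).$$

As we outline in Appendix B, the marginal posterior for $a_0$ is given by

$$\pi_{NPP}(a_0 \mid \bar{y}, \bar{y}_0) \propto N\!\left(\bar{y} \,\Big|\, \bar{y}_0, \sigma^2\left(\frac{1}{n} + \frac{1}{a_0 n_0}\right)\right) Be(a_0 \mid \delta, \lambda).$$

Figure 4 displays $E(a_0 \mid \bar{y}, \bar{y}_0)$ as a function of $\sigma_{a_0}$ for increasing effect sizes. Figure 4(a) presents results for the case where $n = n_0 = 100$, $\bar{y}_0 = 0$ and $\bar{y}$ is allowed to range from 0 to 1.0. Figure 4(b) displays a similar graph for sample sizes $n = 2{,}531$ and $n_0 = 895$, which conform to the sample sizes for the case study presented later.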
The results suggest that $E(a_0 \mid \bar{y}, \bar{y}_0)$ is influenced by both the effect size and the study sample sizes $n$ and $n_0$. As the discrepancy between $\bar{y}$ and $\bar{y}_0$ increases, the historical data are substantially downweighted. These results are not necessarily surprising, since simple z-tests suggest that the two studies comprise different population means when $\bar{y} \geq 0.50$ (or $\geq 0.25$ for Figure 3(b)). We would therefore expect some attenuation of $D_0$. That said, given that the current and historical samples will almost always differ in some respect, it seems inevitable that the NPP will discount $D_0$ to a large extent for reasonably sized studies, even if the underlying data-generating process is the same for the current and historical studies. In a sense, this is the reverse of the classic p-value conundrum, in which clinically inconsequential differences become statistically significant as $n$ increases. Under the NPP, as the sample sizes increase, potentially minor differences between the sufficient statistics can lead to substantial mitigation of the historical data. We note, however, that our simulations did not consider the case where $\sigma_0^2$ and $\sigma^2$ diverged; future work might examine the behavior of the NPP as the current and historical variances diverge under a univariate normal model.
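The dependence of the NPP on the discrepancy between the sample means can be checked numerically. The sketch below approximates the posterior mean of $a_0$ on a grid for the normal model with known variance, using the fact that, after integrating out $\theta$ under a flat initial prior, the current sample mean is normal with mean $\bar{y}_0$ and variance $\sigma^2(1/n + 1/(a_0 n_0))$. The beta shape parameters, grid size and function name are illustrative assumptions, and the grid approximation stands in for the MCMC and closed-form calculations used in the paper.

```python
import numpy as np


def npp_posterior_mean_a0(ybar, ybar0, n, n0, sigma=1.0,
                          shape1=1.0, shape2=1.0, grid_size=20000):
    """Grid approximation to E(a0 | ybar, ybar0) under the normalized power prior."""
    # Midpoint grid on (0, 1): avoids a0 = 0, where the variance below is infinite.
    a0 = (np.arange(grid_size) + 0.5) / grid_size
    # Marginal variance of ybar given a0 (theta integrated out, flat initial prior).
    var = sigma ** 2 * (1.0 / n + 1.0 / (a0 * n0))
    log_lik = -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (ybar - ybar0) ** 2 / var
    # Kernel of the Be(shape1, shape2) prior for a0.
    log_prior = (shape1 - 1.0) * np.log(a0) + (shape2 - 1.0) * np.log1p(-a0)
    log_post = log_lik + log_prior
    w = np.exp(log_post - log_post.max())   # stabilized unnormalized weights
    return float(np.sum(a0 * w) / np.sum(w))


# Historical mean fixed at 0; current mean moves away from it (uniform prior on a0).
for ybar in (0.0, 0.25, 0.5, 1.0):
    print(ybar, round(npp_posterior_mean_a0(ybar, 0.0, n=100, n0=100), 3))
```

Under these assumptions the posterior mean of $a_0$ falls steadily as the two sample means move apart, which is the calibration behavior described above.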
Because the NPP accentuates between-study differences as $n$ and $n_0$ increase, clinicians must decide whether discrepancies between $D$ and $D_0$ are clinically meaningful enough to warrant extensively downweighting $D_0$. In many cases, it may be reasonable to override empirical differences between studies to avoid overly discounting $D_0$. Therefore, as a practical alternative, we recommend conditioning on $a_0$ by assigning it a range of fixed values elicited from expert opinion about the commensurability of $D$ and $D_0$. We also recommend conducting "reference" analyses in which $a_0$ is set to 0 and 1. This conditional approach should prove especially useful for moderate to large studies, since both the IC and normalized power priors tend to assign little weight to the historical data in such situations. The conditional approach also extends naturally to complex regression models which differ due to departures in ancillary covariates (the case study presented in Section 3 is one such example). And finally, the conditional approach has the practical advantage of being straightforward to implement in standard Bayesian software, since it avoids the burdensome integration inherent in the NPP. (To date, the NPP has been applied only to basic regression models in which the posteriors have closed forms.) Thus, the conditional approach is appealing from both an inferential and a computational standpoint. For these reasons, we focus on the conditional power prior in the application described in Section 3.

Power priors for exchangeable hierarchical models
In the regression setting, investigators may wish to relax the assumption of identical fixed effects between the current and historical studies, and instead link the two studies using a hierarchical modeling structure. Under this scenario, the studies are allowed to have distinct fixed-effect parameters, $\beta$ and $\beta_0$, which are assumed in turn to share a common prior distribution according to an exchangeability condition. This hierarchical modeling approach is commonly used in Bayesian meta-analysis as a flexible alternative to assuming identical fixed effects across studies [7,16].
There are times, however, when the exchangeability assumption itself is questionable, and in such cases one can use the power prior to limit the impact of $D_0$. The conditional power prior can be adapted to accommodate the hierarchical nature of the model as follows:

$$\pi(\beta, \beta_0 \mid D_0, a_0, \eta) \propto L(D_0 \mid \beta_0)^{a_0}\, \pi_0(\beta, \beta_0 \mid \eta), \tag{2.11}$$

where $\pi_0(\beta, \beta_0 \mid \eta)$ denotes the initial joint prior for $\beta$ and $\beta_0$, and $\eta$ is a vector of hyperparameters. For regression models, a natural choice is to assign $\beta$ and $\beta_0$ conditionally independent multivariate normal distributions; that is, $\pi_0(\beta, \beta_0 \mid \eta) = N_p(\beta \mid \mu, S) \times N_p(\beta_0 \mid \mu, S)$, where $N_p(\cdot \mid \mu, S)$ denotes a $p$-variate normal distribution with mean $\mu$ and covariance $S$. We refer to prior (2.11) as the exchangeable power prior (EPP) to distinguish it from the standard application of the power prior to models in which the treatment effect is fixed between studies. The conditional power prior given in expression (2.3) is a special case of the EPP in which (assuming a normal prior) $S = 0$ and $\beta = \beta_0 = \mu$. Strictly speaking, the EPP is a power prior applied to a hierarchical model with exchangeable cross-study effects; the power prior itself has no bearing on the validity of the exchangeability assumption. Typically, interest lies in estimating the current model parameters ($\beta$) while borrowing historical information in order to improve the precision of these inferences. However, depending on study aims, the focus could be on other parameters of interest, such as the grand mean $\mu$.
Under the EPP, the models share only the hyperparameters $\mu$, and hence $a_0$ controls the impact of $D_0$ vis-a-vis these hyperparameters. When $a_0 = 1$, $D$ and $D_0$ contribute equal weight to the posterior for $\mu$ (and consequently $\beta$). When $a_0 < 1$, $D_0$ contributes $100 \times (1 - a_0)\%$ less information to the log-posterior for $\mu$, and when $a_0 = 0$, the historical data are excluded altogether. Because the historical data directly affect only $\mu$, we can expect them to have less impact on the posterior of $\beta$ than under the power prior in a non-hierarchical model.
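A toy calculation illustrates why the historical data carry less weight under the EPP. Assume a scalar normal model with known variance: $\bar{y} \sim N(\theta, \sigma^2/n)$, $\bar{y}_0 \sim N(\theta_0, \sigma^2/n_0)$, with $\theta$ and $\theta_0$ exchangeable $N(\mu, \tau^2)$, a flat prior on $\mu$, and $\tau$ held fixed. These are simplifying assumptions made only for this sketch; the case study instead places priors on the between-study standard deviations. Integrating out $\theta_0$ and $\mu$ gives a prior for $\theta$ of $N(\bar{y}_0, \sigma^2/(a_0 n_0) + 2\tau^2)$, so the share of the posterior mean attributable to $\bar{y}_0$ can be computed directly. The function name is hypothetical.

```python
def historical_weight(a0, n, n0, sigma=1.0, tau=0.0):
    """Weight placed on ybar0 in the posterior mean of theta (toy normal model).

    tau is the fixed between-study standard deviation; tau = 0 forces
    theta = theta0 and recovers the standard conditional power prior.
    """
    prec_current = n / sigma ** 2
    # Precision of the prior N(ybar0, sigma^2 / (a0 * n0) + 2 * tau^2) for theta,
    # rearranged so that a0 = 0 yields zero precision instead of dividing by zero.
    prec_hist = a0 / (sigma ** 2 / n0 + 2.0 * a0 * tau ** 2)
    return prec_hist / (prec_current + prec_hist)


# Compare the non-hierarchical power prior (tau = 0) with an exchangeable model.
for a0 in (0.0, 0.5, 1.0):
    print(a0, historical_weight(a0, 100, 100, tau=0.0),
          historical_weight(a0, 100, 100, tau=0.1))
```

For any $a_0 > 0$, a positive $\tau$ reduces the weight on the historical mean relative to the non-hierarchical case, while $a_0$ continues to scale the historical contribution within the hierarchical model.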
The connection between hierarchical models and the power prior has been explored by Chen and Ibrahim [11]. They develop expressions for $a_0$ to calibrate the power prior to a corresponding hierarchical model. Our focus here is slightly different: we use the power prior in conjunction with an exchangeable hierarchical model to guard against unreasonably optimistic assumptions about the relevance of $D_0$. We stress, however, that using the conditional power prior in this context is not tantamount to discounting the data twice, since the hierarchical structure and the power prior limit the influence of $D_0$ in different ways. The hierarchical structure evaluates the similarities of the data in terms of the adjusted mean outcomes: similar population means lead to reduced between-study variability and more pooling. The conditional power prior, on the other hand, downweights the log-likelihood contributions of $D_0$ by a (fixed) factor of $a_0$ regardless of the congruence between the studies.

Application: The Healthy Development Study

Background
To illustrate the use of the conditional power prior, we analyzed a pair of studies designed to improve the delivery of preventive care in pediatric clinics. The "current" study, called the Healthy Development (HD) study, was an 18-month cohort study conducted from June 2004 to December 2005 [17]. Nineteen clinics from North Carolina and Vermont were given training sessions to improve their delivery of care in important clinical areas, such as child safety, maternal depression and substance abuse. Seventeen non-randomized clinics served as a comparison group. Data were collected using the Promoting Healthy Development Survey (PHDS), a clinically validated measure of pediatric quality of care [18]. Parents were surveyed monthly and asked to rate their care in a variety of areas. Due to low response during some months, the data were aggregated into a baseline period (June 2004-August 2004) and a follow-up period (June 2005-December 2005). To assess whether comprehensive, standardized care was provided across multiple dimensions of care, PHDS total scores were categorized into a binary outcome measure identifying patients who received "high quality" care. The primary aim was to determine whether intervention clinics showed greater improvement in delivering high quality care than comparison practices after adjusting for relevant patient and clinic covariates. A total of 2,536 surveys were used in the analysis.
As it happens, the investigators conducted a previous randomized quality-improvement trial, called the Partners in Practice (PIP) study [19]. The study consisted of 895 surveys in 26 centers (13 intervention and 13 control). As with the HD study, the primary outcome for the PIP study was a binary measure of quality of care derived from parent survey items. However, the PIP study focused more on healthy child environments (e.g., exposure to second-hand smoke) and somewhat less on maternal psychosocial concerns as in the HD study. Thus, while the two study surveys measured overlapping areas of care, they were not identical measures.
To address our concerns about the compatibility of the two studies, we used a conditional power prior distribution to incorporate the PIP study data into the analysis of the HD study. We present results from this analysis in Section 3.3. As a comparison, we also applied the EPP approach described in Section 2.3; we present these results in Section 3.4.

Separate analysis of current and historical studies
We begin with separate analyses of the HD and PIP studies. Figure 5 displays the unadjusted percent of patients receiving high quality care in each center, by study group and period, for the PIP randomized trial and the HD cohort study. Highlighted in bold are the average percents across centers, weighted by center sample size. Since the plots indicated extensive between-clinic heterogeneity, both at baseline and over time, we proposed the following random effects model for the HD study:

$$\text{logit}\, P(Y_{ij} = 1 \mid b_i) = x_{ij}'\beta + z_{ij}'b_i, \tag{3.1}$$

where $Y_{ij}$ takes the value 1 if the $j$th patient in the $i$th pediatric clinic reported that high-quality care was received, and 0 otherwise; $x_{ij}$ is a $6 \times 1$ vector of fixed-effect covariates comprising an intercept term, an indicator for study group (1=intervention, 0=comparison), an indicator for study period (1 if follow-up, 0 if baseline), the period×group interaction, and two dichotomous patient-level covariates (mother's education, coded as 1=high school graduate, 0=non-graduate, and race, coded as 0=white, 1=other); $\beta$ is a $6 \times 1$ vector of fixed-effect regression coefficients; $b_i \sim N_2(0, \Sigma)$ is a $2 \times 1$ vector comprising clinic-specific intercept and slope parameters; and $z_{ij} = (1, \text{study period}_{ij})'$. As a quick check of model fit, we ran a fixed-effects logistic regression and computed the Hosmer-Lemeshow goodness of fit statistic; the p-value was 0.79, suggesting no lack of fit.
We proposed a similar regression model for the PIP study:

$$\text{logit}\, P(Y_{0ij} = 1 \mid b_{0i}) = x_{0ij}'\beta_0 + z_{0ij}'b_{0i}, \tag{3.2}$$

where $\beta_0$ is a vector of fixed-effect regression coefficients, $b_{0i} \sim N_2(0, \Sigma_0)$ is a vector comprising random intercept and slope parameters specific to the $i$th clinic in the PIP study, and $x_{0ij}$ and $z_{0ij}$ are covariate vectors defined as in (3.1).
For both models, the intervention effect was measured by the period×group interaction term, denoted as $\beta_4$ for the HD model and $\beta_{04}$ for the PIP model. For $\beta$ and $\beta_0$, we assumed diffuse $N_6(0, 10^3 I_6)$ distributions, and for $\Sigma$ and $\Sigma_0$ we assumed $IW(I_2, 2)$ distributions. The analyses were run in WinBUGS [14]. We ran three initially dispersed MCMC chains for 110,000 iterations, discarding the first 10,000 as a burn-in. We retained every fiftieth draw to reduce autocorrelation. MCMC diagnostics, such as trace plots and Gelman-Rubin statistics [15], indicated rapid convergence and efficient mixing of the chains. A similar MCMC procedure was used for each of the analyses presented below.
While the directions of the estimates were similar across the studies, there were a few key differences (Table 1). In the PIP study, parents in the control group were somewhat less likely to receive high quality care at follow-up than at baseline (row 2). Meanwhile, parents in the control arm of the HD study remained relatively unchanged in their probability of receiving such care. Interestingly, parents in the PIP intervention arm began the study substantially less likely to receive high quality care (posterior mean=-0.85, 95% CrI=[-1.63,-0.08]), even though clinics in this study were randomized to study group. On the other hand, parents in the HD intervention group were somewhat more likely to report receiving high quality care at baseline (posterior mean=0.20, 95% CrI=[-0.22, 0.61]). The period×group interaction terms, highlighted in row 4, were used to assess the intervention effects. For both studies, the intervention groups demonstrated more improvement than the comparison groups, although this effect was most notable in the PIP study. The posterior probabilities of a positive intervention effect were close to 1 for both studies (see table footnotes). There was also more between-clinic variability in the PIP study than in the HD study, with a negative covariance between the random intercept and slope parameters (last row). Conversely, the HD study showed a positive random-effect covariance, although in both studies the 95% credible intervals overlapped 0.

Power prior analysis
Next, we used the conditional power prior approach described in Section 2.1 to borrow information from the PIP study while simultaneously expressing uncertainty about how much information should be shared. As above, we assumed random-effect logistic models for $Y$ and $Y_0$, except here we set $\beta_0 = \beta$ and $\Sigma_0 = \Sigma$, thus allowing the models to share identical covariate effects and variance components. We then placed the conditional power prior distribution in (2.3) on the fixed effect coefficients, $\beta$. To complete the prior specification, we assumed a diffuse initial prior for $\beta$ and an $IW(I_2, 2)$ distribution for $\Sigma$. Table 2 presents the results when $a_0 = 1$ which, together with the separate HD analysis presented in Table 1 (i.e., $a_0 = 0$), we considered to be our "reference" analyses. As expected, the estimates fall between the estimates for the two separate analyses (Section 3.2), reflecting the complete pooling of the two samples. There is a substantial increase in the adjusted log-odds estimate for the interaction, from 0.57 in the separate HD analysis (Table 1, last column) to 0.89 for the pooled analysis. The random effect variance estimates also fall between those for the two separate analyses. In particular, the random effect covariance estimate is 0.06, indicating little association between the random intercept and slopes for the pooled model.
Next, we assumed a range of values for $a_0$ ($a_0$ = 0.25, 0.50 and 0.75). Table 3 presents the posterior summaries for the interaction term, $\beta_4$, under the different choices of $a_0$. The entries in the first row are identical to the results for the separate HD study presented in Table 1, while the last row repeats the result from Table 2. As more weight is given to $D_0$, there is a noticeable increase in the posterior means, reflecting the stronger intervention effect that was observed for the PIP study. When $a_0 \geq 0.25$, the credible intervals no longer overlap the null, strongly suggesting a benefit to the intervention.

EPP analysis
We next used the exchangeable power prior to jointly analyze the two studies. As in models (3.1) and (3.2), we allowed distinct fixed effect parameters, $\beta$ and $\beta_0$, with joint prior distribution $\pi(\beta, \beta_0 \mid \eta)$, where $\beta$ and $\beta_0$ are $6 \times 1$ vectors of fixed-effect coefficients with prior $\pi(\beta, \beta_0 \mid \eta) = N_6(\beta \mid \mu, S) \times N_6(\beta_0 \mid \mu, S)$, and $S = \text{Diag}(\sigma_k^2)$ for $k = 1, \ldots, 6$. (Note that the EPP is expressed as a joint prior on all model parameters.) For the remaining parameters, we assumed the following prior distributions: [...] Because our analysis consisted of only two trials, the posteriors of $\beta$ and $\beta_0$ are likely to be affected by the choice of prior for $\sigma_k$. Therefore, as a sensitivity check, we considered two priors: a diffuse uniform prior, which assigns most mass to large between-trial variances; and a half-normal with standard deviation 1, which assigns about 5% prior probability to $\sigma_k > 2$ (i.e., a priori, we assume similar effects for the two studies). Since the results were similar under both priors, we present only the results for the half-normal prior. Table 4 presents the results for the EPP analysis when $a_0 = 1$, which is equivalent to a hierarchical model based on complete pooling of the two datasets. As the table indicates, the impact of $D_0$ is further attenuated relative to the non-exchangeable power prior. For example, the posterior mean for $\beta_4$ is 0.74 (95% CrI=[0.11, 1.43]), about halfway between the estimates for the separate HD analysis (Table 1, last column) and the non-exchangeable model (Table 3). Meanwhile, the posterior estimates for education and race closely resemble those for the separate HD analysis, suggesting that $D_0$ did not have much influence on them. Likewise, the random effect variance components are close to the values obtained in the HD analysis. This is not surprising given that only one historical study was used in this analysis. Table 5 provides the posterior estimates for the interaction parameter, $\beta_4$, under various choices of $a_0$. The posterior estimates increase steadily as more weight is assigned to $D_0$: from 0.57 when $a_0 = 0$ to 0.74 when $a_0 = 1$.
However, with the exception of $a_0 = 0.25$, the posterior estimates are smaller than under the standard power prior (Table 3), confirming that the EPP provides an additional attenuation of $D_0$ relative to the conventional approach. The credible intervals are also consistently wider under the EPP approach, reflecting the added heterogeneity imposed when the assumption of shared model parameters is relaxed. Despite this increased variability, when $a_0 \ge 0.25$, the credible intervals no longer include 0.
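The tail probability quoted for the half-normal prior can be checked directly. The short sketch below assumes only that a half-normal variable with standard deviation parameter 1 is the absolute value of a standard normal variable; the function name is ours and is purely illustrative.

```python
import math

def half_normal_tail(threshold, scale=1.0):
    """Return P(X > threshold) for X = scale * |Z|, with Z ~ N(0, 1)."""
    # For a standard normal Z, P(|Z| > t) = erfc(t / sqrt(2)).
    return math.erfc(threshold / (scale * math.sqrt(2.0)))

# Prior probability that the between-trial standard deviation exceeds 2.
print(round(half_normal_tail(2.0), 4))  # 0.0455, i.e. about 5%
```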

Discussion
Our aim has been to examine approaches to incorporating data from prior studies into a current analysis when there is uncertainty as to the similarities of the studies. We have emphasized the use of power prior distributions for this purpose. The power prior has gained increasing attention in recent years as a technique for synthesizing results across disparate studies. It avoids having to arbitrarily select observations in order to limit the impact of $D_0$; instead, all historical observations are included in the analysis, but each is downweighted equally. For normal models, this is equivalent to inflating the posterior variance of the model parameters by a factor of $1/a_0$ for $0 < a_0 \le 1$. However, the power prior will have a different impact on other densities, and therefore it cannot be viewed as a simple generalization of the variance-inflation approach to downweighting prior information.
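The variance-inflation property for normal models can be seen in the simplest possible case. The derivation below is a sketch under assumptions that are ours rather than part of the case study: the historical observations $y_{01}, \ldots, y_{0 n_0}$ are independent $N(\theta, \sigma^2)$ with $\sigma^2$ known, $\bar{y}_0$ is their mean, and the initial prior for $\theta$ is flat.

```latex
% Illustrative assumptions: known variance \sigma^2, flat initial prior \pi_0(\theta) \propto 1.
\begin{align*}
L(\theta \mid D_0)^{a_0}
  &\propto \exp\left\{ -\frac{a_0 n_0}{2\sigma^2} (\theta - \bar{y}_0)^2 \right\}
   = \exp\left\{ -\frac{(\theta - \bar{y}_0)^2}{2\,\sigma^2 / (a_0 n_0)} \right\}, \\
\pi(\theta \mid D_0, a_0)
  &= N\!\left( \theta \,\middle|\, \bar{y}_0,\ \frac{1}{a_0} \cdot \frac{\sigma^2}{n_0} \right),
  \qquad 0 < a_0 \le 1 .
\end{align*}
```

Under these assumptions the power prior is the posterior based on $D_0$ alone with its variance $\sigma^2 / n_0$ multiplied by $1/a_0$, while its mean is left unchanged.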
In previous studies, investigators have recommended placing a prior distribution on the power parameter $a_0$. This approach is appealing from a Bayesian perspective, since it allows one to express prior uncertainty about $a_0$ and use the data to estimate its posterior. We considered two specifications of the power prior in which $a_0$ was treated as a stochastic parameter: the Ibrahim-Chen (IC) power prior, which normalizes jointly over $\theta$ and $a_0$, and the normalized power prior (NPP), which first normalizes with respect to $\theta$ (conditional on $a_0$) and then multiplies this normalized density by $\pi(a_0)$. Under both approaches, for any sizeable $n$, the posterior of $a_0$ concentrates near zero as the prior variance for $a_0$ increases. Our investigation revealed that the IC prior penalizes the historical data even when the current and historical data are identical, unless a very informative prior is used to bound $a_0$ away from zero. In contrast, the NPP provides a measure of commensurability between the studies, so that $D_0$ is substantially downweighted only as the studies diverge. As such, the NPP is a sensible methodological extension to the original power prior. Nevertheless, even under the NPP, $D_0$ will be heavily discounted when there are discrepancies between the sufficient statistics of $D$ and $D_0$. Moreover, because the NPP has been formally developed only for basic normal and binomial models, it is not clear how it would perform in more complex settings. For example, in a multiple regression model, would the NPP downweight $D_0$ when auxiliary variables differed but the primary treatment effects were identical across studies? This is an open area of research. Finally, on a practical note, the NPP is computationally challenging to implement due to the integration inherent in its construction, requiring application of specific numerical integration methods or accurate approximations (e.g., the Laplace approximation) for evaluation.
See Gajewski [20] and Neuenschwander et al. [21] for related comments on the computational challenges associated with the NPP.
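To illustrate how the NPP discounts the historical data as the studies diverge, the following sketch evaluates the marginal posterior of $a_0$ on a grid for a simple Bernoulli model, where the normalizing integral is available in closed form. The Beta(1, 1) initial prior, the uniform prior on $a_0$, the function names, and the data values are all assumptions made for illustration; they are not taken from the case study.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def npp_a0_posterior(y, n, y0, n0, grid_size=201, alpha=1.0, beta=1.0):
    """Grid approximation to the posterior of a0 under a normalized power prior.

    Assumes Bernoulli data (y successes in n current trials, y0 in n0
    historical trials), a Beta(alpha, beta) initial prior on the success
    probability, and a uniform prior on a0 over [0, 1].
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_w = []
    for a0 in grid:
        # Conditional on a0, the normalized power prior is Beta(a, b).
        a = a0 * y0 + alpha
        b = a0 * (n0 - y0) + beta
        # Log marginal likelihood of the current data given a0
        # (the binomial coefficient is constant in a0 and is dropped).
        log_w.append(log_beta(a + y, b + n - y) - log_beta(a, b))
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    total = sum(w)
    return grid, [v / total for v in w]

def posterior_mean_a0(y, n, y0, n0):
    """Posterior mean of a0 computed from the grid approximation."""
    grid, probs = npp_a0_posterior(y, n, y0, n0)
    return sum(g * p for g, p in zip(grid, probs))

# Hypothetical data: matching proportions versus clearly conflicting ones.
compatible = posterior_mean_a0(y=50, n=100, y0=50, n0=100)
conflicting = posterior_mean_a0(y=60, n=100, y0=20, n0=100)
print(compatible, conflicting)
```

In this toy setting the posterior mean of $a_0$ is larger when the two samples agree than when they conflict, which is the commensurability behavior described above. More complex models would generally require numerical integration in place of the closed-form beta functions.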
An alternative strategy, and the one we have adopted in this paper, is to assign $a_0$ a range of fixed values from 0 to 1 as part of a sensitivity analysis. This approach allows users more control over the impact of $D_0$ and is straightforward to implement in standard software packages such as WinBUGS. The trade-off is that the fixed approach requires external information, such as expert opinion, to determine a suitable range for $a_0$. If, for example, investigators question the commensurability of the studies, restricting $a_0$ to a "skeptical" range (e.g., $a_0 \le 0.50$) may be reasonable. On the other hand, if there is strong belief in the compatibility of the studies, a more "optimistic" range, such as 0.50 to 1, can be selected. In this case, $a_0 = 0.50$ could serve as the lower reference value. For our study, the credible intervals for $\beta_4$ did not include 0 when $a_0 \ge 0.25$. Thus, as long as investigators expressed some optimism about the studies' compatibility, we would infer a positive intervention effect. Conversely, if investigators expressed skepticism about the relevance of $D_0$, so that the value $a_0 = 0$ could not be ruled out, then we might exercise more caution when evaluating the efficacy of the intervention.
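The mechanics of such a sensitivity analysis can be sketched in a conjugate setting. The example below assumes a normal mean with known standard deviation and a flat initial prior, which is far simpler than the hierarchical regression model of the case study. The function names and the summary statistics are hypothetical and serve only to show how the posterior mean and interval move as $a_0$ varies.

```python
import math

def power_prior_posterior(ybar, n, ybar0, n0, sigma, a0):
    """Posterior mean and sd of a normal mean for a fixed a0.

    Assumes a known standard deviation sigma and a flat initial prior, so
    the historical data enter as a0 * n0 pseudo-observations.
    """
    effective_n = n + a0 * n0
    mean = (n * ybar + a0 * n0 * ybar0) / effective_n
    sd = sigma / math.sqrt(effective_n)
    return mean, sd

def sensitivity_table(ybar, n, ybar0, n0, sigma,
                      a0_values=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Rows of (a0, posterior mean, lower, upper) with 95% intervals."""
    rows = []
    for a0 in a0_values:
        mean, sd = power_prior_posterior(ybar, n, ybar0, n0, sigma, a0)
        rows.append((a0, mean, mean - 1.96 * sd, mean + 1.96 * sd))
    return rows

# Hypothetical summary statistics for the current and historical studies.
for a0, mean, lower, upper in sensitivity_table(0.5, 50, 1.0, 50, 2.0):
    print(f"a0={a0:.2f}  mean={mean:.3f}  95% interval=[{lower:.3f}, {upper:.3f}]")
```

With these made-up inputs the interval narrows and shifts toward the historical estimate as $a_0$ increases, so one can read off the smallest $a_0$ at which the interval excludes 0.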
More generally, one could take a model averaging approach: assign a distribution, $\pi(a_0)$, to $a_0$, fit the model conditional on specific values of $a_0$, and then average inferences across $\pi(a_0)$ to obtain summary values. In the HD application, we effectively took $\pi(a_0)$ to be a discrete uniform distribution on {0, 0.25, 0.50, 0.75, 1}, although we did not perform model averaging, opting instead to tabulate the results separately for each value of $a_0$. In a sense, the IC and NPP priors are attempting to achieve this model averaging in one step by placing a joint prior on $(a_0, \theta)$ and jointly updating the parameters. However, as we have seen, these priors tend to lead to rapid attenuation of $D_0$, which in many practical situations might be considered excessive.
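A minimal sketch of the averaging step is given below. It assumes that a posterior mean and variance are available for each fixed value of $a_0$ and that the weights are the prior probabilities $\pi(a_0)$, held fixed rather than updated by the data. The function name and the numerical inputs are hypothetical and are not results from the HD analysis.

```python
def average_over_a0(means, variances, weights=None):
    """Mean and variance of a mixture of conditional posteriors.

    means[i] and variances[i] summarize the posterior given the i-th value
    of a0; weights[i] is the prior probability of that value. Equal weights
    (a discrete uniform prior) are used when weights is None.
    """
    k = len(means)
    if weights is None:
        weights = [1.0 / k] * k
    mix_mean = sum(w * m for w, m in zip(weights, means))
    # Law of total variance: E[Var] + Var[E], written via second moments.
    second_moment = sum(w * (v + m * m)
                        for w, m, v in zip(weights, means, variances))
    return mix_mean, second_moment - mix_mean ** 2

# Hypothetical conditional summaries for a0 in {0, 0.25, 0.50, 0.75, 1}.
means = [0.50, 0.60, 0.65, 0.70, 0.75]
variances = [0.16, 0.14, 0.13, 0.12, 0.11]
print(average_over_a0(means, variances))
```

The averaged variance exceeds the average of the conditional variances whenever the conditional means differ, reflecting the added uncertainty about $a_0$.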
Both the NPP and the conditional power prior are valid approaches, depending on study aims. If the aim is to incorporate historical information only when $D \approx D_0$, then the NPP may be a reasonable choice. The fixed approach, on the other hand, is appealing if one wishes to incorporate historical information to aid inferences about the current model, or to provide a measure of the comprehensive treatment effect across studies, even when $D \neq D_0$, as long as external information suggests that the historical data are relevant to the current study. The important point, from our perspective, is to understand the advantages and limitations of each approach, and to make an informed decision according to study aims.
We have also applied the power prior in the context of Bayesian hierarchical models that link the current and historical studies through an exchangeability condition. Here, we allocated unique parameters to the current and historical likelihoods and assumed a common prior distribution. We then used the conditional power prior to limit the contribution of $D_0$ to the posterior updates of the hyperparameters that link the two models. This exchangeable power prior, as we have called it, can prove useful if one has reservations about the underlying assumption of exchangeability between studies, or if there is some additional skepticism about the compatibility of the studies. The results from our case study suggest that this approach provides additional attenuation of $D_0$ over and above that obtained when the (conditional) power prior is applied to a non-hierarchical model. Since our example involved only two studies, inferences about between-trial parameters such as $\sigma_{\beta k}$ are only weakly informed. However, because our focus was on $\beta$, the small number of historical studies was not a major concern.
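The structure of the exchangeable power prior can be sketched as an unnormalized log posterior for a scalar toy model. Everything in the example is an assumption made for illustration: normal likelihoods with known standard deviation, a single coefficient per study, a flat prior on the common mean, a half-normal prior with standard deviation 1 on the between-study standard deviation, and all function and argument names. It is not the hierarchical regression model fitted in the case study.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sd)
            - 0.5 * ((x - mean) / sd) ** 2)

def epp_log_posterior(beta, beta0, mu, sd_between, y, y0, a0, sigma=1.0):
    """Unnormalized log posterior for a scalar exchangeable power prior sketch.

    beta and beta0 are the current and historical study parameters, linked
    through a common N(mu, sd_between^2) prior. Only the historical
    likelihood is raised to the power a0.
    """
    if sd_between <= 0:
        return -math.inf
    loglik = sum(normal_logpdf(v, beta, sigma) for v in y)
    loglik0 = sum(normal_logpdf(v, beta0, sigma) for v in y0)
    exchangeable = (normal_logpdf(beta, mu, sd_between)
                    + normal_logpdf(beta0, mu, sd_between))
    # Half-normal(1) prior on sd_between, up to an additive constant;
    # the prior on mu is taken to be flat.
    log_prior_sd = normal_logpdf(sd_between, 0.0, 1.0)
    return loglik + a0 * loglik0 + exchangeable + log_prior_sd

# Hypothetical data; with a0 = 0 the historical observations drop out.
value = epp_log_posterior(0.4, 0.6, 0.5, 0.8, y=[0.3, 0.5], y0=[0.9, 1.1], a0=0.5)
print(value)
```

A function of this form could be passed to a general-purpose sampler. Setting `a0=1` recovers a standard two-study hierarchical model, while `a0=0` leaves the historical parameter informed only through the common prior.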
In general, the methods described here can be extended to multiple historical studies, and may provide guidance for cross-design evidence synthesis using hierarchical models. They may also prove useful to regulatory agencies, such as the FDA, that wish to incorporate historical data when evaluating the efficacy of new medical devices and therapies. Meanwhile, future research might focus on applying the NPP to hierarchical regression models, such as the one described in our case study, to examine its behavior in complex settings.