Covariate Modeling in Population PK/PD Models: an Open Problem



Introduction
In the last decade PK/PD modeling has reached a new level of maturity, acceptance, and influence, and is increasingly considered a central technology for improving both drug development and clinical therapy. Pharmacological effects can be measured at the biochemical, cellular, organ, individual, or clinical level. The integration of diverse pharmacological effects into a coherent picture of drug action is one of the goals of current research in PK/PD modeling. Two communicating dynamic systems, viz., pharmacokinetics and pharmacodynamics, determine the response of a given individual to a particular drug. Pharmacokinetics (PK) comprises drug absorption, distribution, biotransformation, and elimination (processes that involve receptors, transporters, and enzymes). Pharmacodynamics (PD) comprises the interaction of drug species with enzymes or receptors of a target cell (or bacterium or virus) and the often complex chain of molecular and physiological events that ultimately determines the physiological responses to the drug. The modeling of PK [1] has a long tradition, and the history of dynamic PK/PD models goes back (as far as we know) to the seminal paper of Segre [2], while the theoretical framework for PK/PD models is described in a number of publications [3-5]. General classes of PK/PD models, which allow a unified approach to the modeling and investigation of PK/PD systems, have been described in [6,7]. What complicates PK/PD modeling is the fact that individuals differ in their responses to a given drug. These differences can be expressed as individual differences in PK and PD, which, in turn, can partially be explained by variation in such factors as age, gender, weight, or genetic components. The elucidation and estimation of such relationships is carried out using PK/PD data from samples of individuals from populations of interest.
Population experiments are often conducted in a very different manner from those carried out on a small number of individuals, where rich data sets can be collected, since the resources required to execute an experiment that fully quantifies each individual are prohibitive. As a result, non-linear hierarchical mixed-effects models are routinely used in attempts to identify PK/PD models from a population study [8-10]. Determining the relations between the parameters of a population PK/PD model and covariates is a major aim of population PK/PD modeling. Covariates can improve the precision of the estimates, explain part of the inter-individual variability, increase the mechanistic interpretability of the model, or generate hypotheses. An appropriate covariate might be selected according to its clinical importance, although this is very rare given the semi-empirical nature of PK/PD models and limited knowledge about drug/covariate relationships; far more often, a covariate is identified from the statistical significance of its relation(s) with the model parameter(s).
The main purpose of this paper is to show how the currently accepted paradigm for the inclusion of covariates in PK/PD models constitutes an open problem, since the accepted model for the parameter/covariate relationship can easily generate intractably complex model selection problems. The paper is organized as follows: (i) we briefly describe the Bayesian methodology that is at the core of the methods we investigate; (ii) we describe the covariate selection problem associated with PK/PD models, showing how the accepted paradigm, which associates covariates with each of the parameters in the model, generates an extremely complex model selection problem; (iii) we show, by means of simulations, how current approaches have difficulties in dealing with the dimensionality of the covariate model families; (iv) we propose an alternative that reduces the dimensionality problem; and (v) we end with a short discussion.

Bayesian estimation
We briefly summarize the main concepts and problems involved in the Bayesian approach to inference; see, e.g., [11] for details. The Bayesian approach is developed in the presence of observations Y whose value is initially uncertain and described through a probability distribution that depends on some parameters θ. Prior knowledge available before the observations Y are collected is incorporated into the analysis through a density p(θ). Inference is based on the probability distribution of θ after observing Y, obtained as its posterior distribution using Bayes' theorem. In trans-dimensional models (TDM), the prior uncertainty regarding the model structure is acknowledged, and the model itself becomes a parameter. For example, in variable selection we wish to select the "best" subset of all predictor variables on which to regress the response variable; the number of selected variables and the variables themselves, which together define the model structure, are unknown parameters. These models are referred to as trans-dimensional because each possible model structure has a potentially distinct set of coefficients/parameters associated with it. In general these sets will be of different sizes, and so as we move from one model to another the parameter space of interest changes dimension. Besides the controversy about the appropriateness of incorporating historical information p(θ) into the analysis and of specifying a complex model for the data, the main restriction (a fatal flaw, for practical purposes) of the Bayesian approach has long been the computation of the posterior, which, other than for relatively simple cases, is analytically intractable.
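The idea of treating the model itself as a parameter can be made concrete with a toy example. The sketch below (the two-model family, observation, and all values are made up for illustration) applies Bayes' theorem to obtain posterior model probabilities for a single observation:

```python
import numpy as np

# Two hypothetical candidate models for a single observation y:
#   M0: y ~ N(0, 1)   (no covariate effect)
#   M1: y ~ N(1, 1)   (a covariate shifts the mean)
# With prior p(M0) = p(M1) = 0.5, Bayes' theorem yields the posterior
# model probabilities directly from the likelihoods.
def normal_pdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

y = 0.8
prior = {"M0": 0.5, "M1": 0.5}
lik = {m: normal_pdf(y, mu) for m, mu in [("M0", 0.0), ("M1", 1.0)]}
evidence = sum(prior[m] * lik[m] for m in prior)        # p(y)
post = {m: prior[m] * lik[m] / evidence for m in prior}  # p(M | y)
print(post)
```

In realistic trans-dimensional settings the marginal likelihood of each model is not available in closed form, which is where the MCMC machinery discussed next comes in.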
Computation of the posterior involves solving multi-dimensional integrals, and many computational methods have been devised to do so. In the last ten years a large body of work on stochastic methods has made possible the application of the Bayesian approach to complex problems. Markov chain Monte Carlo (MCMC) methods [12-16] opened the way to the analysis of complex models. Coupled with the advent of MCMC methods for posterior computation, the development and application of Bayesian methods for model uncertainty [13,17-21] have seen remarkable evolution over the past decade. Some of these methods have been applied to PK/PD population models in [22,23] as well as to models for individual responses [24,25]. In general the field has achieved a high level of maturity, and computing systems and methods are available for a great number of Bayesian problems; we mention in particular BUGS [26], PKBUGS, and JUMP for reversible jump MCMC [27].
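The core idea behind these samplers can be sketched in a few lines. The example below is a generic random-walk Metropolis sampler (not any of the cited implementations; the function names and tuning values are ours), applied to a toy conjugate problem, a N(0, 1) prior and one unit-variance observation y = 2, for which the exact posterior is N(1, 1/2):

```python
import numpy as np

def metropolis(logpost, x0, n_iter=20000, step=0.5, seed=1):
    """Random-walk Metropolis: a minimal MCMC sampler for a 1-D posterior."""
    rng = np.random.default_rng(seed)
    x, lp = x0, logpost(x0)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.normal()      # propose a local move
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Toy problem: N(0, 1) prior, one datum y = 2 with N(theta, 1) likelihood;
# the posterior is N(1, 1/2), so the post-burn-in mean should be near 1.
y = 2.0
logpost = lambda t: -0.5 * t ** 2 - 0.5 * (y - t) ** 2
draws = metropolis(logpost, x0=0.0)
print(draws[5000:].mean())
```

The trans-dimensional samplers used later in the paper add moves between models of different dimension, but the accept/reject logic is of this same form.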

Model selection
Variable selection in, for example, linear regression [28] starts from a set of potential predictors {X_i} for a response Y of interest: each model under consideration corresponds to a distinct subset of {X_i} and is of the form

Y = X_γ θ_γ + ε (1)

where X_γ is the design matrix whose columns correspond to the γ-th subset of {X_i}, and θ_γ is the vector of regression coefficients for the γ-th subset. In PK/PD modeling the predictors (covariates) are not directly related to the response, but instead to the parameters of the model [29]. This fact immediately complicates the model selection problem, because each parameter can be related to different covariates. Moreover, different models can be used to express the relationship between a parameter and its covariates (for example, clearance might be related to creatinine clearance linearly, but volume might be related to weight non-linearly). An instance of a general PK/PD covariate model is of the form

ϑ_j = g_j(θ_j, X_γj), j = 1, ..., m (2)

where m is the number of structural parameters ϑ_1, ..., ϑ_m in the model, X_γ1, ..., X_γm is the collection of subsets of covariates for each parameter, and g_1, ..., g_m is the collection of functions characterizing the relationship of the parameters to the covariates. For simplicity we omit details on the specification of individual-specific random effects in equation (2); we refer to, e.g., [25] for the Bayesian formulation of the mixed-effects model.
Consider the simplest possible PK/PD model, a one-compartment model with two parameters,

C(t) = (D/ϑ_2) exp(−(ϑ_1/ϑ_2) t) (3)

where ϑ_1 represents clearance, ϑ_2 the volume of distribution, and D the drug dose; a covariate relationship of the form of equation (2) might be, for instance,

ϑ_1 = θ_1 + θ_2 CrCl (4)

with CrCl denoting creatinine clearance. Now assume that in the clinical trial one has collected just four covariates, say Age, Weight, Height, and creatinine clearance (CrCl); assuming that only one covariate can influence a parameter, the possible alternative models to consider are 4 for each parameter, generating 4 × 4 = 16 possible combinations. If we relax the assumption that only one covariate can influence a parameter, then for each parameter we have 4 models with 1 covariate, 6 with 2, 4 with 3, and 1 with 4 covariates, that is 15 models, and correspondingly 15 × 15 = 225 combinations. In general, for m parameters and c covariates the total number of models to consider, indexed as a family {M_k}, is

(2^c − 1)^m (5)
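These counts can be checked mechanically. The short sketch below (the function name is ours) enumerates the covariate subsets for the four covariates of the example:

```python
from itertools import combinations

def covariate_models(covariates, max_per_param=None):
    """Enumerate all subsets of covariates that can be attached to one
    structural parameter, up to max_per_param covariates per parameter."""
    k = len(covariates) if max_per_param is None else max_per_param
    models = [()]  # the model with no covariates
    for r in range(1, k + 1):
        models.extend(combinations(covariates, r))
    return models

covs = ["Age", "Weight", "Height", "CrCl"]

# At most one covariate per parameter: 4 choices each for Cl and V
one_per_param = covariate_models(covs, max_per_param=1)[1:]  # drop empty model
print(len(one_per_param) ** 2)   # 4 * 4 = 16 combinations

# Any non-empty subset per parameter: 2^4 - 1 = 15 models each
any_subset = covariate_models(covs)[1:]
print(len(any_subset))           # 15
print(len(any_subset) ** 2)      # 15 * 15 = 225 combinations
```

With eight covariates and a handful of structural parameters the same enumeration quickly runs into the tens of thousands of models, which is the heart of the problem discussed next.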

Obviously the number of models indicated by (5) is too large to attack the problem by brute force. Putting aside the problem of selection bias that such a rather absurd number of alternative models can generate (exaggerating the importance of covariates is all but guaranteed with such a number of alternatives [30]), the main concern in the PK/PD literature has been to identify a practical and reliable selection method. There are various statistical approaches to model selection, of which one of the most common is stepwise addition (covariates are added one at a time until a maximum, large, number allowed in the model is reached) followed by backward elimination (the model is "pruned" by removing one of the selected covariates at a time until the removal corresponds to a significant increase in the objective function value) [31]. The adaptation of the methodology to PK/PD is generally carried out by introducing three simplifications.
Algorithm 1: (i) First, one considers one parameter at a time, adding covariates iteratively using stepwise addition.
(ii) Second, covariates are only added to the model if the addition corresponds to a significant drop in the objective function according to a statistical selection criterion, e.g. [32].
(iii) Third, the backward elimination process is not carried out.
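Algorithm 1 can be sketched as follows. For illustration we score candidate covariates with the AIC of an ordinary least-squares surrogate rather than a full population-model fit, and all data, names, and thresholds below are made up:

```python
import numpy as np

def aic_linear(y, X):
    """AIC of an OLS fit (Gaussian errors), standing in for the
    population-model objective function."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * k

def stepwise_addition(y, covariates, names, drop=2.0):
    """Algorithm 1 (sketch): add covariates one at a time, keeping an
    addition only if it lowers the criterion by more than `drop`;
    no backward elimination step."""
    n = len(y)
    selected, remaining = [], list(names)
    X = np.ones((n, 1))                      # intercept-only start
    best = aic_linear(y, X)
    improved = True
    while improved and remaining:
        improved = False
        scores = {c: aic_linear(y, np.column_stack([X, covariates[c]]))
                  for c in remaining}
        cand = min(scores, key=scores.get)
        if best - scores[cand] > drop:       # significant drop: keep it
            selected.append(cand)
            remaining.remove(cand)
            X = np.column_stack([X, covariates[cand]])
            best = scores[cand]
            improved = True
    return selected

rng = np.random.default_rng(0)
n = 200
covs = {c: rng.normal(size=n) for c in ["Age", "Weight", "Height", "CrCl"]}
y = 1.0 + 0.8 * covs["CrCl"] + rng.normal(scale=0.3, size=n)
print(stepwise_addition(y, covs, list(covs)))
```

The greedy structure is what makes the method cheap, and also what makes its result depend on the order in which additions are tried.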
Adopting (i) reduces the total number of models to be considered from (5) to a sequence of one-covariate-at-a-time comparisons. The approach just described can easily be criticized. Simple stepwise addition/elimination is often ineffective even when used with much simpler statistical models. Using the PK/PD variant just described, it is rather obvious that whatever model is selected will not in general be the best possible one. However, the approach has the advantage of making the intractable problem corresponding to equation (5) computationally feasible. Alternative methods to stepwise addition have been proposed (see [33] for comparisons), and in addition a number of methods that decouple the search for covariates from the overall likelihood function attempt to mitigate the intractability of the task; see [34], using generalized additive models, or [35], using cluster analysis. Here we consider a methodology that takes advantage of the algorithms and formalism of trans-dimensional models (TDM). Briefly, to define a TDM one starts from a family of possible models {M_k} and assigns to each model a prior probability p(M_k) and a prior p(θ_k | M_k) on its parameters. This prior formulation induces a joint distribution over the data, parameters, and models. In effect, these priors serve to embed the various separate models within one large hierarchical mixture model. Conditioning on the data Y yields the posterior model probabilities p(M_k | Y) through the marginal likelihood of M_k, obtained using MCMC [36]. The priors p(θ_k | M_k) and p(M_k) provide an initial representation of model uncertainty; the model posteriors provide a complete representation of post-data model uncertainty that can be used for a variety of inferences and decisions. The TDM framework, proposed in PK/PD by [27,37], can help with the problems associated with addition/deletion; however, as we will see, it does not solve the problems related to the high dimensionality of the model space.

I. Simulation 1
We consider a simple model of the form of equation (3), with data generated with Cl related to CrCl and V related to Weight (equation (7)). Figure 1 shows the posterior probabilities corresponding to the 64 instances of the TDM. The darker the square, the higher the posterior probability of the corresponding Cl and V model; an empty square indicates a posterior probability less than 0.01. Note the effectiveness of the methodology in detecting the combination that shows the correct relationship of Cl to CrCl and of V to Weight. The plots also reveal a second main motivation for the use of TDM: when the correlation between covariates increases, the posterior probabilities for models containing such covariates increase. In these cases model averaging might be a preferable alternative to selecting a unique "best" model.

II. Simulation 2
We use the same set up, but now the data are simulated using

ϑ_1 = θ_1 + θ_2 CrCl, ϑ_2 = θ_3 + θ_4 Weight (6)

where Weight and CrCl are the individual weight and creatinine clearance, respectively, and θ_1 = 1, θ_2 = 0.2, θ_3 = 1, θ_4 = 0.2; equation (6) describes a linear relationship of clearance (ϑ_1) with CrCl and of volume (ϑ_2) with Weight. We simulate a study with 100 individuals, and assume a lognormal distribution for the individuals' parameters with 50% inter-subject variability and a 10% proportional error model for intra-subject variability. We assume that c = 8 covariates are collected (corresponding to Sex, Ethnicity, Genotype 1, Genotype 2, Weight, Height, Age, CrCl). To generate a simulated covariate data set we use a multivariate normal model with a single correlation parameter ρ: the correlation between the i-th and j-th covariates equals ρ^|i−j|, and the variance of each covariate is 1. For the implementation of the model we rely on the interface and set up developed by [27]. We run a 25,000-iteration burn-in and then a further 175,000 iterations from which we draw posterior inferences. In the previous simulation the analysis involved one Markov chain, initialized at the model with no covariates included. The number of possible models is now 52² = 2704, and the analysis involves two Markov chains, initialized at the model with no covariates included and at the model with the highest posterior probability when considering one covariate. As suggested in [38], we check convergence qualitatively by plotting 'model state' against iteration number for each Markov chain and then quantitatively by comparing tables of posterior model probabilities from each chain. In addition, at the end of the runs we simply counted the number of visits to each covariate model.
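The data-generation scheme just described can be sketched as follows. This is a simplified stand-in for the actual set up of [27]: the discrete covariates such as Sex are kept continuous for simplicity, and the residual error model is omitted:

```python
import numpy as np

rng = np.random.default_rng(42)
n, c, rho = 100, 8, 0.8
names = ["Sex", "Ethnicity", "Genotype1", "Genotype2",
         "Weight", "Height", "Age", "CrCl"]

# Covariance with unit variances and corr(i, j) = rho^|i - j|
cov = rho ** np.abs(np.subtract.outer(np.arange(c), np.arange(c)))
X = rng.multivariate_normal(np.zeros(c), cov, size=n)   # n x c covariates
weight, crcl = X[:, names.index("Weight")], X[:, names.index("CrCl")]

# Typical values: Cl linear in CrCl, V linear in Weight (equation (6))
theta = [1.0, 0.2, 1.0, 0.2]
cl_typ = theta[0] + theta[1] * crcl
v_typ = theta[2] + theta[3] * weight

# Lognormal inter-subject variability, roughly 50% CV
omega = np.sqrt(np.log(1 + 0.5 ** 2))
cl_i = cl_typ * np.exp(rng.normal(0.0, omega, size=n))
v_i = v_typ * np.exp(rng.normal(0.0, omega, size=n))
```

Because adjacent covariates are strongly correlated at ρ = 0.8, models substituting a neighbor of Weight or CrCl for the true covariate remain nearly as well supported, which is exactly the difficulty the simulation exposes.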
Generating a data set with ρ = 0.8 and 100 subjects, and using, as above, a total of 200,000 iterations, a relatively large number of models, approximately 12%, receives fewer than 5 visits, a number that we consider too low to guarantee a proper evaluation of those models (note that the Markov chain should not only visit the models, but also spend "enough time" in each one to correctly estimate the parameters relating covariates to model parameters). Doubling the length of the chains did not improve the situation significantly, making it questionable whether the TDM search is capable of fully exploring model spaces of this dimension.
The simulation points out how quickly the methodology can get into trouble. The approach already appears barely feasible for a model as simple as (3), and it would seem all but intractable for even slightly more complicated PK models with a larger number of covariates.
An alternative to the use of a "brute force" TDM, where all possible models are candidates, is to adopt a sequential approach, where models with 1, 2, … covariates are considered sequentially, as spelled out by the following simple algorithm: at step j, the candidate family B_j corresponds to all the possible permutations with repetition of the models that can be obtained by adding 0 or 1 additional covariate to each parameter, starting from the model B_j-1 selected at the previous step. The approach has the advantage of bringing the dimension of the model space under control: at each step the dimension of the TDM is constant, only c models need to be considered for each parameter, and the total number of models for each step is c^p.
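One step of this sequential search can be sketched as follows. Here, for simplicity, each step attaches exactly one further covariate to each parameter, which reproduces the c^p = 64 candidates per step of the example below; the function and model names are ours:

```python
from itertools import product

def step_candidates(current, covariates):
    """One step of the sequential search: starting from the model selected
    at the previous step (`current`, a dict parameter -> tuple of attached
    covariates), propose every model obtained by attaching one further
    covariate to each parameter."""
    options = {p: [c for c in covariates if c not in chosen]
               for p, chosen in current.items()}
    params = list(current)
    family = []
    for picks in product(*(options[p] for p in params)):
        family.append({p: current[p] + (c,) for p, c in zip(params, picks)})
    return family

covs = ["Sex", "Ethnicity", "Genotype1", "Genotype2",
        "Weight", "Height", "Age", "CrCl"]
b0 = {"Cl": (), "V": ()}          # start: no covariates on either parameter
step1 = step_candidates(b0, covs)
print(len(step1))                 # 8 * 8 = 64 candidate models at step 1
```

Each step therefore requires a TDM over a family of fixed, modest size, rather than over the full space of equation (5).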
When we apply the sequential approach to the data generated for Simulation 2, we need to perform three steps (corresponding to 1, 2, and 3 covariates, with 64 alternative covariate models at each step) before the Akaike criterion [32] indicates that the model with two covariates should be selected to represent the data. The result indicates a cluster of models grouped around the correct quadruplet Sex, Weight, Height, CrCl (equation (8)) that closely follows the correlation structure adopted in the simulation. For the simulated data set the model with the highest posterior has Sex and Weight for Cl, and Weight and Sex for V.

Discussion
The main purpose of this paper is to point out how the currently accepted modeling paradigm for the inclusion of covariates in PK/PD models, that is, model (2), can easily generate intractably complex model selection problems. The examples shown in the paper illustrate the fundamental problems with the approach, which we summarize below.
Equation (5) indicates how the total number of models grows astronomically with the number of parameters in the model and the number of covariates. Yet that number is still an underestimate of the models that one might, in principle, want to consider. In the computation of the dimension of the space of covariate models we have not included (i) interactions between covariates, that is, models of the form

ϑ = θ_1 + θ_2 Cov_i + θ_3 Cov_j + θ_4 Cov_i Cov_j

where Cov_j indicates the j-th covariate. We have also omitted (ii) alternative models for the relationship between parameters and covariates, for example polynomial models of the form ϑ = θ_1 + θ_2 Cov_j + θ_3 Cov_j². In addition, to give a complete picture of the potential complexity involved, (iii) strictly speaking we did not consider PK/PD models, but only the simplest possible PK model; introducing a PD component adds parameters to the model, and therefore further adds to the number of covariate relationships that need to be elucidated. And finally, (iv), if anything PK/PD models in the literature are becoming more complex [5], and this complexity again translates into more parameters and even more possible relationships with covariates.
The current approach to "solving" the problem basically ignores it. A univariate stepwise addition is adopted, where covariates are attached to one parameter at a time (Algorithm 1 above, implemented in currently distributed software [39]). However, univariate stepwise addition is not only not guaranteed to identify the covariate model maximizing the likelihood given the data; it is not even guaranteed to obtain the same result if the parameters are considered in a different order during the search. (For example, if in a model with, say, four parameters one adds covariates to the 1st, 2nd, 3rd, and then the 4th parameter, the final result will in general differ from the one obtained by adding to the 2nd, 3rd, 4th, and then the 1st.)
Given the complexity of the problem, it is questionable whether slightly more sophisticated versions of stepwise addition could be more successful. Even for much simpler univariate problems, stepwise addition followed by backward elimination is not guaranteed to reach a global maximum, and a number of alternatives trying to overcome its limitations have been described [40,41]. The methodology described in [27,37] for PK/PD models is basically an adaptation of those approaches, and Algorithm 1 shows a further elaboration that attempts to mitigate the dimensionality of the problem associated with the covariate model (2).
The culprit in this situation is model (2). Its formulation turns a univariate problem into a multivariate one and makes the dimensionality of the problem explode geometrically with the number of parameters in the model. The solution might simply be to reformulate the problem as a relationship between the covariates and the response, rather than the model parameters. For example, a representation in which the covariates enter a model for the response directly, instead of through the structural parameters, might prove as effective a way to incorporate the effect of covariates on PK/PD responses. The investigation of such models is of course beyond the scope of the present paper. This paper might instead help stimulate discussion on model (2), a model that can be easily formulated in theoretical terms, but whose degree of flexibility, and the number of variants it admits, might be unacceptable from both a scientific and a statistical point of view.