Received date: January 06, 2015; Accepted date: June 19, 2015; Published date: June 26, 2015
Citation: Szczesniak RD, Dan Li, Amin RS (2015) Semiparametric Mixed Models for Medical Monitoring Data: An Overview. J Biom Biostat 6: 234. doi:10.4172/2155-6180.1000234
Copyright: © 2015 Szczesniak RD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are are credited.
Visit for more related articles at Journal of Biometrics & Biostatistics
The potential to characterize nonlinear progression over time is now possible in many health conditions due to advancements in medical monitoring and more frequent data collection. It is often of interest to investigate differences between experimental groups in a study or identify the onset of rapid changes in the response of interest using medical monitoring data; however, analytic challenges emerge. We review semiparametric mixed-modeling extensions that accommodate medical monitoring data. Throughout the review, we illustrate these extensions to the semiparametric mixed-model framework with an application to prospective clinical data obtained from 24-hour ambulatory blood pressure monitoring, where it is of interest to compare blood pressure patterns from children with obstructive sleep apnea to those arising from healthy controls.
Ambulatory blood pressure; Covariance models; Linear mixed models; Longitudinal data; Functional data analysis; Nonparametric regression; Obstructive sleep apnea; Penalized splines; Semiparametric regression; Serial correlation
AIC: Akaike’s Information Criterion; BARS: Bayesian Adaptive Regression Splines; DBP: Diastolic blood pressure; LMM: Linear Mixed Model; LRT: Likelihood Ratio Test; ML: Maximum Likelihood; OI: Obstructive Index; OSA: Obstructive Sleep Apnea; REML: Restricted Maximum Likelihood
Medical monitoring plays a major role in modern clinical studies, ranging from portable actigraphy devices worn by individual subjects  to at-home reporting of biologic markers . Simultaneously, there is growing interest in the prognostic utility of medical monitoring for clinical care of patients with chronic diseases and disorders . In addition to static variables collected on individuals, a longitudinal sequence of measurements is collected. Data arising from medical monitoring can have complex, time-related dependencies . The analysis of medical monitoring data focuses on the evaluation (and possibly prediction) of the individual’s time course. Although these longitudinal processes can involve failure-time data, such as disease onset combined with biomarker data , we restrict our overview to settings in which a single longitudinal process is observed.
Both early origins and more recent approaches to longitudinal data analysis share the challenge of fitting nonlinear trends to observed data over time. Notable examples include the use of the linear mixed model (LMM) to analyze longitudinal data , and more recent work has involved adaptive curve fitting . Semiparametric regression, a compromise of linear and nonlinear regression modeling, was thoroughly described by Ruppert et al. . This methodology typically refers to specifying nonparametric components for effects that vary over a continuum, such as time or space, and parametric effects for the remaining model structure (e.g. covariate effects and subject-related random effects or measurement error terms).
In this paper, we focus on the application of a particular type of semiparametric regressions, known as semiparametric mixed models, to medical monitoring data and interpretations that can be made using these models. Although semiparametric mixed modeling has gained popularity in many of the sciences, its use in medical monitoring is relatively new. We detail their practical applications and describe future work that may further increase their clinical utility. As an illustrative example, we analyze data previously described by Amin et al. [1,9]. The primary aim was to determine whether group-specific differences existed in the mean ambulatory blood pressure response, which was monitored over a 24 h period. One of the responses of interest was diastolic blood pressure (DBP), which was collected at half-hourly intervals on each subject. We focus on the following two experimental groups from this study. Subjects with an obstructive index (OI) above 5 events per hour were classified as having severe OSA; those subjects with less than 1 event per hour were classified as controls. The individual profiles are shown in Figure 1.
Figure 1: Observed diastolic blood pressure (DBP), on the log scale, beginning with sleep onset and ending after 24 hours. Clockwise from upper left: Individual profiles for subjects separately from 87 subjects in the severe obstructive sleep apnea (OSA) group (upper left) and 135 Control subjects (upper right) measured in the study; sample means for Control (solid line) and severe OSA (dashed line) groups at each half hour.
Throughout the literature, the term “semiparametric” refers to a variety of statistical approaches, which may correspond to likelihood formulations, for example, or other aspects of models . In this paper, we refer to a model that includes a nonparametric component for the time-varying effect and parametric component(s) for any other effect(s) in the model . We observe data on individuals over time, which yields subject-specific collections of data, often known as profiles.
Consider medical monitoring data that consists of a sequence generated by a longitudinal process , which is specific to each individual (i=1,…n) at points tij(j=1,…ni) along the continuum of time. If the process exhibits sharp changes or nonlinear trends over time, it may be difficult to parametrically model these effects, even with polynomial regression . As an alternative, we can specify such effects as functional data :
Where f(t) is a smooth function over time. Although we do not expand on the approach for this overview, it is possible to allow subjectspecific smooth functions of the form fi(t) . The term εij is the measurement or residual error corresponding to the observation at time tij. It is common to assume that εij follows an independent, identicallydistributed Gaussian distribution, denoted as . Despite this assumption, it is well known that accounting for intra- and intersubject variability is a pervasive issue in longitudinal data analysis. Later, we will utilize the data from Figure 1 to examine alternative covariance structures for continuous longitudinal data.
Depending upon the nature of the data, there are a variety of spline basis expansions that could be used to represent the function in Equation (1). Many applications use the pth-degree truncated power basis , which corresponds to p-splines of the form:
The polynomial terms, β0=β1,…βp are the traditional coefficients for the respective intercept, slope and higher-order terms. The summand expression is comprised of the basic functions b1,…bk, joined by knots k1,…kk, and evaluated at time t. Numerous algorithms are available to aid in selecting the number and location of knots, ranging from automated approaches  to adaptive algorithms . Ngo and Wand  describe the implementation of penalized-spline smoothing in mixed models, thereby providing an automated approach to selecting the degree of smoothing. This exploratory step, often known as scatterplot smoothing, typically occurs before fitting models like those in Equation (1).
Linear mixed models (LMMs) provide a framework for estimating both parametric fixed effects and random effects  and have been a mainstay in longitudinal data analysis . In the LMM setting, the penalized splines from Equation (2) are regarded as best linear unbiased predictions (BLUPs). The LMM including p-splines can be expressed in matrix form as
If we restrict our attention to modeling time effects, the design matrix X with corresponding fixed-effects vector β represent the polynomial terms in Equation (2); zb and b correspond to the matrix for the pth-degree spline basis functions and coefficients in the summand from Equation (2). Typically, the spline basis functions are assumed to follow a normal distribution such that . The remaining terms are used to specify the more conventional random effects in the LMM and error structure. For example, zu could be the design matrix corresponding to subject-specific random intercepts of the form with measurement error . This would correspond to an error structure with components
Maringwa et al.  developed semi parametric mixed models to evaluate differences between group-specific curves in a longitudinal study of cardiovascular functioning. The models were applied to QT prolongation data for the purpose of assessing differences between intervention and control groups. Using the LMM framework, they were able to specify semi parametric mixed models that correspond to various scenarios with differences between group-specific curves. Their formulation utilized the linear truncated power basis, which corresponds to p=1 in Equation (2). The group specific differences predicated from the null hypothesis, , versus the general alternative hypothesis, . The so-called common model was a semi parametric mixed model with no group effect, which corresponds to Equation (1). Then a series of stepwise models were proposed, starting with intercept-only differences , then adding slope-specific differences , spline-coefficient differences (also ), and finally differences in the degree of smoothing inherent in the variance components. Once a model was selected, they employed simultaneous confidence bands  to assess differences between groups over the continuum of time. This approach has been used to assess differences in glycemic control over time for clinically distinct groups of pregnant mothers .
These same authors developed and applied a semi parametric mixed model to QT prolongation data arising from a crossover design, in order to compare condition-specific mean profiles over the continuum of time . Correlations between and within periods were assumed to be separable . In this context, between-period measurements were accounted for via subject-specific random intercepts while different correlation structures were considered for within-period measurements. In another study under review, we examined these assumptions and other potential covariance models in nested repeated measures experiments.
More recent work utilized semi parametric mixed models to characterize the timing and degree of rapid lung function decline in individuals with cystic fibrosis lung disease . In this model of longitudinal disease progression, additional considerations were made to depict the serial correlation inherent in measurements of lung function and assess the rate of change in lung function decline (known clinically as the degree and timing of rapid decline). The authors used three separate, presumably independent effects to model stochastic variation about the mean response function.
In the DBP example at hand, assuming no group effect, the expression corresponds to the following:
First, the expression in the summand represents the covariate effects and parameters θ1,…,θM; these include age (in years), z-score for the body mass index (denoted BMIZ), gender and race (dichotomized as white/not white); ui is a random intercept for the ith subject, which characterizes how the overall mean response function varies from subject to subject, following a normal distribution with mean 0 and variance , denoted as . The term represents the within-subject variation, which is comprised of an exponential correlation function and measurement error:
Here, the exponential correlation function is represented by a random variable assumed to follow a stationary Gaussian process with mean 0 and variance and exponential correlation function . In longitudinal models of DBP, this correlation function, , is where τ describes the rate of decay for the correlation function for time between observations of . This structure assumes that observations farther apart in time are less correlated. The distribution may be generally expressed as , a multivariate normal density with mean 0 and variance-covariance matrix Σ . The remaining term ωij is the “white noise” component, where.
Fitting semi parametric mixed models in the LMM framework may be done using maximum likelihood (ML) or restricted maximum likelihood (REML) for parameter estimation. Selecting the appropriate method often depends on how the hypotheses are formulated in terms of the research question and application of interest. For example, parameter estimates obtained from a series of semi parametric mixed models with different structures for the fixed effects cannot be appropriately compared via the conventional LRT using the Chi-square test statistics across models if REML estimation is employed . As Maringwa et al. point out, REML estimation may be used by replacing the Chi-square test statistic with the corresponding F test statistic.
Although REML is more commonly used than ML in smoothing via the LMM  and is particularly well suited for variance component estimation , results with REML have been shown to vary according to the number of knots . Semiparametric mixed models can also be estimated using nonparametric Bayesian analysis, where all parameters are treated as random. Implementation of this approach has already been provided for instances where the number and location of knots are specified a priori .
A likelihood ratio test (LRT) with the appropriate Chi-square distribution may be used to compare each developed model to the common model , because the inference focuses on the fixed effects only. In the case of the spline-coefficient differences in the existing literature [19-21,23], these coefficients are treated in the LMM as random effects; however, rejecting this test cannot be used to conclude that , for at least two corresponding elements and .
In the example at hand, the group specific differences are tested by having the null hypothesis correspond to , versus each alternative hypothesis corresponding to . We will evaluate a class of semi parametric mixed models (Table 1), in order to assess group differences in mean response functions over time for the OSA group and healthy controls.
|Effects Description||Mean Response Structure*|
|(1.1) No group difference|
|(1.2) Difference between groups constant across 24-hour sequence|
|(1.3) Group-specific mean response functions have different quadratic trends|
|(1.4) Group-specific mean response functions smoothed differently using distinct vectors of coefficients for OSA and Control profiles|
|(1.5) Separate smoothing and distinct smoothing parameters for OSA and Control profiles||Same as structure (1.4) but differing variances for smoothing coefficients: and|
The term OSAi indicates whether subject belongs to the OSA or Control group (1=OSA, 0=Control); similarly, CTRi indicates group membership for spline-based differences (1=Control, 0=otherwise).
Table 1: Class of semiparametric mixed models to evaluate group-specific differences in twenty-four hour diastolic blood pressure.
Model (1.1) corresponds to no difference between OSA and Control groups. The successive models correspond to various types oftemporal differences between the two groups. Model (1.2) indicates the group differences are parallel over time. Model (1.3) provides separateintercept, linear and quadratic polynomial terms for the differencebetween groups, which suggests the group difference follows a globalquadratic trend over time. Model (1.4) allows for more subtle groupdifferences by having different sets of spline coefficients for OSA andControl groups. Finally, Model (1.5) adds separate smoothing anddistinct smoothing parameters in the form of different variances forthe spline coefficients ( and ).
We applied each model in Table 1 to the DBP example data, excluding covariate effects but accounting for the longitudinal correlation with subject-specific random intercepts. Code for fitting the models and the subsequent calculations is available from the authors upon request. The number and location of knots were selected using the quantile-based approach described by Ngo and Wand . Knots used across the 24 h interval included κ =(3.95, 7.20, 9.77, 11.98, 13.98, 16.03, 18.10, 20.01, 22.05). The fitted values from Models (1.1)-(1.4) are presented in Figure 2 and show how the different specifications can produce a variety of trends for the mean DBP response function over time. All four models suggest the presence of nocturnal dipping, which indicates that DBP drops during sleep; however, Models (1.2)-(1.4) have different levels of emphasis on how distinct the nocturnal dipping patterns are between Control and OSA groups. Each of these three models suggests there may be periods across the 24 h interval in which the mean DBP response function for the OSA group is slightly elevated, compared to the Control group. We investigate these differences later with the use of confidence bands for temporal differences between the two group functions.
Figure 2: Hypothesized differences between groups for semiparametric mixed models applied to the diastolic blood pressure (DBP) data. Beginning clockwise from the upper left panel, smooth function obtained from model assuming a common profile (Model 1.1); smooth functions with parallel group differences over time (Model 1.2); distinct quadratic trends (Model 1.3); different smoothing on functions via distinct spline coefficient vectors for the groups (Model 1.4).
In addition to the LRT, an adjusted Akaike’s information criterion (AIC) for model selection has been proposed . This term is added to the marginal AIC  such that incorporating smoothing into the model is penalized. Model fit statistics are presented in Table 2. Adjusted AIC shows that Model (1.3) had the smallest value, implying that, out of the models evaluated, group differences over time are best characterized by having separate intercept, slope and polynomial terms for OSA and Control groups. As shown, the penalty term Ep increased as complexity of the spline formulation increased, but dropped off in Model (1.5), which specified different spline variance components for OSA and Control groups. Model fit statistics were similar (on the same order of magnitude) for Models (1.1)-(1.4); however, Model (1.5) had fit statistics that were higher by an order of magnitude than the other models. Estimates of and in Model (1.5) were close to the zero boundary. Although our presented results are based on ML estimates, the results with REML estimates yielded similar conclusions across all models.
*Results were obtained prior to covariance model selection and assume subjectspecific random intercepts. Demographic and clinical characteristics included as covariates.
Table 2: Semiparametric mixed model selection applied to twenty-four hour diastolic blood pressure.
We proceed to covariance model selection using Model (1.3) as the basis for all effects excluding the error term, . For our illustration, we examined three different covariance models for the DBP data (Table 3). In the first formulation with random intercepts, we assume , where , , and we assume for all i and j. Second, we consider the covariance structure defined in Equation (5), whichincludes the exponential correlation function. Finally, we examine acovariance structure where the exponential correlation in Equation (5)is changed to Gaussian correlation. Of those three covariance modelsexamined, the exponential covariance model was found to have thesmallest adjusted AIC.
*Results were obtained for Model (1.3), which assumes mean response functions for groups differ by quadratic trend. Demographic and clinical characteristics included as covariates.
Table 3: Covariance model selection for the semiparametric mixed model (1.3) applied to twenty-four hour diastolic blood pressure.
We utilized Model (1.3) with exponential covariance to conduct a more precise examination of the temporal differences in OSA and Control mean response functions. To avoid well-known issues with multiplicity, we do not test differences between groups across several time points within the 24 h interval. Instead, we augment existing results [8,19] to construct simultaneous confidence bands of differences between fOSA(t) and fCTR(t) under the aforementioned semi parametric mixed model and its covariance formulation ;
Where C=[X Z] and We define a grid g of time points (0, 24) by increments of 0.5 hours such that there are T=49 equally spaced time points . We evaluate the estimated difference between the two functions as
Where is a contrast matrix; X and Zb are the design matrices evaluated over g; Cg; [LX LZb].
In order to obtain the for the confidence band, it is necessary to compute as
then we have . We obtain a 95% simultaneous confidence band for fd as:
Where h0.95 is the 1-α quantile with α=0.05 of the random variable:
The quantile h0.95 can be approximated using simulations. As an example, if simulations from  then computations of  are repeated 10,000 times, the value of the ranked 9,500th quantity is used as h0.95. Similarly, a 95% pointwise confidence band for fd is
Simultaneous confidence bands for differences between groups in our DBP application are in Figure 3. The point wise bands were similar to displayed simultaneous bands but were expectedly narrower. Although results do not approach statistical significance, they do suggest that mean DBP response for the OSA group may be slightly elevated, compared to the Control group, during the sleep. The confidence bands become markedly wider as the 24 h interval proceeds, demonstrating the increased variability in the mean response during daytime periods.
The derivatives and taken from the functional forms in Model (1.3) can be used to examine the morning surge separately for the OSA and Control groups. By looking at time since sleep onset, which corresponds to 0 on the x-axis in Figure 4, the rate of change in the mean DBP response is slightly higher for the Control group; however, the derivative curves begin to overlap around t=9 hours after sleep onset. Presumably, 7-9 h after sleep onset is the interval of interest to assess the morning surge, as this is the time frame when subjects begin to wake. Results suggest the rate of change is slightly elevated in the Control group, as compared to the OSA group. From a biomedical perspective, this finding may indicate that children without OSA have a somewhat dampened response to wakefulness. Despite the clinical implications, these results are not statistically significant. The 95% simultaneous confidence bands are not shown but could be derived similarly to those we obtained for in Figure 3.
In this paper, we have shown that semiparametric mixed modeling within the LMM framework is a flexible method for modeling data arising from medical monitoring. We have discussed several inferential tools related to spline structure and covariance model selection. Even though we have tried to review the most important aspects of semiparametric mixed models applied to medical monitoring data, there are inevitably a number of topics and extensions that have not been touched upon.
While the ease of implementation makes fitting semiparametric mixed models appealing for medical monitoring and other continuous longitudinal data sources, there still exist several non-trivial issues. For example, although classes of semiparametric mixed models have been developed and an optimal model can be selected using established information criteria, the traditional LRT may be somewhat underpowered for this endeavor. Its poor performance has been demonstrated for testing interactions involving fixed effects .
We explored the model selection strategy that has been proposed for semiparametric mixed model selection by conducting a small simulation study and evaluating models on the basis of the adjusted AIC. Data were generated by setting fixed effects values equal to estimates obtained in Model (1.5). The error variance was set to the model estimate. The spline coefficients were generated separately by group according to variance values from the Uniform distribution (min: 0, max: 1), we performed 1,000 simulations under these settings and found that Model (1.3) was always selected as the optimal model based on fit statistics. We slightly modified these results to examine behavior of the model selection process when variance components for the spline coefficients, which enter into Model (1.5), had true values near the zero boundaries. As it turns out, incorporating these differences still led to Model (1.3) being most favorable of the models considered. Our preliminary findings suggest that more work is needed to develop a more sensitive indicator of model fit when a class of semiparametric mixed models is evaluated and also when changes in degree of smoothing are truly evident.
As mentioned previously, another challenge in semiparametric mixed model development is selecting the location and number of knots. Early work found that specifying the number of knots and their locations a priori works well for most data applications . Adaptive methods have not been pursued to a large extent in semiparametric mixed modeling due to their computational intensity, but knot selection algorithms such as Bayesian adaptive regression splines (BARS) have been used in biological studies with continuous longitudinal data  and extended for more general functional data applications .
Semiparametric mixed models have clinical utility, as shown in the studies referenced in our overview, but distinctions need to be made between medical monitoring and other settings. For example, if the models are being developed for prognostic purposes but are based on a clinical patient registry, then statistical causality should be incorporated. This could be achieved by including latent variables to characterize the causal structure of the model . Another aspect to advancing the use of semiparametric mixed models in clinical settings would be to improve their real-time assessment. Joint modeling has been proposed as an approach to improve real-time prediction. Recent work involves the use of Gaussian process regression to achieve improvements , which eliminates the need for knot selection and spline development. Real-time semiparametric regression was recently proposed by Luts et al.  and includes feasible algorithms for prognostic assessments. There are several expansions that could be made to this methodology, in order to account for changing trends in clinical care while providing flexible models that offer increased clinical utility.
Authors received partial support for this work from the NIH/NCRR (Grant: 5UL1RR026314-03) and NIH/NHLBI (Grants: R01HL080670; M01RR008084). The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.