A Comparison of Generalized Additive Models to Other Common Modeling Strategies for Continuous Covariates: Implications for Risk Adjustment

When evaluating the effect of an exposure using observational data, adjustment for potential confounders is usually inevitable and is generally performed using statistical modeling. Risk adjustment models often include quantitative variables that may not have a linear relationship with the independent variable. As such, if they are modelled without transformation, the postulate of linearity, essential for parametric modeling strategies, is violated leading to bias and coverage problems [1]. If these variables are of primary interest, violating the postulate of linearity can lead to effect estimates that are too close to the null [2]. If these variables are used to control confounding, violation of the postulate of linearity can lead to residual confounding. Consequent influence on the measure of exposure effect is difficult to predict.


Introduction
When evaluating the effect of an exposure using observational data, adjustment for potential confounders is usually inevitable and is generally performed using statistical modeling. Risk adjustment models often include quantitative variables that may not have a linear relationship with the independent variable. As such, if they are modelled without transformation, the postulate of linearity, essential for parametric modeling strategies, is violated leading to bias and coverage problems [1]. If these variables are of primary interest, violating the postulate of linearity can lead to effect estimates that are too close to the null [2]. If these variables are used to control confounding, violation of the postulate of linearity can lead to residual confounding. Consequent influence on the measure of exposure effect is difficult to predict.
The literature abounds with advice on how to model quantitative covariates [3][4][5][6]. Some have suggested that dummy variables on a minimum of five categories are sufficient [7]. Others have suggested that parametric transformation with fractional polynomials (FP) [8] or non-parametric functions in Generalized Additive Models (GAM) [9] can improve control of confounding. Most studies that have evaluated the impact of modeling strategy on risk adjustment have been based on simulated data and limited to null associations between exposure and disease [6,10,11]. Therefore, while GAM theoretically offer the most flexible modeling strategy, [12] there is a lack of evidence based on real data that they lead to better control of confounding.
The objective of this study was to evaluate the impact of using GAM over other commonly used covariate modeling strategies on risk adjustment.

Study setting
The study was based on the comparison of adjusted hospital mortality estimates across trauma centers in the province of Quebec, Canada. The context of trauma provides a good test bed for the evaluation of risk adjustment as there is great heterogeneity in the baseline risk of injured patients across centers, most of the indicators used to describe baseline risk are quantitative in nature, and the relationship between many risk factor indicators and mortality is non-linear. We were primarily interested in how parameter estimates describing the log odds of mortality in one trauma center compared to another trauma center, adjusted for quantitative risk factors, would vary according to the modeling strategy applied to risk factors. Of secondary interest was the effect of modeling strategy on statistical significance.

Study data
The study sample was drawn from the Quebec Trauma Registry, which contains information on all trauma patients admitted to any of the 59 trauma centres in the province of Quebec, Canada according to the following uniform inclusion criteria: death, intensive care unit admission, hospital length of stay of more than two days, or transfer from another hospital. Data is extracted from patient files in each trauma center, according to standardized coding protocols, and then centralized at the Quebec Ministry of Health where it is screened and *Corresponding author: Lynne Moore, Unité de traumatologie -urgence -soins intensifs, Centre de recherche du CHA (Hôpital de l'Enfant-Jésus), 1401, 18eme rue, Quebec City, Quebec, Canada, G1J 1Z4, Tel: 01-418-649-0252; Fax: 01-418-649-5733; E-mail: lynne.moore.cha@ssss.gouv.qc.ca added to the Quebec Trauma registry. Periodic validation, supervision by a data coordinator, yearly on-going training for data coders, an electronic forum of coding queries, and thrice-yearly meetings with key stakeholders (e.g. trauma physicians, researchers, administrators), all help improve data quality. Trauma centers with less than 100 patients (n=7) were excluded from analyses.

Statistical analysis
The outcome variable of interest was hospital mortality. Risk adjustment was based on five variables commonly used for case-mix control in trauma research: 1) age; 2) the Injury Severity Score, an ordinal variable varying between 1 and 75 that measures increasing anatomical injury severity [13]; 3) the Glasgow Coma Score, an ordinal variable varying between 3 and 15 that measures increasing level of consciousness [14]; 4) respiratory rate, an interval-scale variable varying from 0 to around 100; and 5) systolic blood pressure, an interval-scale variable varying from 0 to around 300. The Injury Severity Score is based on clinical evaluations performed throughout the hospital stay whereas the Glasgow Coma Score, respiratory rate, and systolic blood pressure are measured on arrival at the emergency department of the trauma center.
Risk adjustment variables were entered in a logistic regression model using each of the four modeling strategies: 1) a single linear term, 2) dummy variables on either 5, 4, 3, or 2 categories, 3) FP, and 4) GAM. Categories were based on clinically plausible cut-points that are widely used in the literature. However, alternative cut-points were evaluated in sensitivity analyses.

Fractional polynomials
Introduced by Royston and Altman in 1994 [14], fractional polynomials (FP) are an extension of traditional quadratic and cubic polynomials whereby a wider range of powers is used. Royston suggests that the following group of powers gives all the flexibility needed: -2, -1, -0.5, 0, 0.5, 1, 2, and 3, where 0 represents the logarithm. Models can be first degree (with only one polynomial term) or can include more terms. According to Royston, two terms (second degree model) offer sufficient flexibility in most situations. Optimal FP were based on minimizing model deviance using STATA software (Stata Corporation, College Station, TX, Version 7.0). Clinical plausibility was verified by inspecting curves of the functional relationship between risk factors and the risk of mortality.

Generalized additive models
Generalized Additive Models are an extension of Generalized Linear Models that are based on the specification of a parametric link function (the logit is used for logistic regression) and allow independent variables to be modeled with parametric or non parametric functions. Non-parametric functions (cubic splines are used in this study) allow the expression of relations to mortality that are non-linear. While the location of knots in a cubic smoothing spline has little influence on results, the number of knots (equivalent to the smoothing parameter usually expressed as the number of degrees of freedom) is important. Too many degrees of freedom can lead to overfitting (describing random error rather than a true underlying relationship) whereas choosing too few can lead the analyst to miss important trends (the linear model is an extreme example of this with a smoothing parameter equivalent to 1 degree of freedom). Unfortunately, the algorithms available for choosing the optimal smoothing parameter (e.g. maximum of the generalizedcross validation function) are not yet well developed and can generate misleading results. For this reason, a fixed smoothing parameter equivalent to four degrees of freedom was used for each variable. Data simulations performed by Hastie and Tibshirani have demonstrated that four degrees of freedom is suitable in most circumstances [12]. However, other smoothing parameters were evaluated in sensitivity analyses. Again, clinical plausibility of functional relationships was verified graphically. GAM were fit using the PROC GAM procedure in the Statistical Analysis System (SAS Institute, Cary, NC, version 9.2).

Model fit
The model fit offered by each of the four modeling strategies was evaluated by fitting the logistic models described in Table 1.
The fit of each model was described using Akaike's Information Criteria (AIC). The AIC measures goodness of fit and is calculated as a function of the maximum of the likelihood function and a correction for the number of parameters in the model. The discrimination of each model was also evaluated described by the Area Under the receiver Operating Characteristic curve (AUC). The AUC varies between 0 and 1 where a model with an AUC=1 discriminates deaths from survivors perfectly and a model with an AUC=0.5 discriminates no better than chance alone. AUC were compared using the non-parametric method for dependant samples described by Delong et al. [15]. Finally, curves of observed and predicted mortality were compared for univariate models. Observed mortality proportions were calculated for as many small intervals of quantitative variables as possible based on the constraint of at least 10 events per interval.
Statistical analyses were performed using the Statistical Analysis System (SAS Institute, Cary, NC, version 9.2). The study was approved by the Research Ethics Committee of the 'Centre Hospitalier Affilié Universitaire de Québec' and the 'Commission d' Accès à l'Information du Québec' .

Results
The study population comprised 123,027 patients of whom 7600 (6.2%) died in hospital. SBP had a Gaussian distribution but all other quantitative variables had asymmetric distributions. There was an important variation in the distribution of risk factors across trauma centers, illustrating the need to adjust for baseline risk in hospital comparisons: Mean age varied between 8.0 (95% CI: 7.9-8. All non-linear terms for GAM had p-values <0.001, demonstrating that the association between each quantitative variable and hospital mortality had a significant non-linear component. When compared to the observed risk of mortality, single linear terms led to an underestimation of risk for patients aged over 80 years (Figure 1a), an overestimation of risk for ISS>40 (Figure 1b), an underestimation of risk for GCS=3, and an overestimation of risk for GCS=4 to 10 ( Figure  1c). In addition, single linear terms led to an overestimation of risk for RR>20 (Figure 1d) and an underestimation of risk for SBP<50 and SBP>150 (Figure 1e). Dummy variables on categories led to an underestimation of risk for patients >85 years of age (Figure 1a), an underestimation for ISS>50 (Figure 1b), and they did not pick up the increase in risk for SBP>150 (Figure 1e). The functional relationships between all risk factors and mortality risk described by GAM and FP were very close to observed values. However, for the ISS, RR, and SBP, GAM and FP functions diverged for extreme values where data was sparse (Figure 1b, 1d, and 1e).
The discrimination and fit of models with a single linear term were inferior to that of the GAM ( Table 2). The discrimination and model fit of models based on categories decreased with the number of categories used. Dummy variables on less than 4 categories were not only inferior to the GAM but were also inferior to single linear terms. Discrimination and model fit using FP were equivalent to that of the GAM.
Mean standardized difference in parameter estimates from the GAM model were over 40% when single linear terms or dummy variables on 2 or 3 categories were used but under 40% for FP and dummy variables on 4 or 5 categories (Table 3). Similarly, single linear terms or dummy variables on 2, 3, or 4 categories led to a change in statistical significance over GAM in more than 10% of estimates.

Sensitivity analyses
Generalized cross validation of deviance functions could not identify 'optimal' smoothing parameters for the GAM model. We therefore increased df by two to a maximum of 20 (or not greater than the number of levels /2) for each variable. Model fit and discrimination   for the model with maximum df for each variable was similar to that obtained with a fixed smoothing parameter equivalent to 4 degrees of freedom (AIC=34,600; AUC=0.901). When cut-points were changed to reflect alternative clinically plausible categories (ex. GCS<13; ISS>12; age>70), results for models for dummy variables were similar.

Discussions
The results of this study suggest that the method used for modelling quantitative covariates in a logistic regression model does have an influence on risk adjustment. Of the four modelling strategies used, FP and GAM led to significantly better model fit and discrimination than single linear terms or categories. However, FP, GAM, and dummy variables on at least four categories led to similar parameter estimates. Conversely, single linear terms and less than four categories led to poorer model fit, significantly worse discrimination than GAM, and parameter estimates with a mean standardized difference of more than 40%. If we assume that among the modeling strategies tested, GAM predict outcome most accurately, an assumption supported by the better fit of the GAM model to observed data, results suggest that FP, GAM, or dummy variables on at least 4 categories may offer superior risk adjustment to the other modelling strategies tested.
The potential for residual confounding when quantitative covariates are modeled with single linear terms in spite of non-linear relations to response has been widely documented [6,7,17]. In addition, several studies have shown evidence that using dummy variables on categories for risk adjustment can lead to residual confounding [10,18]. In the 1960s, Cochrane studied the effect of crude versus fine categorization of confounding variables on exposure effect estimates and found that bias decreased as the number of categories used increased [18]. Royston showed that FP fit data better than single linear terms and dummy variables on categories [8]. More recent literature has suggested that GAM can improve risk adjustment compared to single linear terms or parametric transformations of linear terms [9]. In a study comparing spline regression to dummy variables on five categories, little difference in exposure OR effect estimates was observed [17]. The results of this study therefore confirm previous work and go further by suggesting that a) FP and GAM provide similar model fit and risk adjustment and b) although categories do not fit data as well, using at least four may provide equivalent risk adjustment to FP and GAM.
While dummy variables on categories, FP, and GAM are useful strategies for modelling non-linear associations to outcome, each method has its disadvantages. As shown in our data, dummy variables on categories are sensitive to the number of categories used. In addition, the choice of cut-points is crucial; multiple risk-homogeneous categories are ideal but may lead to categories with low sample sizes resulting in unstable regression coefficients. Furthermore, dummy variables lead to step functions that are clinically implausible and do not use all the information in the quantitative variable, which can lead to loss of statistical power [5].
FP and GAM allow for smooth non-linear relations to the dependant variable and model quantitative variables over the whole range of values. However, they are mathematically complex and both require careful implementation. In the case of FP, the correct choice of powers is critical and relying solely on empirical model fit (deviance statistics are used in STATA) may lead to clinically implausible associations. In the case of cubic spline functions used in GAM, the     [12]. Identification of the minimum of such functions is not simple as they may have no or multiple minima. Furthermore, use of such criteria can often lead to over-fitting and clinical implausible associations. The choice of power functions for FP and the choice of smoothing parameters for smoothing splines in GAM should therefore always be accompanied by a graphical verification of functional associations with outcome to verify clinical plausibility. Splines can also have unpredictable behaviour at extremes of variable ranges where data is sparse. This will have little influence on confounding control as estimates are very imprecise in these areas but can be disarming when presenting functional relationships graphically. Accompanying fitted functions with confidence bounds will convey information on precision. Like a single linear term, FPs and smoothing splines are sensitive to extreme observations. Therefore, performing influence analysis and checking residuals is important when using these methods. Finally, due to the inherent complexity of non-parametric functions, GAM with smoothing splines cannot be written explicitly and are therefore difficult to reproduce. Models based on FP may therefore represent the most advantageous modeling strategy as they are comparatively intuitive and can be fully expressed in written form.

Strengths and limitations
To date, studies that have compared the impact of different modeling strategies on model fit or confounding control been based on single group comparisons [17] or on simulated data that are likely to convey 'unrealistic situations' [5,6,19] and are often limited to quantitative variables with standard normal distributions and null associations between exposure and outcome [6,10,11]. The current study was based four widely-used modeling strategies, on real data, on several covariates with differing probability distributions from normal to extremely skewed, on exposure with varying associations to outcome, and on multiple comparisons of exposure groups.
Certain limitations should be considered however, when interpreting the results. First, the study was based on data collected in one clinical context and only evaluated a dichotomous outcome modelled on a logit scale (logistic model). Results may therefore not generalize to other clinical situations or other model types. However, we believe the use of several risk factors with differing probability distributions, and multiple exposure comparisons add to the external validity of our findings.
Second, other modeling strategies for quantitative independent variables could have been evaluated. For example, higher powers for FPs, different smoothing parameters for GAM, or alternative cut-points for dummy variables could have been used. In addition, nonlinear transformations of quantitative variables could have been evaluated including cusums [20] or negative exponential functions [3]. However, considering the large number of possible combinations, we restricted comparisons to commonly used modeling strategies and cut-points that are widely used in clinical research.

Conclusions
Results of this study indicate that when quantitative confounders have a non-linear association with outcome, cubic smoothing splines in GAM, FP, or dummy variables on at least 4 categories may offer better control for confounding than single linear terms or dummy variables on less than 4 categories, providing they are implemented with care.