Mieczyslaw Szyszkowicz^{*} and Wesley S Burr  
Population Studies Division, Health Canada, Ottawa, ON, Canada  
Corresponding Author :  Mieczyslaw Szyszkowicz Population Studies Division, Health Canada 200 Eglantine Driveway, Ottawa, ON, K1A 0K9, Canada Tel: (613) 9484629 Fax: (613) 9543768 Email: [email protected] 

Received February 07, 2014; Accepted June 12, 2014; Published June 23, 2014  
Citation: Mieczyslaw S, Wesley SB (2014) Distributed Lag Models: An Analysis of Milan Mortality Data. J Pollut Eff Cont 2:109. doi: 10.4172/23754397.1000109  
Copyright: © 2014 Szyszkowicz M, et al. This is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.  
Related article at Pubmed Scholar Google 
Visit for more related articles at Journal of Pollution Effects & Control
The objective of this study is to present a new variation on Distributed Lag NonLinear Models (DLNMs) for assessing associations between counts of health events and exposure to ambient air pollution. For illustrative purposes, a wellknown data set for Milan, Italy was considered. Total Suspended Particulate (TSP) concentrations were used as the air pollution measure, and meteorological data were represented by daily mean temperature and relative humidity. Relative risks (RR) were estimated using Poisson Generalized Linear Models. Two controls for long timescale variation were considered: a more traditional cubic regression spline smoother, and the more recent casecrossover (CC) control approach. The mortality displacement effect was estimated using DLNMs for a relatively high number of constructed lags. For the considered lags (0–45 days) and the CC approach, three regions were identified: region A (lags 0–7) with RR=1.021 (95 % confidence interval: 1.009, 1.043; region B (lags 8–27) with RR=0.981 (0.965, 0.997); and region C (lags 28–45) with RR=1.018 (1.003, 1.032). The total cumulative risk (regions A + B + C, lags 0–45) gave RR=1.019 (1.001, 1.037). The results were reported for an interquartile range (IQR=86.5) increase in TSP air pollution and are similar in structure to those previously reported, albeit at a significantly reduced level. We attribute the change to the considerable change in long timescale variation left in the residuals, as the clustering effect controls seasonal effects at a much stronger level.
Abstract 
The objective of this study is to present a new variation on Distributed Lag NonLinear Models (DLNMs) for assessing associations between counts of health events and exposure to ambient air pollution. For illustrative purposes, a wellknown data set for Milan, Italy was considered. Total Suspended Particulate (TSP) concentrations were used as the air pollution measure, and meteorological data were represented by daily mean temperature and relative humidity. Relative risks (RR) were estimated using Poisson Generalized Linear Models. Two controls for long timescale variation were considered: a more traditional cubic regression spline smoother, and the more recent casecrossover (CC) control approach. The mortality displacement effect was estimated using DLNMs for a relatively high number of constructed lags. For the considered lags (0–45 days) and the CC approach, three regions were identified: region A (lags 0–7) with RR=1.021 (95 % confidence interval: 1.009, 1.043; region B (lags 8–27) with RR=0.981 (0.965, 0.997); and region C (lags 28–45) with RR=1.018 (1.003, 1.032). The total cumulative risk (regions A + B + C, lags 0–45) gave RR=1.019 (1.001, 1.037). The results were reported for an interquartile range (IQR=86.5) increase in TSP air pollution and are similar in structure to those previously reported, albeit at a significantly reduced level. We attribute the change to the considerable change in long timescale variation left in the residuals, as the clustering effect controls seasonal effects at a much stronger level. 
Keywords 
Air pollution; Distributed lags; Mortality; Humidity; Temperature; Casecrossover; SAS 
Introduction 
The aim of this paper is to demonstrate a methodological development for distributed lag models using a casecrossover (clustering) approach to control for long timescale variation. As a demonstrative example, we perform an analysis of the Milan, Italy mortality data set [1] using our new statistical approach. As scientific progress on the methodology and techniques related to air pollution exposure and associated health conditions is continuing, it is interesting to consider new techniques for estimation of associations that are widely known and published in the scientific literature. This study uses distributed lag nonlinear models (DLNMs) to effectively represent and quantify associations showing nonlinear and delayed effects in timeseries data. In particular, we apply DLNMs to quantify mortality displacement as in [1]. 
This is a methodological paper using real data as a demonstrative example. As health data we are using mortality counts, and relating them to exposure to ambient particulate matter (PM) in the form of Total Suspended Particulate (TSP). Extensive research has demonstrated the associations between exposure to PM and different health conditions related to respiratory and cardiovascular diseases and mortality [27]. Thus studies in this domain address an important aspect of environmental epidemiology. 
The main purpose of this paper was to compare and contrast two different approaches to the estimation of mortality displacement, both using distributed lag nonlinear models. In the first approach, we use casecrossover clustering to account for long timescale variation, and in the second we use a more traditional natural cubic regression spline smoother. These two models are compared using the Milan data set [1], as a wellknown previously studied example. 
Distributed lag models are a modeling schema for presenting simultaneously both nonlinear and delayed dependencies in timeseries data [8]. The DLNM methodology in particular is an extension of a statistical regression model for defining the relationship between a set of predictors (such as air pollution) and an outcome (such as health conditions). In this methodology the estimated relationships allow for a temporal structure of dependency. This is useful because in environmental epidemiology a specific occurrence of an exposure event often affects the health outcomes for a lapse of time beyond the event moment. 
Materials and Methods 
To demonstrate our developments, we used the wellknown 19801989 Milan, Italy mortality database [1]. The considered data were retrieved from the package SemiPar [9], where they are included for illustrative purposes. In our analysis we considered the timeseries data of the form: 
(x_{t},y_{t})=(mortality, airpollution)_{t}, t=1,2,…,3652 
i.e., daily mortality and daily mean air pollution for 3,652 consecutive days. We assumed that the data used are close to their original representation, i.e., to the data described in the published work on mortality displacement, an assumption backed by our Table 1 and the similar Table 1 in [1] . The mortality data are daily counts derived from death certificates and are restricted to residents of Milan who died from natural causes (International Classification of Disease revision 9, cases 1799). As was the case for the authors of the publication containing the original data, we also used Total Suspended Particulates (TSP) as our ambient air pollution measure. The imputed (19% originally missing) version of the TSP data was provided by the SemiPar package, and thus used. Meteorological data were expressed as daily mean temperature and relative humidity. In the considered data set from SemiPar, there were no missing values for any of the considered data variables. 
We analyzed the Milan mortality data using a Poisson Generalized Linear Model (GLM) [10] framework, with computation and estimation done in the R programming language [11] using the DLNM package [8,12]. The script used in R is presented in the Appendix 1. As our data are timeseries, we applied two approaches to adjust for timebased variation (across 3652 days). In both approaches the model used was matched to that of [1] as closely as possible. Thus, each model contains a DLNM function of TSP, as well as spline functions of temperature and relative humidity (with 4 df each) and indicator variables for high temperatures, as in [1]. The remaining terms deal with the dayofweek effect and long timescale variation, and it is here that the approaches diverge. 
For our first approach, we applied a casecrossover clustering approach, and grouped the data by 14day periods, with each group considered as an individual cluster with two measurements for the same day of the week ([13,14] for details on why this is a sensible choice). For verification purposes, we repeated our analysis and defined clusters by 21day periods instead. In the latter situation we had 3 data points for each day of week. We did this twice by grouping days into clusters in descending (from 1 to 3,652) and ascending (from 3,652 to 1) temporal order. In our second approach, we used a natural cubic regression spline smoother to control for time with 33 degrees of freedom (df), i.e., the effective df used in [1]. This approach is widely used in environmental epidemiology, although traditionally more df are used. This lower df choice was made in an attempt to match the model of [1] as closely as possible, although we used a fixeddf spline smoother rather than the penalized smoother used there. In the second approach we also included a dayofweek term, which was not necessary in the clustering approach. 
We considered a high number of possible lags (045 days) for TSP, the same number chosen by Zanobetti et al. in their initial analysis [1]. The full specifications of the DLNM + GLM model used are presented in the R script in the Appendix 1. 
Results 
The results are organized and presented in three figures and two tables. Table 1 presents summary statistics on mortality and environmental characteristics (TSP, temperature, and relative humidity). The table appears to be nearly identical to a similar summary provided by Zanobetti et al. [1], (Table 1). The results for an increase in one InterQuartile Range (IQR) of TSP (86.9 for this dataset) are presented in Table 2 as relative risks (RRs) with their corresponding 95% Confidence Intervals (95% CI). The results are shown for three regions (lag ranges) labeled as A, B, and C. The regions were determined on the basis of the values of individual relative risks as in [1]. The table shows both the results using the regions A, B and C as determined in [1], i.e., A being lags 015, B being lags 1620, and C being lags 2145, and also for regions as determined by our first, new, approach, using the casecrossover clustering control for time, with the determination made in the same way: positive RRs (A, lags 07), negative RRs (B, lags 827), and positive RRs (C, lags 2845). 
The results shown in Table 2 were obtained using the code from Appendix A, estimating a GLM for each model described above, with our two approaches for controlling long timescale variation The results for the smooth function of time model are not identical to those previously estimated in [1], but they do share common structure: positive and significant for regions A, C and the total cumulative effect, and not significant for region B. This holds for both the region classification scheme for [1] and the classification obtained from our clustering timecontrol approach. The total cumulative RR=1.074 (95% CI: 1.035, 1.113) is higher than that obtained in [1], but we have replaced the penalized spline smoothers with fixeddf smoothers, so some variation between the two is expected. For our new approach, GLMs are estimated with twoweek clusters acting in place of the smooth function of time. Each cluster contains two of each of the days of week (see R code in Appendix 1, and [13,14] for details on choosing twoweeks versus three or more). Under regions defined as in [1], regions A, RR=1.012 (95% CI: 0.998, 1.026), and C, RR=1.014 (95% CI: 0.998, 1.030) cease to be positively significant, while region B becomes negative and significant, RR 0.993 (95% CI: 0.988, 0.999). When restricted by its own positive/negative relative risk status, chosen as in [1], the results return closer to the state previously observed, with regions A, RR=1.021 (95% CI: 1.009, 1.033), and C, RR=1.018 (95% CI: 1.003, 1.032) being positively significant. However, region B remains negatively significant, RR=0.981 (95% CI: 0.965, 0.998). The total cumulative risk is positively significant, albeit much smaller than observed using a smooth function of time, with RR=1.019 (95% CI: 1.001, 1.037). 
Figure 1 shows the estimated RRs for mortality (with 95% CIs along the 45 lags) for an increase in concentration of TSP of one IQR, showing both the twoweek clustering approach (solid line) and the spline function of time approach (dashed line). The behavior follows that of [1], (Figure 2), with the modifications discussed above. The differences between the two are discussed further below. Figure 2 shows the cumulative effects of the same unit exposure. In Figure 1 we marked the regions A, B, and C, where the RRs have the same sign (positive or negative). In addition, we used vertical lines to emphasize their boundaries. 
Discussion and Conclusions 
In this work we analyzed mortality displacement with the goal of estimating the effect of air pollution, adjusting for this mechanism using the DLNM technique. We used data from the public domain (provided by the second author of [1]) and although the set used appears to be not quite identical to that used by Zanobetti et al. [1], it is close enough for the results to be comparable. Our results are quite different for the case where we used casecrossover (clustering) models with clusters composed of 14 and 21 days. In these models "time" is managed by cluster structures with two and three points for each day of the week, respectively. The overall effect (A+B+C; RR=1.019, 95% CI: 01.001, 1.037) is positive and statistically significant, but significantly lower than [1], which gave RR=1.067 (1.038, 1.096). We obtained negative and statistically significant results for region B under both divisions; that of [1] and the same algorithm applied to our approach. The regions are quite different between the two, with our approach having a significantly larger number of lags with negative risk. This can partially be attributed to the overall reduction in estimated risk (shifting the entire curve downward) and partially to a shape change which we suspect to be datadriven. 
In our second (baseline) approach, we used a more traditional GLM with a cubic regression spline controlling for long timescale variation (using 33 df to match [1]). We obtained somewhat similar results to those published previously (Table 2) [1], albeit with increased levels. As the models used were not identical (we used fixeddf splines versus the penalized option), and as the authors of [1] did not use the slightly more flexible DLNM framework (as it had not been developed), the variation between the results is understandable. The relative risks for the second approach remain comparable to that of [1] despite the difference in implementation, allowing us to compare our two implemented models more directly (and thus, by proxy, to the previously published results). As our second approach is nearly that of [1], by dropping the smooth function of time and replacing the mortality counts by their clusters instead (our first approach), we obtain different results which can be attributed to the change in time control. 
In this study, positive and statistically significant shortterm effects on daily mortality in Milan, Italy were observed in relation to exposures to ambient air pollution. The DLNM technique allows us to effectively quantify the effects of delayed exposure over time. The goal of this work was to highlight a new technique (casecrossover time control within a DLNM framework) and to use it in the domain of air pollution and related health effects. In addition, we provided a SAS procedure (PHREG) which implements the casecrossover technique (Appendix 2). As was recently observed, some realizations of this methodology may generate bias for different options in the procedure for ties [15]. In the presented code each case and its controls are considered as separate strata. As was investigated in the work [15] such approach generates proper results for all options. 
As the new casecrossover time control approach provides reduced results to those using a more traditional time smoother, an open question for future research is what effect causes the reduction. As the total cumulative effect results in [1] (and those provided by our second approach with a 33 df time smoother) are more positive than those of our first (CC) approach, our current hypothesis is that whatever portion of the time variation is controlled for by the CC approach (essentially, removed from consideration) for the Milan data is more strongly related than the residuals. The approximate 33 df used for the time smoother in [1] fails to control for seasons in any significant way [16], so we suspect that a seasonal association between TSP and mortality may be responsible for the difference. Note that this decrease in risk is not generalizable, as it will depend on the particular city of interest and its geography and climate–some cities may well have an increase in total cumulative effect when seasonality is controlled more completely. 
References 

Table 1  Table 2 
Figure 1  Figure 2 