Received date: August 07, 2012; Accepted date: August 08, 2012; Published date: August 13, 2012
Citation: Zheng Y, Charnigo R (2012) On Selecting Spatial-Temporal Autologistic Regression Models for Binary Lattice Data. J Biom Biostat 3:e112. doi:10.4172/2155-6180.1000e112
Copyright: © 2012 Zheng Y, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In many biological and physical sciences, rapid advances in technical capabilities have dramatically increased the amount of data collected across space and over time. Spatial-temporal models are important tools for analyzing spatial data collected repeatedly over time and have been applied to a wide range of problems, including modeling patterns in lung cancer, breast cancer, birth defects, and West Nile virus; see also Cressie, Rue and Held, and Schabenberger and Gotway. In particular, for binary data that are observed on a spatial lattice over time, spatial-temporal autologistic regression models relate binary responses to covariates while accounting for spatial and temporal dependence simultaneously [8,9].
Let y_it denote the response variable such that y_it = 0 or 1 at site i and time t, where i = 1,..., n and t = 1,..., m. Let y_t = (y_1t,..., y_nt)' denote the binary responses on the spatial lattice for a given time point t. We specify the joint distribution of y = (y'_{s+1},..., y'_m)' via conditional distributions,
for t = s + 1,..., m. Further, for a given time point t, we assume that the response variable follows an autologistic model
Here x_jit denotes the jth covariate at site i and time t. The model parameters are the regression coefficients β_j, the spatial autoregressive coefficients θ_k, the temporal autoregressive coefficients, and, for l = 1,..., s, the spatial-temporal interactive coefficients. For a given site i, we can partition the neighborhood into subsets N_k(i), k = 1,..., q. For example, in the bark beetle infestation example of Zhu et al., the study region is a regular square grid. Then we can define N_k(i), the kth-order neighbors of a given site i, to contain the kth-nearest neighbors in terms of distance, for k = 1,..., q. Taking q = 2 for example, we note that θ_1 ≠ 0, θ_2 = 0 corresponds to spatial autocorrelation along the north-south and west-east directions, while θ_1 = 0, θ_2 ≠ 0 corresponds to spatial autocorrelation along the northwest-southeast and northeast-southwest directions. Furthermore, to account for anisotropy, we could further partition N_k(i) by direction as in Zhu et al. In general, the magnitude of θ_k reflects not only the extent but also the direction of spatial autocorrelation.
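The neighbor sets on a regular square grid can be illustrated with a short sketch. It assumes that first-order neighbors are the four sites to the north, south, east, and west, that second-order neighbors are the four diagonal sites, and that sites outside the grid contribute nothing. The function name `neighbor_sums` and the zero-padding at the boundary are choices made here for illustration and do not come from the cited work.

```python
import numpy as np

# Offsets (row, column) for the two neighbor orders on a square grid.
# Order 1: north, south, west, east. Order 2: the four diagonals.
OFFSETS = {
    1: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    2: [(-1, -1), (-1, 1), (1, -1), (1, 1)],
}

def neighbor_sums(y, order):
    """Sum the binary responses over the order-th neighbors of every site.

    y is a 2-D array of 0/1 responses at one time point. Sites beyond the
    edge of the grid are treated as 0 (an assumption of this sketch).
    """
    rows, cols = y.shape
    padded = np.pad(y, 1)  # one ring of zeros around the grid
    total = np.zeros((rows, cols))
    for dr, dc in OFFSETS[order]:
        # Shifted view: entry (r, c) holds the response at (r + dr, c + dc).
        total += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return total

# A 3 x 3 grid with infested sites along the main diagonal.
grid = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])
first_order = neighbor_sums(grid, 1)
second_order = neighbor_sums(grid, 2)
```

In this layout the center site has no infested first-order neighbors but two infested second-order neighbors, which is the kind of diagonal pattern that a non-zero θ_2 would capture.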
Some special cases of the above spatial-temporal autologistic regression models (cf. Reyes) are as follows:
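• Spatial independence: all spatial autoregressive coefficients θ_k are zero and all spatial-temporal interactive coefficients are zero.
• Temporal independence: all temporal autoregressive coefficients are zero and all spatial-temporal interactive coefficients are zero.
• Spatial-temporal separable neighborhood structure: all spatial-temporal interactive coefficients are zero.
• Spatial-temporal non-separable neighborhood structure: some spatial-temporal interactive coefficients are non-zero.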
In what follows, for simplicity we focus on the spatial-temporal separable neighborhood structure.
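For concreteness, under the separable structure the conditional log-odds at site i and time t might take the following form. This is a sketch of a standard autologistic specification, not necessarily the exact display used in this editorial. The symbol ρ_l for the temporal autoregressive coefficients and the symbol p for the number of covariates are assumptions introduced here, and an intercept can be absorbed by a constant covariate.

```latex
\operatorname{logit} P\bigl(y_{it} = 1 \mid y_{i't},\ i' \neq i;\ y_{t-1}, \dots, y_{t-s}\bigr)
  = \sum_{j=1}^{p} \beta_j x_{jit}
  + \sum_{k=1}^{q} \theta_k \sum_{i' \in N_k(i)} y_{i't}
  + \sum_{l=1}^{s} \rho_l \, y_{i,t-l}
```

Setting every θ_k to zero in this form removes the dependence on neighboring sites, and setting every ρ_l to zero removes the dependence on past time points.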
Some interesting statistical problems for autologistic regression models include how to select covariates and determine an appropriate spatial and temporal neighborhood structure. For example, in studying the impact of climate change on bark beetle infestation of pine forests in North America, some of the most important scientific objectives are to identify and quantify the effects of environmental conditions (e.g. climate change) on bark beetle infestation. Also of great interest is describing the extent and direction of bark beetle dispersal. Judicious selection of covariates and spatial-temporal neighborhood structure permits fulfillment of the aforementioned scientific objectives.
For binary spatial-temporal lattice data, there is no consensus on how to perform model selection. Particularly regarding spatial-temporal neighborhood structure, this lack of consensus has resulted in researchers employing creative but ad hoc methods for which the statistical properties are not fully understood. For example, Zhu et al. selected covariates using backward elimination based on t-ratios of the parameter estimates under a pre-specified spatial and temporal neighborhood structure for their analysis of the southern pine beetle outbreak in North Carolina, United States. Zhu et al. pre-selected the spatial and temporal neighborhood structure without including covariates using the AIC and then, once the neighborhood structure was specified, chose covariates for their analysis of the mountain pine beetle outbreak in British Columbia, Canada. Using pre-selected covariates, Bandyopadhyay et al. employed a Bayesian paradigm to compare several different spatial dependence structures for dental caries data. As these examples suggest, covariates and neighborhood structure are usually not selected simultaneously, since examining all possible combinations of covariates and neighborhood structure may be prohibitively time-consuming.
In the remainder of this editorial, we discuss some possibilities for selection of covariates and spatial-temporal neighborhood structure, based on the premise of determining which regression and autoregressive coefficients are non-zero. One idea would be to consider a penalized log-likelihood function via adaptive LASSO,
where separate regularization parameters apply to the regression coefficients β, to the spatial autoregressive coefficients θ, and to the temporal autoregressive coefficients, and the criterion is built on the likelihood function. However, for the spatial-temporal autologistic regression model, there is no explicit representation of the likelihood function. One possibility would be to replace the likelihood function by the pseudolikelihood function. Another would be to use the Monte Carlo likelihood function (see, e.g. Geyer and Thompson, Huffer and Wu), which consistently estimates the likelihood function but is computationally intensive.
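The pseudolikelihood option can be sketched in a few lines. The sketch assumes that the conditional log-odds are linear in a design matrix whose columns hold the covariates, the spatial neighbor sums, and the lagged responses, so that the log pseudolikelihood has the same form as a logistic regression log-likelihood. The function name, the argument layout, the use of weights 1/|initial estimate|, and the absence of any sample-size scaling on the penalty are all assumptions made here for illustration.

```python
import numpy as np

def neg_penalized_log_pl(eta, design, y, lam, eta_init):
    """Negative adaptive-LASSO-penalized log pseudolikelihood (a sketch).

    eta      : (d,) coefficient vector (regression and autoregressive terms)
    design   : (N, d) matrix, one row per (site, time) pair; hypothetical
               layout: covariates, neighbor sums, lagged responses
    y        : (N,) binary responses
    lam      : (d,) regularization parameters, one per coefficient
    eta_init : (d,) non-zero initial estimates used for adaptive weights
    """
    linear = design @ eta
    # Sum of conditional Bernoulli log-probabilities: y*u - log(1 + exp(u)).
    log_pl = np.sum(y * linear - np.logaddexp(0.0, linear))
    # Adaptive weights penalize coefficients with small initial estimates more.
    weights = 1.0 / np.abs(eta_init)
    penalty = np.sum(lam * weights * np.abs(eta))
    return -(log_pl - penalty)

# Toy inputs: 4 observations, 2 coefficients.
design = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
eta_init = np.array([0.5, 2.0])
at_zero = neg_penalized_log_pl(np.zeros(2), design, y, np.ones(2), eta_init)
eta = np.array([1.0, -2.0])
unpenalized = neg_penalized_log_pl(eta, design, y, np.zeros(2), eta_init)
penalized = neg_penalized_log_pl(eta, design, y, np.ones(2), eta_init)
```

A general-purpose optimizer could minimize this function, although the absolute-value terms are not differentiable at zero, which is one motivation for specialized algorithms.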
To maximize Q(η), one possibility is to deploy a Newton-Raphson (NR) type algorithm based on a local quadratic approximation (LQA). The LQA algorithm has been used widely and shown to produce reliable results in practice, even for dependent data. However, this algorithm is slow, and a coefficient shrunk to 0 during the iteration of the algorithm remains at 0 throughout all subsequent iterations. Other methods may be considered for non-Gaussian distributions. For example, Madigan and Ridgeway considered LARS-type algorithms for logistic regression, while Genkin et al. proposed Bayesian logistic regression with a Laplace prior for large-scale text categorization. Park and Hastie developed a path algorithm for variable selection in a generalized linear model based on a predictor-corrector method.
We conclude this editorial by calling for further research on efficient variable and neighborhood structure selection for autologistic regression models, which will equip scientists with more advanced statistical tools for exploring and analyzing spatial-temporal lattice data.