On Selecting Spatial-Temporal Autologistic Regression Models for Binary Lattice Data

Introduction
In many biological and physical sciences, rapid advances in technical capabilities have dramatically increased the amount of data that are collected across space and over time. Spatial-temporal models are important tools for the analysis of spatial data collected repeatedly over time and have been applied to a wide range of problems, including modeling patterns in lung cancer [1], breast cancer [2], birth defects [3], and West Nile virus [4]; see also Cressie [5], Rue and Held [6], and Schabenberger and Gotway [7]. In particular, for binary data that are observed on a spatial lattice over time, spatial-temporal autologistic regression models relate binary responses to covariates while accounting for spatial and temporal dependence simultaneously [8,9].

Spatial-Temporal Autologistic Regression Model
Let $y_{it}$ denote the response variable such that [...]. Here $x_{jit}$ denotes the $j$th covariate at site $i$ and time $t$, [...]. $\theta_1 \neq 0, \theta_2 = 0$ corresponds to spatial autocorrelation along the north-south and west-east directions, while $\theta_1 = 0, \theta_2 \neq 0$ corresponds to spatial autocorrelation along the northwest-southeast and northeast-southwest directions. Furthermore, to account for anisotropy, we could further partition $N_k(i)$ by direction as in Zhu et al. [10]. In general, the magnitude of $\theta_k$ reflects not only the extent but also the direction of spatial autocorrelation. Some special cases of the above spatial-temporal autologistic regression models (cf. Reyes [11]) are as follows: [...]
In what follows, for simplicity we focus on the spatial-temporal separable neighborhood structure.
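To make the role of the directional coefficients concrete, the following Python sketch computes neighbor sums on a regular rectangular lattice and uses them in a conditional probability for each site. It is an illustration under stated assumptions, not the model as specified in the cited work: the function names, the zero-padded boundary, the single temporal lag, and the exact form of the linear predictor are all hypothetical choices. Here `theta1` multiplies the sum over north-south and west-east neighbors, `theta2` multiplies the sum over diagonal neighbors, and `theta_t` multiplies the response at the previous time point.

```python
import numpy as np

def neighbor_sums(y):
    """Return first- and second-order neighbor sums for a 2-D binary lattice.

    First order: north, south, west, and east neighbors.
    Second order: the four diagonal neighbors.
    Sites outside the lattice contribute zero (an assumed boundary treatment).
    """
    p = np.pad(y, 1)  # zero padding around the lattice
    first = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    second = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    return first, second

def conditional_prob(y_prev, y_curr, x, beta, theta1, theta2, theta_t):
    """P(y_it = 1 | neighbors) under an assumed separable autologistic form.

    x has shape (J, rows, cols); beta has length J.
    y_prev and y_curr are the lattices at times t-1 and t.
    """
    first, second = neighbor_sums(y_curr)
    logit = (np.tensordot(beta, x, axes=1)   # covariate effects
             + theta1 * first                # north-south / west-east dependence
             + theta2 * second               # diagonal dependence
             + theta_t * y_prev)             # temporal dependence (one lag, assumed)
    return 1.0 / (1.0 + np.exp(-logit))
```

Setting `theta2 = 0` leaves only dependence along the lattice axes, while setting `theta1 = 0` leaves only dependence along the diagonals, which mirrors the two cases described above.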

Model Selection
Some interesting statistical problems for autologistic regression models include how to select covariates and determine an appropriate spatial and temporal neighborhood structure. For example, in studying the impact of climate change on bark beetle infestation of pine forests in North America, some of the most important scientific objectives are to identify and quantify the effects of environmental conditions (e.g. climate change) on bark beetle infestation. Also of great interest is describing the extent and direction of bark beetle dispersal [12]. Judicious selection of covariates and spatial-temporal neighborhood structure permits fulfillment of the aforementioned scientific objectives.
For binary spatial-temporal lattice data, there is no consensus on how to perform model selection. Particularly for the spatial-temporal neighborhood structure, this lack of consensus has led researchers to employ creative but ad hoc methods whose statistical properties are not fully understood. For example, Zhu et al. [13] selected covariates using backward elimination based on t-ratios of the parameter estimates under a pre-specified spatial and temporal neighborhood structure for their analysis of the southern pine beetle outbreak in North Carolina, United States. Zhu et al. [8] pre-selected the spatial and temporal neighborhood structure without including covariates using the AIC and then, once the neighborhood structure was specified, chose covariates for their analysis of the mountain pine beetle outbreak in British Columbia, Canada. Using pre-selected covariates, Bandyopadhyay et al. [9] employed a Bayesian paradigm to compare several different spatial dependence structures for dental caries data. As these examples suggest, covariates and neighborhood structure are usually not selected simultaneously, since examining all possible combinations of covariates and neighborhood structure may be prohibitively time-consuming.
In the remainder of this editorial, we discuss some possibilities for selection of covariates and spatial-temporal neighborhood structure, based on the premise of determining which regression and autoregressive coefficients are non-zero. One idea would be to consider a penalized log-likelihood function via adaptive LASSO [14], [...] pertain to the temporal autoregressive coefficients. Here [...] is the likelihood function. However, for the spatial-temporal autologistic regression model, there is no explicit representation of the likelihood function. One possibility would be to replace the likelihood function by the pseudolikelihood function [15]. Another would be to use the Monte Carlo likelihood function (see, e.g., Geyer and Thompson [16] and Huffer and Wu [17]), which consistently estimates the likelihood function but is computationally intensive.
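As a rough illustration of the pseudolikelihood option, the Python sketch below evaluates a log pseudolikelihood with an adaptive LASSO penalty. It rests on several assumptions that are not taken from the text: the design matrix `Z` is assumed to hold both the covariates and the spatial and temporal neighbor sums as columns, the responses are coded 0/1 so that the pseudolikelihood takes a logistic-regression form, a single tuning parameter `lam` is shared by all coefficients, and the adaptive weights are built from a hypothetical initial estimate `eta_init` with exponent `gamma`.

```python
import numpy as np

def penalized_log_pseudolikelihood(eta, Z, y, lam, eta_init, gamma=1.0):
    """Log pseudolikelihood minus an adaptive LASSO penalty (assumed form).

    eta      : coefficient vector (regression and autoregressive coefficients)
    Z        : (n, p) matrix of covariates and neighbor sums, one row per site-time
    y        : (n,) vector of 0/1 responses
    lam      : non-negative tuning parameter
    eta_init : initial estimate with no zero entries, used for adaptive weights
    """
    lin = Z @ eta
    # Sum over sites of y * lin - log(1 + exp(lin)), computed stably.
    log_pl = np.sum(y * lin - np.logaddexp(0.0, lin))
    weights = 1.0 / np.abs(eta_init) ** gamma
    penalty = lam * np.sum(weights * np.abs(eta))
    return log_pl - penalty
```

Coefficients with a small initial estimate receive a large weight and are therefore penalized more heavily, which is what allows the penalty to set some regression or autoregressive coefficients exactly to zero.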
To maximize $Q(\eta)$, one possibility is to deploy a Newton-Raphson (NR) type algorithm based on a local quadratic approximation (LQA). The LQA algorithm has been used widely and shown to produce reliable results in practice, even for dependent data [18]. However, this algorithm is slow, and a coefficient shrunk to 0 during the iteration of the algorithm remains at 0 throughout all subsequent iterations. Other methods may be considered for non-Gaussian distributions. For example, Madigan and Ridgeway [19] considered LARS-type algorithms for logistic regression, while Genkin et al. [20] proposed Bayesian logistic regression with a Laplace prior for large-scale text categorization. Park and Hastie [21] developed a path algorithm for variable selection in a generalized linear model based on a predictor-corrector method. We conclude this editorial by calling for further research on efficient variable and neighborhood structure selection for autologistic regression models, which will equip scientists with more advanced statistical tools for exploring and analyzing spatial-temporal lattice data.
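The LQA iteration mentioned above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the algorithm of any cited reference: the objective is taken to be a logistic-form log pseudolikelihood minus an adaptive LASSO penalty, the function name, the thresholds `zero_tol` and `tol`, and the starting value `eta_init` (for example, an unpenalized estimate with no zero entries) are hypothetical. At each step the penalty term for coefficient j is replaced by a quadratic with curvature proportional to 1/|eta_j|, and a Newton-Raphson update is applied to the coefficients that are still active.

```python
import numpy as np

def lqa_adaptive_lasso(Z, y, lam, eta_init, zero_tol=1e-4, tol=1e-8, max_iter=100):
    """Newton-Raphson with a local quadratic approximation of the L1 penalty.

    Approximately maximizes
        log PL(eta) - lam * sum_j |eta_j| / |eta_init_j|
    where PL is a logistic-form pseudolikelihood (assumed objective).
    """
    eta = np.asarray(eta_init, dtype=float).copy()
    weights = 1.0 / np.abs(eta)              # adaptive weights (assumes nonzero start)
    active = np.ones(eta.size, dtype=bool)
    for _ in range(max_iter):
        # Coefficients below zero_tol are set to 0 and never re-enter the model.
        active &= np.abs(eta) > zero_tol
        eta[~active] = 0.0
        if not active.any():
            break
        Za, ea = Z[:, active], eta[active]
        p = 1.0 / (1.0 + np.exp(-(Za @ ea)))
        grad = Za.T @ (y - p)                             # score of log pseudolikelihood
        hess = -(Za.T * (p * (1.0 - p))) @ Za             # Hessian of log pseudolikelihood
        D = np.diag(lam * weights[active] / np.abs(ea))   # LQA curvature of the penalty
        step = np.linalg.solve(hess - D, grad - D @ ea)   # Newton step on the surrogate
        eta[active] = ea - step
        if np.max(np.abs(step)) < tol:
            break
    eta[np.abs(eta) <= zero_tol] = 0.0
    return eta
```

The `active` mask makes the limitation noted above explicit: once a coefficient is dropped, no later iteration can bring it back, so an early over-shrinkage cannot be corrected.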