
**Peter Congdon ^{*}**

Department of Geography, Queen Mary, University of London, Mile End Rd, London E1 4NS, UK

^{*}**Corresponding Author:** Peter Congdon, Department of Geography, Queen Mary, University of London, Mile End Rd, London E1 4NS, UK

**E-mail:**[email protected]

**Received Date:** January 24, 2012; **Accepted Date:** February 19, 2013; **Published Date:** February 23, 2013

**Citation:** Congdon P (2013) Assessing Univariate and Bivariate Spatial Clustering in Modelled Disease Risks. J Biomet Biostat 4:161. doi:10.4172/2155-6180.1000161

**Copyright:** © 2013 Congdon P. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Visit for more related articles at** Journal of Biometrics & Biostatistics

Models for spatial variation in relative disease risk often consider posterior probabilities of elevated disease risk in each area, but for health prioritisation, the interest may also be in the broader clustering pattern across neighbouring areas. The classification of a particular area as high risk may or may not be consistent with risk levels in the surrounding areas. Local join-count statistics are used here in conjunction with Bayesian models of area disease risk to detect different forms of disease clustering over groups of neighbouring areas. A particular interest is in spatial clustering of high risk, which can be assessed by high probabilities of elevated risk across both a focus area and its surrounding locality. An application considers univariate spatial clustering in suicide deaths in 922 small areas in the North West of England, extending to an analysis of bivariate spatial clustering in suicide deaths and hospital admissions for intentional self-harm in these areas.

Relative risk; Spatial clustering; Bayesian; Bivariate clustering; Join-count

Spatial analyses of disease incidence or mortality in small areas are often used to identify elevated risk. For example, posterior probabilities of elevated disease risk in each area may be obtained from Bayesian models of area disease counts [1-3]. However, elevated risk identified in a particular area may not extend to nearby areas, whereas spatial clustering of high risk across several adjacent or nearby areas may be of particular importance for health policy prioritisation. Identification of high risk localities may be more precise than identification of elevated risk in individual areas, especially for less frequent outcomes. Interest in spatial clustering of high risk may extend to the geographic pattern of interrelated disease outcomes (e.g., different forms of mental illness or different cardiovascular diseases).

For identifying such clustering in conjunction with a disease model, local indices of spatial association, and particular adaptations of them, become relevant. The present analysis considers use of local join-count statistics to detect high risk locality clustering for disease count data aggregated into small areas. These statistics are used in conjunction with an area disease model, and Bayesian inferences based on updating of prior assumptions using Markov chain Monte Carlo (MCMC) estimation.

The proposed join-count statistics methodology is initially applied to clustering of suicide deaths in 922 small areas in NW England. The relative mortality risks are modelled using a Bayesian spatial convolution prior [4] with alternative forms of spatial interaction (e.g., binary adjacency vs distance decay) between neighbouring areas considered. Results obtained using the join count method in conjunction with a spatial model are compared with the widely used (albeit non-Bayesian) spatial scan method, as implemented in the FlexScan and SaTScan packages. The application is then extended to analyze bivariate clustering in suicide deaths and self-harm hospitalisations.

In Bayesian small area disease applications, the classification of an area as high risk typically depends on unknown parameters. Consider disease counts y_{i} (i=1,…,n) with expected values e_{i} obtained by multiplying area populations by the region-wide disease rate, and with Σ_{i}e_{i}=Σ_{i}y_{i}. Then the y_{i} may be taken as Poisson,

$$y_i \sim \mathrm{Poi}(e_i r_i),$$

where the r_{i} are relative disease risks in area i with average 1. Such risks often show spatial correlation, and ignoring such correlation can lead to biased and inefficient inference, as the observations are not independent [5]. A widely applied model (known as the convolution model) involves two sets of random effects: (a) spatially structured effects s_{i} to represent spatially correlated risks and following a conditional autoregressive (CAR) scheme [4],

$$s_i \mid s_{[i]} \sim N\left(\frac{\sum_{j} w_{ij} s_j}{S_{0i}},\ \frac{\sigma^2}{S_{0i}}\right),$$

where w_{ij} are symmetric spatial interactions (with w_{ii}=0), s_{[i]} represents the collection of s effects excluding s_{i}, and S_{0i}=Σ_{j}w_{ij}; and (b) iid random effects u_{i}, typically normal with u_{i} ~ N(0,τ^{2}), to represent possible overdispersion, or excess variability in relation to the Poisson assumption. Then if there are covariates X_{i} relevant to explaining variations in area disease risk, one has:

$$\log(r_i) = X_i\beta + s_i + u_i. \qquad (1)$$

If there are no covariates to model spatial patterning of risks, the spatial random effects represent spatial pattern in the disease outcome, whereas otherwise they capture spatial structure in the residuals. To assess actual clustering, one may obtain measures such as Moran’s I for the s_{i}; this entails deriving the index at each MCMC iteration, with posterior inferences (e.g. credible intervals) based on the values accumulated over iterations.
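The per-iteration monitoring of Moran's I described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `morans_i` and `summarise`, the four-area weight matrix and the three "sampled" vectors of spatial effects are all hypothetical, and the interval is a simple empirical one rather than anything produced by a particular MCMC package.

```python
def morans_i(values, w):
    """Moran's I for one vector of area effects under weights w (w[i][i] = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def summarise(draws, lower=0.025, upper=0.975):
    """Posterior mean and a simple empirical interval from a list of draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return (sum(ordered) / len(ordered),
            ordered[round(lower * last)],
            ordered[round(upper * last)])


# Hypothetical example: four areas on a line (0-1-2-3) with binary adjacency.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

# Stand-ins for sampled spatial effects s_i at three MCMC iterations.
s_samples = [[-1.0, -0.5, 0.5, 1.0],
             [-0.8, -0.6, 0.4, 1.0],
             [-1.2, -0.2, 0.3, 1.1]]

moran_draws = [morans_i(s, w) for s in s_samples]
post_mean, lo, hi = summarise(moran_draws)
print(moran_draws, post_mean, lo, hi)
```

In a real run the list of draws would hold one value per retained iteration, so the interval bounds would be genuine posterior quantiles rather than the minimum and maximum of three values.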

Consider binary measures b_{i} of disease risk for areas i=1,…,n, with b_{i}=1 for elevated risk, b_{i}=0 otherwise. For example, one may define b_{i}=1 for r_{i}>τ_{r}, where τ_{r} is a relative risk threshold (e.g., τ_{r}=1 or τ_{r}=1.25), and b_{i}=0 otherwise. If relative risks have average 1, and τ_{r}=1, then the region-wide proportion of areas with elevated risk, E(b_{i})=π, will be approximately 0.5.

In spatial disease applications with correlated relative risks r_{i}, binary indicators such as b_{i}=I(r_{i}>1) will also tend to be spatially correlated. Region-wide spatial clustering in the b_{i} can be measured by join-count statistics, based on concordance in risk status between area pairs. Thus, a join-count measuring clustering in high risk across a region is

$$J_{11} = \sum_{i}\sum_{j \neq i} w_{ij}\, b_i b_j,$$

with 0.5J_{11} known as the BB statistic [6]. Differing health status in neighbouring area-pairs is measured by a weighted total of joins with b_{i} and b_{j} discordant, which can be denoted

$$J_{10} = \sum_{i}\sum_{j \neq i} w_{ij}\left[b_i(1-b_j) + (1-b_i)b_j\right].$$

Observed join-count totals can be compared with totals expected under a null hypothesis of spatial independence [7]. The expected total of concordant joins under the hypothesis of no spatial dependence is E(J_{11})=S_{0}π^{2}, where S_{0}=Σ_{i}Σ_{j}w_{ij}, and J_{11} will exceed E(J_{11}) when there is spatial patterning in the disease outcome [6]. Similarly the observed J_{10} can be compared with E(J_{10})=2S_{0}π(1-π), and will be less than this expected total when there is disease clustering. It may be noted that in a modelling application with r_{i} unknown, the indicators b_{i} (and related parameters such as π) are also unknown, and sampled at each MCMC iteration.
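As a rough illustration of the comparison between observed and expected join-counts, the sketch below computes both for a toy region. It assumes that J_{11} is the double sum of w_{ij}b_{i}b_{j} over ordered pairs and that J_{10} counts discordant pairs in both orderings, a form chosen here because it agrees with E(J_{10})=2S_{0}π(1-π); the function name `global_join_counts` and the four-area example are hypothetical.

```python
def global_join_counts(w, b):
    """Observed J11 and J10 with their expectations under spatial independence."""
    n = len(b)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(w[i][j] for i, j in pairs)                    # total weight S0
    j11 = sum(w[i][j] * b[i] * b[j] for i, j in pairs)     # joint high-risk joins
    j10 = sum(w[i][j] for i, j in pairs if b[i] != b[j])   # discordant joins
    pi = sum(b) / n                                        # proportion high risk
    return {"J11": j11, "J10": j10, "S0": s0, "pi": pi,
            "E_J11": s0 * pi ** 2,
            "E_J10": 2 * s0 * pi * (1 - pi)}


# Hypothetical example: four areas on a line, high risk in the first two.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
b = [1, 1, 0, 0]

stats = global_join_counts(w, b)
print(stats)
```

Here the observed J_{11} exceeds its expectation and the observed J_{10} falls below its expectation, which is the pattern the text associates with clustering. In a Bayesian analysis the same calculation would be repeated for the sampled indicators at every MCMC iteration.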

A localised set of join-count statistics (with area i as the focus) can be used to decide whether area i and nearby areas form a high risk cluster, or demonstrate an alternative risk pattern in the locality. For measuring joint high risk, with both area i and its neighbouring areas being high risk, one has

$$J_{11i} = \sum_{j} w_{ij}\, I(b_i = 1,\, b_j = 1),$$

where I(A)=1 if condition A holds, and I(A)=0 otherwise. When the focus is on area i, it is relevant to distinguish discordant high-low risk pairings (b_{i}=1, b_{j}=0) from low-high risk pairings (b_{i}=0, b_{j}=1). The relevant local join-count statistics in these cases are then

$$J_{10i} = \sum_{j} w_{ij}\, I(b_i = 1,\, b_j = 0)$$

and

$$J_{01i} = \sum_{j} w_{ij}\, I(b_i = 0,\, b_j = 1).$$

The count J_{10i} captures situations where area i is high risk, but nearby areas are mostly low risk, so that area i may be termed a high risk local outlier. The count J_{01i} would be elevated when area i itself does not have high risk, but neighbouring areas are mostly high risk. Finally,

$$J_{00i} = \sum_{j} w_{ij}\, I(b_i = 0,\, b_j = 0)$$

represents localities where both the focus and surrounding areas are low risk. The expected number of common high risk joins with area i as the focus (i.e. area i is a high risk cluster member) is E(J_{11i})=S_{0i}π^{2}, while E(J_{10i})=S_{0i}π(1-π) and E(J_{00i})=S_{0i}(1-π)^{2}.
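The four local join-counts for a focus area can be sketched as below. This is an illustrative reading in which each neighbour's weight is added to exactly one of the four counts according to the risk status of the focus and the neighbour; the function name `local_join_counts` and the toy data are hypothetical.

```python
def local_join_counts(w, b, i):
    """Local join-counts J11i, J10i, J01i and J00i for focus area i."""
    counts = {"J11": 0.0, "J10": 0.0, "J01": 0.0, "J00": 0.0}
    for j, w_ij in enumerate(w[i]):
        if j == i or w_ij == 0:
            continue
        # The key encodes the (focus, neighbour) risk pair, e.g. "J10".
        counts[f"J{b[i]}{b[j]}"] += w_ij
    return counts


# Hypothetical example: four areas on a line, high risk in the first two.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
b = [1, 1, 0, 0]

focus_high = local_join_counts(w, b, 1)   # high-risk focus, mixed neighbours
focus_low = local_join_counts(w, b, 2)    # low-risk focus, mixed neighbours
print(focus_high, focus_low)
```

The four counts for any focus area sum to its total weight S_{0i}, so for a given iteration only the counts matching the focus area's own status can be non-zero.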

Consider a sequence t=1,…,T of MCMC samples. From the indicators b_{i}^{(t)} of elevated risk at each MCMC iteration, one may estimate probabilities of elevated risk in area i specifically (without regard to the broader locality), namely

$$H_i = \frac{1}{T}\sum_{t=1}^{T} b_i^{(t)}.$$

One may also monitor join-counts indicating locality-wide elevated risk with posterior estimates

$$\hat{J}_{11i} = \frac{1}{T}\sum_{t=1}^{T} J_{11i}^{(t)}.$$

The estimated proportion of joins in the locality centred on area i that are joint high risk, namely

$$\pi_{11i} = \hat{J}_{11i} / S_{0i},$$

provides a summary index of high risk across that locality. By contrast, the proportion of joins centred on area i that are (1,0) pairs

$$\pi_{10i} = \hat{J}_{10i} / S_{0i},$$

provides an index that area i is a high risk outlier relative to the broader locality.
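The posterior summaries described above can be accumulated over MCMC output along the following lines. This is a sketch under stated assumptions: the sampled indicators are taken to be available as a list of 0/1 vectors, every area is assumed to have at least one neighbour, and the name `posterior_join_summaries` and the toy samples are hypothetical.

```python
def posterior_join_summaries(w, b_samples):
    """Estimate H_i, pi_11i and pi_10i from sampled binary risk indicators."""
    n, T = len(w), len(b_samples)
    s0i = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    H = [0.0] * n
    pi11 = [0.0] * n
    pi10 = [0.0] * n
    for b in b_samples:
        for i in range(n):
            if b[i] == 0:
                continue  # (1,1) and (1,0) joins need a high-risk focus
            H[i] += 1 / T
            for j in range(n):
                if j == i:
                    continue
                if b[j] == 1:
                    pi11[i] += w[i][j] / (T * s0i[i])
                else:
                    pi10[i] += w[i][j] / (T * s0i[i])
    return H, pi11, pi10


# Hypothetical example: three areas on a line and four sampled indicator vectors.
w = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
b_samples = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1],
             [1, 0, 0]]

H, pi11, pi10 = posterior_join_summaries(w, b_samples)
print(H, pi11, pi10)
```

In this toy example the two proportions for each area add up to that area's marginal probability of elevated risk, because every join from a high-risk focus is either a (1,1) or a (1,0) pair.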

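The join-counts J_{11i} and J_{10i} can be written as b_{i}Σ_{j}w_{ij}b_{j} and b_{i}Σ_{j}w_{ij}(1-b_{j}) respectively, from which it follows that

$$J_{11i} + J_{10i} = b_i S_{0i},$$

and hence that

$$\pi_{11i} + \pi_{10i} = H_i.$$

Hence, π_{11i} will be elevated when H_{i} is elevated and risk in the surrounding locality is elevated also. By contrast, π_{10i} will be elevated when H_{i} is elevated, but risk in the surrounding locality is relatively low. Similarly, defining π_{01i} and π_{00i} as the corresponding estimated proportions of (0,1) and (0,0) joins, one has

$$\pi_{01i} + \pi_{00i} = 1 - H_i.$$

Areas can be ranked in terms of π_{11i} to indicate which are likely high risk cluster centres. Alternative tests regarding high risk clustering in the locality around area i might be envisaged. One involves expectation weighted averages

$$R_{1i} = \frac{\sum_{j \in L_i} e_j r_j}{\sum_{j \in L_i} e_j}$$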

of modelled relative risks across localities L_{i} that include both the focus area i and areas adjacent to it. These weighted averages can be monitored during the MCMC updating and the probabilities that R_{1i} exceeds 1 obtained. However, this test may be affected by unusually high relative risks in one or two areas within the locality, or by situations where a low risk area is surrounded by high risk areas.
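The expectation weighted locality average, and the caveat about a low risk focus surrounded by high risk areas, can be illustrated with a short sketch. The function names, expected counts and sampled relative risks below are hypothetical and chosen only to show the mechanics.

```python
def locality_risk(e, r, members):
    """Expectation-weighted average relative risk over the areas in `members`."""
    return sum(e[j] * r[j] for j in members) / sum(e[j] for j in members)


def prob_locality_elevated(e, r_samples, members, threshold=1.0):
    """Share of MCMC samples in which the locality average exceeds the threshold."""
    hits = sum(locality_risk(e, r, members) > threshold for r in r_samples)
    return hits / len(r_samples)


# Hypothetical locality: focus area 0 (low risk) with two higher-risk neighbours.
e = [4.0, 10.0, 10.0]
r_samples = [[0.5, 1.5, 1.5],
             [0.6, 1.2, 1.6],
             [0.4, 0.9, 1.0]]
members = [0, 1, 2]

R_first = locality_risk(e, r_samples[0], members)
prob = prob_locality_elevated(e, r_samples, members)
print(R_first, prob)
```

The locality average exceeds 1 in two of the three samples even though the focus area's own relative risk is below 1 in all of them, which is the kind of pattern this test cannot separate from genuinely shared high risk.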

Another option is to compare the sampled J_{11i} at each MCMC iteration to the expected count S_{0i}π^{2} under a no clustering hypothesis, and obtain estimates of the probabilities

$$h_{11i} = \Pr\left(J_{11i} > S_{0i}\pi^{2} \mid y\right).$$

If b_{i}=1, then J_{11i}=Σ_{j}w_{ij}b_{j} and the comparison reduces to Σ_{j}w_{ij}b_{j}/S_{0i}>π^{2}, a condition very likely to be met in high risk localities (where risk is elevated in both the focus area and surrounding areas, so that π_{11i} is high). Hence, the comparison will tend to have a similar probability of holding as that for I(b_{i}=1) in such localities.
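A minimal sketch of this comparison is given below. It assumes that π is re-estimated at each iteration as the proportion of areas flagged high risk in that sample; the function name `prob_exceeds_null` and the five-area example are hypothetical.

```python
def prob_exceeds_null(w, b_samples, i):
    """Share of MCMC samples in which J11i exceeds its no-clustering expectation."""
    n = len(w)
    s0i = sum(w[i][j] for j in range(n) if j != i)
    hits = 0
    for b in b_samples:
        pi_t = sum(b) / n                      # sampled proportion high risk
        j11i = b[i] * sum(w[i][j] * b[j] for j in range(n) if j != i)
        if j11i > s0i * pi_t ** 2:
            hits += 1
    return hits / len(b_samples)


# Hypothetical example: five areas on a line, focus on area 1.
w = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
b_samples = [[1, 1, 1, 0, 0],
             [1, 1, 0, 0, 0],
             [0, 1, 0, 0, 1],
             [0, 0, 1, 1, 0]]

h_focus = prob_exceeds_null(w, b_samples, 1)
print(h_focus)
```

The comparison can only hold in samples where the focus area itself is flagged high risk, so the resulting probability cannot exceed the marginal probability of elevated risk for that area.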

An extension of the proposed join-count statistics is in the detection of elevated bivariate risk across localities, based on local join-counts for the joint high risk binary event. Methods for bivariate spatial association have been proposed [8], and bivariate LISA methods indicate association between the value for one variable at a given location and the average of another variable at neighbouring locations. However, there is no widely applied cluster detection method (e.g. spatial scan technique) for bivariate outcomes. Let A and B denote two health outcomes, and consider join-counts corresponding to the joint high risk classification:

$$b_{ABi} = I(r_{Ai} > 1,\ r_{Bi} > 1).$$

The event risks {r_{Ai} ,r_{Bi}} can be obtained via the models

(2)

with one option for priors on the random effects being

(3)

To assess high risk clustering in both events jointly, the bivariate local join-counts

$$J_{11ABi} = \sum_{j} w_{ij}\, I(b_{ABi} = 1,\ b_{ABj} = 1),$$

where b_{ABi}=1 when both outcomes have elevated risk in area i, can be monitored. The estimated probability of elevated bivariate risk in area i specifically is

$$H_{ABi} = \frac{1}{T}\sum_{t=1}^{T} b_{ABi}^{(t)},$$

but this elevated bivariate risk may not apply across the broader locality. However, the estimated proportion of bivariate joins in the locality centred on area i that are joint high risk, namely

$$\pi_{11ABi} = \hat{J}_{11ABi} / S_{0i},$$

provides a summary index of high bivariate risk across that locality. For detecting isolated elevated bivariate risk (high risk in the focus area but not extending to the broader locality), the relevant join-count is

$$J_{10ABi} = \sum_{j} w_{ij}\, I(b_{ABi} = 1,\ b_{ABj} = 0).$$

Just as implications about smoothed relative risks may depend on the form of spatial interaction assumed [5], so may the inferences about clustering patterns. Implications about risk patterns for interdependent events, especially when one event is less frequent than another, may also be influenced by the form of random effects assumption (and the extent to which there is borrowing of strength). For example, clustering inferences in the less common outcome may be affected if a bivariate spatial prior (allowing correlation in spatial risks between outcomes within areas) is adopted instead of separate univariate spatial priors as in equation (3).
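The bivariate local join-counts discussed in this section can be sketched in the same way as the univariate ones. The example below assumes that an area is classed as bivariate high risk when both sampled relative risks exceed a common threshold of 1; the function name `bivariate_local_summary` and the sampled risks are hypothetical.

```python
def bivariate_local_summary(w, r_a_samples, r_b_samples, i, threshold=1.0):
    """Estimate H_ABi, pi_11ABi and pi_10ABi for focus area i."""
    n, T = len(w), len(r_a_samples)
    s0i = sum(w[i][j] for j in range(n) if j != i)
    h_ab = pi11 = pi10 = 0.0
    for r_a, r_b in zip(r_a_samples, r_b_samples):
        # Joint indicator: both outcomes above the threshold in the same area.
        b_ab = [int(ra > threshold and rb > threshold) for ra, rb in zip(r_a, r_b)]
        if b_ab[i] == 0:
            continue
        h_ab += 1 / T
        for j in range(n):
            if j == i:
                continue
            if b_ab[j] == 1:
                pi11 += w[i][j] / (T * s0i)
            else:
                pi10 += w[i][j] / (T * s0i)
    return h_ab, pi11, pi10


# Hypothetical example: three areas on a line, two MCMC samples per outcome.
w = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
r_a_samples = [[1.2, 1.5, 0.8], [1.1, 1.3, 1.2]]
r_b_samples = [[1.4, 1.1, 1.3], [0.9, 1.6, 1.4]]

h_ab, pi11_ab, pi10_ab = bivariate_local_summary(w, r_a_samples, r_b_samples, 1)
print(h_ab, pi11_ab, pi10_ab)
```

The sampled relative risks could come from either separate or joint priors for the two outcomes; the join-count step itself does not depend on that choice.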

While relatively rare, suicide is a major reason for premature mortality. To assess risk patterns in individual areas as compared to their broader localities, we consider suicide deaths y_{i} over the period 2006 to 2010 in 922 small areas (Middle Level Super Output Areas or MSOAs) across the North West of England (**Table 1**). These areas are designed to be of similar size in population terms, with an average population of 7500. Expected deaths e_{i} are based on applying an England wide schedule of age specific suicide rates to MSOA populations, with scaling applied to ensure Σ_{i}e_{i}=Σ_{i}y_{i}.

Locality | Index of focus area | π_{11i} posterior estimate | H_{i} posterior estimate | Relative risk r_{i} in focus area (posterior mean) | h_{11i} (posterior mean) | Total areas in locality (including focus) | Indices of areas in locality (other than focus) | Modelled relative risk R_{1i} across locality (expectation weighted average) | Pr(R_{1i} >1) (elevated locality risk) | SMR across locality |
---|---|---|---|---|---|---|---|---|---|---|
1 | 594 | 0.854 | 0.979 | 1.717 | 0.979 | 7 | 590, 592, 593, 595, 597, 599 | 1.547 | 0.999 | 2.151 |
2 | 16 | 0.834 | 0.989 | 1.845 | 0.989 | 9 | 5, 10, 11, 15, 17, 21, 22, 25 | 1.433 | 1.000 | 1.872 |
3 | 11 | 0.820 | 0.971 | 1.729 | 0.971 | 5 | 5, 18, 15, 16 | 1.478 | 0.997 | 2.011 |
4 | 249 | 0.761 | 0.969 | 1.692 | 0.966 | 5 | 247, 248, 251, 252 | 1.385 | 0.991 | 1.782 |
5 | 856 | 0.753 | 0.977 | 1.742 | 0.975 | 7 | 849, 851, 853, 857, 858, 859 | 1.375 | 0.995 | 1.957 |
6 | 595 | 0.743 | 0.889 | 1.460 | 0.889 | 5 | 593, 594, 596, 599 | 1.427 | 0.992 | 1.844 |
7 | 251 | 0.741 | 0.885 | 1.420 | 0.885 | 6 | 247, 248, 250, 252, 258 | 1.534 | 0.999 | 2.181 |
8 | 597 | 0.735 | 0.966 | 1.755 | 0.963 | 4 | 594, 599, 601 | 1.452 | 0.989 | 1.901 |
9 | 590 | 0.726 | 0.986 | 1.882 | 0.985 | 5 | 587, 589, 592, 594 | 1.465 | 0.993 | 1.741 |
10 | 710 | 0.723 | 0.966 | 1.768 | 0.943 | 4 | 709, 711, 712 | 1.405 | 0.968 | 1.580 |
11 | 10 | 0.714 | 0.895 | 1.452 | 0.894 | 7 | 2, 5, 6, 13, 16, 17 | 1.353 | 0.998 | 1.757 |
12 | 258 | 0.713 | 0.999 | 2.207 | 0.994 | 8 | 250, 251, 252, 255, 259, 262, 264 | 1.369 | 0.995 | 1.660 |
13 | 15 | 0.713 | 0.870 | 1.397 | 0.870 | 6 | 8, 11, 16, 18, 21 | 1.419 | 0.998 | 1.795 |

**Table 1:** Suicide mortality, areas with highest estimates (π_{11i}) for elevated locality risk.

The average mortality count is 3.5, but event totals y_{i} in individual areas vary widely, as do moment estimates of relative risk y_{i}/e_{i} (sometimes called standardised mortality ratios or SMRs). Such moment estimates are unreliable, with variance instability when there are small numbers of suicide deaths, as in many MSOAs [9,10].

To provide stabilised estimates of relative risk including spatial borrowing of strength, a convolution model is applied with y_{i} ∼ Poi(e_{i}r_{i}), where

$$\log(r_i) = \beta_0 + s_i + u_i, \qquad (4)$$

with the s_{i} and u_{i} following the CAR and iid normal priors described above. A flat prior on β_{0} is assumed, and a gamma prior with index 1 and shape 0.001 on the inverse spatial variance 1/σ^{2} [11,12]. Convergence is improved by linking the variance parameters; thus τ^{2}=σ^{2}/ρ where ρ is assigned an exponential prior with rate 1. Inferences in this and subsequent models are based on the second halves of two chain runs of 10,000 iterations, with convergence assessed according to BGR statistics [13].

Localities are defined as areas adjacent to area i, though the weighting attached to different areas within such localities can be varied. To assess possible sensitivity regarding inferences about locality risk, alternative assumptions about w_{ij} are investigated: equal weighting of all adjacent areas as compared to alternative forms of inverse distance decay. The binary indicators and local join-counts are monitored to provide posterior estimated probabilities of high risk common to the focus and its locality, and estimated marginal probabilities of elevated risk, namely π_{11i} and H_{i} respectively.

Consider first a binary adjacency assumption for the w_{ij} (w_{ij}=1 if areas are adjacent, w_{ij}=0 otherwise), under which the Moran spatial correlation index for the s_{i} is obtained as 0.56 with 95% interval (0.47, 0.66). **Figure 1** maps out the posterior mean relative risks r_{i} across the region, though this map tends to be dominated by low density rural areas (such as in the Lake District in the northern part of the map). Subsequently, higher resolution maps are used to depict risk and clustering patterns, since in the case study, high risk clustering tends to be in densely populated urban areas. Maps of the administrative geography of the region (including maps of MSOAs) are available at the UK Map Collection page http://www.ons.gov.uk/ons/guide-method/geography/beginner-s-guide/maps/index.html.

The estimated π_{11i} have an average of 0.221, with a 0.975 percentile of 0.638, and with the maximum being 0.854. The estimated marginal probabilities H_{i} have an average of 0.426, with a 0.975 percentile of 0.923, and a maximum of 0.999. The π_{10i}, which provide indicators of isolated high risk not extending to the broader locality, have an average of 0.205, with a maximum of 0.583. The estimated h_{11i} are also shown; as discussed above, these are similar to H_{i} in localities characterised by high risk clustering, but their ordering of potential cluster centres is similar to that of the Of the 13 areas with highest values, 10 are also among the 13 areas with highest values.

There are 5 areas with π_{11i} over 0.75, and 13 areas with π_{11i} over 0.70. **Table 1** summarises locality risk patterns for the 13 areas with π_{11i} over 0.70, ranked by the size of π_{11i}, and also including estimates of H_{i} and r_{i} (posterior means). The relatively low values for both and reflect the rarity of the suicide outcome; more frequent outcomes (such as self-harm hospitalisations considered in the bivariate analysis) are more likely to have high and (e.g. close to 1). **Table 1** also shows posterior means of expectation weighted averages R_{1i} of modelled relative risks across localities L_{i}, encompassing both the focus area i and areas adjacent to it. Also shown are estimated probabilities that R_{1i} exceed 1, namely that the entire locality has elevated risk, and unsmoothed suicide SMRs across localities, Σ_{j∈L_{i}}y_{j}/Σ_{j∈L_{i}}e_{j}.

The π_{11i} identify focus areas with high probabilities of elevated risk and of belonging to a high risk locality, rather than clusters per se. So some areas are present in more than one locality in **Table 1**; for example, areas 594 and 599 appear twice. There are 47 distinct areas in the localities in **Table 1**, and their posterior mean r_{i} range from 0.99 to 2.21 with average 1.36.

Probabilities that the average locality risk R_{1i} exceeds 1 are all over 0.968. The average locality risks R_{1i} may be used to confirm what the join-count statistics indicate, in particular the π_{11i} statistics, but in themselves are not conclusive about elevated risk common to both a focus area and areas around it. Weighted averages such as R_{1i} may be affected by unusually high relative risks in a subset of areas within the locality, whereas π_{11i} is specifically focussing on elevated risk status across all areas in a locality. An example is provided by area 28 which has 1 suicide death against 4.4 expected, with an estimated exceedance probability H_{28}=0.36. However, the areas adjacent to area 28 have 34 deaths in relation to 20 expected, with a probability of 0.98 that the locality wide R_{1,28} exceeds 1 (where the locality encompasses area 28). Note that this type of pattern would be detected by the join-counts J_{01i} and corresponding probabilities π_{01i}=J_{01i}/S_{0i}.

Delineation of high risk localities using local join-counts in conjunction with a relative risk model such as equation (1) contrasts with the spatial scan procedure, which is applied to observed area disease counts without any modelling preliminaries, for example, smoothing or borrowing strength procedures to reduce unreliability in fixed effects relative risk estimates. Despite this fundamental difference, the localities of **Table 1** can be compared with clusters identified by the SaTScan and FleXScan packages developed by Kulldorff [14] and Tango and Takahashi [15], respectively. It also relates to the work done by Holowaty et al. [1] and Wieckowska et al. [16]. SaTScan identifies five high rate clusters with Monte-Carlo p-values under 0.2. The average π_{11i} is 0.666 for the 29 areas in these five clusters, and the overlap with the local join-count method is apparent in that only 2.2% of the 922 areas have π_{11i} over 0.666. Similarly, the 59 areas identified by FlexScan (in 7 clusters with p-values under 0.2) have an average π_{11i} of 0.618.

The most likely cluster identified by SaTScan [15] contains areas {248, 249, 251, 252, 253, 258}, while FlexScan identifies the area set {249, 251, 252, 253, 258, 262, 263, 265, 271} as its leading secondary cluster (with lowest p-value after the most likely cluster). Areas {248, 249, 250, 251, 252, 255, 258, 259, 262, 264} are included in the localities identified using join-count statistics in **Table 1**, and in fact consist of neighbouring areas in Tameside, a local authority district in the south east of the region, with the district of Oldham to the North and with Stockport to the South. **Figure 2** (of MSOAs in the three local government districts of Tameside, Oldham and Stockport) shows a cluster of MSOAs in the centre of the mapped sub-region, mostly in Tameside, all having posterior mean relative risks above 1.10.

The most likely cluster identified by FlexScan consists of the areas {587, 588, 590, 591, 593, 594, 597, 599}, and the similar area set {590, 592, 593, 594, 595, 597, 599} is also the leading secondary cluster identified by SaTScan. Areas {587, 589, 590, 592, 593, 594, 595, 596, 597, 599, 601} are included in the areas in **Table 1**, and consist of a set of areas in the coastal town of Blackpool. **Figure 3** (of MSOAs in the three local government districts of Blackpool, Wyre and Fylde) shows this cluster of adjacent MSOAs at the westernmost centre of the plot, all having posterior mean relative risks above 1.15 except for area 589, which has a modelled relative risk of 0.994 but is encompassed within surrounding higher risk areas.

The local join-count procedure also provides estimates of π_{10i}, which will be elevated when H_{i} is elevated but risk in the surrounding locality is relatively low. Such areas may be considered as local high risk outliers, discordant in terms of health status from their neighbours. To demonstrate the contrasting risk patterns between the focus area and surrounding areas, we define A_{i}, encompassing areas adjacent to the focus area i but not including that area.

Thus, **Table 2** shows the 12 MSOAs with π_{10i} over 0.5, the modelled relative risk r_{i} (posterior mean) in the focus area, and posterior mean relative risk in the surrounding area, namely

$$R_{2i} = \frac{\sum_{j \in A_i} e_j r_j}{\sum_{j \in A_i} e_j}.$$

Index of focus area | H_{i} posterior estimate | π_{10i} posterior estimate | Modelled relative risk r_{i} in focus area (posterior mean) | Number of areas in surrounding locality (excluding focus) | Modelled relative risk R_{2i} across rest of locality (excluding focus) | Pr(R_{2i}>1) (elevated risk in adjacent areas) | SMR across rest of locality |
---|---|---|---|---|---|---|---|
532 | 0.853 | 0.583 | 1.379 | 5 | 0.920 | 0.248 | 0.768 |
620 | 0.800 | 0.582 | 1.358 | 2 | 0.890 | 0.239 | 0.490 |
439 | 0.787 | 0.580 | 1.288 | 6 | 0.868 | 0.113 | 0.702 |
772 | 0.974 | 0.550 | 1.805 | 4 | 1.009 | 0.492 | 0.739 |
554 | 0.739 | 0.545 | 1.233 | 6 | 0.847 | 0.091 | 0.693 |
406 | 0.804 | 0.521 | 1.299 | 6 | 0.917 | 0.234 | 0.856 |
862 | 0.896 | 0.514 | 1.469 | 4 | 0.977 | 0.411 | 0.695 |
763 | 0.743 | 0.509 | 1.252 | 3 | 0.887 | 0.216 | 0.869 |
158 | 0.855 | 0.506 | 1.329 | 7 | 0.956 | 0.329 | 0.877 |
740 | 0.960 | 0.503 | 1.145 | 7 | 1.022 | 0.551 | 0.754 |
744 | 0.765 | 0.502 | 1.244 | 4 | 0.924 | 0.271 | 0.782 |
198 | 0.749 | 0.502 | 1.251 | 5 | 0.900 | 0.209 | 0.532 |

**Table 2:** Suicide mortality, areas with highest estimates (*π _{10i}*) for outlier high risk.

Also shown are unsmoothed suicide SMRs across adjacent areas, Σ_{j∈A_{i}}y_{j}/Σ_{j∈A_{i}}e_{j}. For all but one area, the probabilities that R_{2i} (average risk in the locality excluding the focus area) exceed 1 are under 0.5, whereas the probabilities H_{i} of elevated risk in the focus area itself all exceed 0.7.

Inferences from convolution or other area disease count models may be affected by the form of spatial interaction assumed. Two alternatives to binary adjacency are considered, which involve down weighting areas at greater distance from the focus area (with inter-area distances based on population centroids). These assume distance decay according to w_{ij}=d_{ij}^{-γ} (γ>0), where d_{ij} is the distance between areas i and j, with values of γ=0.5 and γ=1 considered. These values are based on a preliminary analysis using model (4) to find an optimal value for γ using a discrete prior over values {0, 0.1, 0.2, …, 1.5}, which produced a posterior mean for γ of 0.69.
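The alternative weighting schemes can be sketched as follows. The functional form is an assumption made for illustration: weights for adjacent areas are taken as inverse powers of the distance between centroids, so that γ=0 recovers binary adjacency, and non-adjacent pairs keep weight zero. The function name `distance_decay_weights`, the centroid coordinates and the adjacency list are hypothetical.

```python
import math


def distance_decay_weights(centroids, adjacency, gamma):
    """Symmetric weights d_ij ** (-gamma) for adjacent pairs, zero elsewhere."""
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for i, j in adjacency:
        d = math.dist(centroids[i], centroids[j])  # Euclidean centroid distance
        w[i][j] = w[j][i] = d ** (-gamma)
    return w


# Hypothetical example: three areas, with area 1 adjacent to areas 0 and 2.
centroids = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
adjacency = [(0, 1), (1, 2)]

w_binary = distance_decay_weights(centroids, adjacency, 0.0)
w_half = distance_decay_weights(centroids, adjacency, 0.5)
w_one = distance_decay_weights(centroids, adjacency, 1.0)
print(w_binary[1], w_half[1], w_one[1])
```

Larger values of γ give relatively more weight to the nearer of two adjacent areas, which is why locality summaries can shift between the binary and distance decay options.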

We focus on elevated locality risk in particular, and **Table 3** summarises locality risk patterns under the two distance decay options. The table considers only areas with π_{11i} over 0.70, ranked by π_{11i}. The weighted averages of modelled relative risks across localities L_{i} (centred on and including area i) now adjust also for distance decay as well as expected deaths, namely

Locality | Index of focus area | π_{11i} posterior estimate | H_{i} posterior estimate | Relative risk r_{i} in focus area (posterior mean) | Indices of areas in locality (other than focus) | Modelled relative risk R_{3i} across locality | Pr(R_{3i} >1) (elevated locality risk) | SMR across locality |
---|---|---|---|---|---|---|---|---|
Distance decay coefficient (γ) equals 1 | | | | | | | | |
1 | 710 | 0.833 | 0.969 | 1.77 | 709, 711, 712 | 1.52 | 0.978 | 1.58 |
2 | 16 | 0.831 | 0.985 | 1.76 | 5, 10, 11, 15, 17, 21, 22, 25 | 1.40 | 1.000 | 1.87 |
3 | 594 | 0.825 | 0.967 | 1.59 | 590, 592, 593, 595, 597, 599 | 1.43 | 0.997 | 2.15 |
4 | 11 | 0.815 | 0.964 | 1.61 | 5, 8, 15, 16 | 1.41 | 0.997 | 2.01 |
5 | 711 | 0.810 | 0.888 | 1.49 | 709, 710, 712 | 1.55 | 0.984 | 1.58 |
6 | 595 | 0.754 | 0.889 | 1.39 | 593, 594, 596, 599 | 1.38 | 0.988 | 1.84 |
7 | 249 | 0.748 | 0.964 | 1.58 | 247, 248, 251, 252 | 1.34 | 0.986 | 1.78 |
8 | 597 | 0.730 | 0.957 | 1.60 | 594, 599, 601 | 1.36 | 0.975 | 1.90 |
9 | 856 | 0.728 | 0.975 | 1.66 | 849, 851, 853, 857, 858, 859 | 1.33 | 0.988 | 1.96 |
10 | 337 | 0.723 | 0.979 | 1.70 | 332, 334, 340, 542 | 1.37 | 0.983 | 1.65 |
11 | 712 | 0.715 | 0.853 | 1.43 | 707, 709, 710, 711 | 1.43 | 0.971 | 1.30 |
12 | 333 | 0.712 | 0.912 | 1.45 | 330, 334, 336, 340 | 1.29 | 0.967 | 1.64 |
13 | 10 | 0.711 | 0.881 | 1.41 | 2, 5, 6, 13, 16, 17 | 1.32 | 0.992 | 1.76 |
14 | 732 | 0.709 | 0.933 | 1.52 | 27, 728, 729, 731, 736 | 1.35 | 0.963 | 1.64 |
15 | 15 | 0.709 | 0.860 | 1.34 | 8, 11, 16, 18, 21 | 1.36 | 0.996 | 1.79 |
Distance decay coefficient (γ) equals 0.5 | | | | | | | | |
1 | 594 | 0.839 | 0.972 | 1.64 | 590, 592, 593, 595, 597, 599 | 1.47 | 0.998 | 2.15 |
2 | 16 | 0.831 | 0.988 | 1.80 | 5, 10, 11, 15, 17, 21, 22, 25 | 1.41 | 0.999 | 1.87 |
3 | 11 | 0.818 | 0.968 | 1.67 | 5, 8, 15, 16 | 1.44 | 0.995 | 2.01 |
4 | 710 | 0.789 | 0.973 | 1.78 | 709, 711, 712 | 1.47 | 0.977 | 1.58 |
5 | 249 | 0.763 | 0.967 | 1.64 | 247, 248, 251, 252 | 1.37 | 0.984 | 1.78 |
6 | 711 | 0.763 | 0.895 | 1.48 | 709, 710, 712 | 1.49 | 0.980 | 1.58 |
7 | 595 | 0.754 | 0.894 | 1.42 | 593, 594, 596, 599 | 1.39 | 0.988 | 1.84 |
8 | 856 | 0.749 | 0.978 | 1.71 | 849, 851, 853, 857, 858, 859 | 1.36 | 0.993 | 1.96 |
9 | 597 | 0.726 | 0.959 | 1.66 | 594, 599, 601 | 1.38 | 0.978 | 1.90 |
10 | 251 | 0.722 | 0.871 | 1.38 | 247, 249, 251, 252, 258 | 1.46 | 1.000 | 2.18 |
11 | 15 | 0.715 | 0.868 | 1.37 | 8, 11, 16, 18, 21 | 1.39 | 0.994 | 1.79 |
12 | 10 | 0.713 | 0.895 | 1.42 | 2, 5, 6, 13, 16, 17 | 1.33 | 0.993 | 1.76 |
13 | 333 | 0.713 | 0.915 | 1.48 | 330, 334, 336, 340 | 1.30 | 0.965 | 1.64 |
14 | 258 | 0.705 | 0.999 | 2.16 | 250, 251, 252, 255, 259, 262, 264 | 1.38 | 0.998 | 1.66 |

**Table 3:** Elevated locality risks, local join-count statistics and distance decay options.

**Table 3** shows posterior mean R_{3i} and probabilities that R_{3i} exceed 1. Unsmoothed suicide SMRs across the locality are defined as before.

There is considerable overlap between **Table 3** and **Table 1** in those focus areas identified as having both elevated “own area” risk (high H_{i}) and elevated risk across the locality also. Thus of the 13 areas with high π_{11i} identified in **Table 1**, 11 also appear as focus areas in the top panel (high distance decay w_{ij}) of **Table 3**, and the other two (areas 251, 590) are included in the broader localities listed there. All 13 cluster-centre areas identified in **Table 1** appear as such areas in the lower panel of **Table 3** (less marked distance decay).

We now consider local join-counts for detecting bivariate risks that are both significantly elevated and also spatially clustered. Consider suicide deaths y_{Ai} for 2006-10 as discussed above, and self-harm hospitalisations y_{Bi} for 2006-7 to 2010-11 (five financial years, with ICD10 X60--X84 codes) across the 922 MSOAs in NW England (the data can be obtained at http://www.apho.org.uk/resource). Expected hospitalisations e_{Bi} are based on England wide age specific rates, with scaling applied to ensure Σ_{i}e_{Bi}=Σ_{i}y_{Bi}. Self-harm is often a precursor to later actual suicide, but considerably more frequent with average event count .

A convolution model is applied with y_{Ai} ∼ Poi(e_{Ai}r_{Ai}) and y_{Bi} ∼ Poi(e_{Bi}r_{Bi}), but comparing two alternative procedures to provide stabilised estimates of relative risk. The first includes spatial borrowing of strength within outcomes, but without such borrowing between outcomes, and unrelated CAR and iid priors for each event

with , and Spatial interactions wij are binary based on adjacency. The second procedure assumes follow a bivariate CAR prior [17] with unknown within area covariance matrix

with taken to be Wishart with 2 degrees of freedom and identity scale matrix.

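The bivariate indicators

$$b_{ABi} = I(r_{Ai} > 1,\ r_{Bi} > 1)$$

are monitored in each case to provide bivariate join counts J_{11ABi}. From these one obtains indicators of elevated bivariate risk encompassing both the focus area and its surrounding locality

$$\pi_{11ABi} = \hat{J}_{11ABi} / S_{0i}.$$

One may also estimate the weighted locality relative risks for each event, namely

$$R_{Ai} = \frac{\sum_{j \in L_i} e_{Aj} r_{Aj}}{\sum_{j \in L_i} e_{Aj}}, \qquad R_{Bi} = \frac{\sum_{j \in L_i} e_{Bj} r_{Bj}}{\sum_{j \in L_i} e_{Bj}},$$

and the probabilities that they exceed 1.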

For the model without pooling between outcomes, **Table 4** (top panel) shows there are 14 areas with π_{11ABi} exceeding 0.5. It can be seen that stronger locality inferences hold for the more frequent second outcome (self-harm), with all the probabilities Pr(R_{Bi} >1) being 1. However, the identified localities also have Pr(R_{Ai} >1) exceeding 0.9 for all 14 cluster centres, and exceeding 0.95 for 12 cluster centres.

(a) Without pooling between outcomes

| Index of focus area | π_{11ABi} (posterior estimate) | H_{ABi} | Relative risk r_{Ai} (focus area) | Relative risk r_{Bi} (focus area) | Indices of areas in locality (other than focus) | Modelled relative risk R_{Ai} across locality | Modelled relative risk R_{Bi} across locality | Pr(R_{Ai} >1) (elevated locality risk) | Pr(R_{Bi} >1) (elevated locality risk) | SMR across locality | Self-harm SHR across locality |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 594 | 0.77 | 0.99 | 1.74 | 2.12 | 590, 592, 593, 595, 597, 599 | 1.56 | 1.71 | 1.00 | 1.00 | 2.15 | 1.73 |
| 333 | 0.75 | 0.94 | 1.52 | 1.76 | 330, 334, 336, 340 | 1.33 | 1.38 | 0.97 | 1.00 | 1.64 | 1.39 |
| 597 | 0.64 | 0.98 | 1.80 | 2.12 | 594, 599, 601 | 1.47 | 1.57 | 0.99 | 1.00 | 1.90 | 1.58 |
| 16 | 0.58 | 0.99 | 1.76 | 2.60 | 5, 10, 11, 15, 17, 21, 22, 25 | 1.47 | 1.28 | 1.00 | 1.00 | 1.87 | 1.29 |
| 29 | 0.58 | 0.84 | 1.31 | 1.22 | 22, 25, 26, 27, 32, 33 | 1.30 | 1.24 | 0.98 | 1.00 | 1.53 | 1.25 |
| 312 | 0.57 | 0.94 | 1.52 | 2.56 | 309, 310, 311, 315 | 1.21 | 1.60 | 0.91 | 1.00 | 1.30 | 1.60 |
| 315 | 0.54 | 0.77 | 1.19 | 1.95 | 310, 311, 312, 316, 318, 323, 327 | 1.22 | 1.63 | 0.95 | 1.00 | 1.48 | 1.64 |
| 251 | 0.54 | 0.95 | 1.51 | 1.90 | 247, 249, 250, 252, 258 | 1.52 | 1.34 | 1.00 | 1.00 | 2.18 | 1.34 |
| 573 | 0.54 | 0.80 | 1.27 | 1.40 | 569, 572, 574, 577 | 1.30 | 1.73 | 0.97 | 1.00 | 1.70 | 1.75 |
| 731 | 0.53 | 0.79 | 1.26 | 1.29 | 729, 732, 733, 736 | 1.33 | 1.26 | 0.97 | 1.00 | 1.79 | 1.26 |
| 33 | 0.52 | 0.82 | 1.28 | 1.53 | 26, 29, 32, 174, 176 | 1.19 | 1.35 | 0.91 | 1.00 | 1.23 | 1.36 |
| 76 | 0.52 | 0.78 | 1.21 | 1.24 | 72, 73, 74, 78, 79, 81 | 1.22 | 1.16 | 0.96 | 1.00 | 1.33 | 1.17 |
| 856 | 0.50 | 0.99 | 1.68 | 2.22 | 849, 851, 853, 857, 858, 859 | 1.37 | 1.33 | 0.99 | 1.00 | 1.96 | 1.34 |
| 729 | 0.50 | 0.95 | 1.50 | 1.29 | 726, 727, 731, 732, 733, 734 | 1.24 | 1.31 | 0.93 | 1.00 | 1.34 | 1.32 |

(b) With pooling between outcomes

| Index of focus area | π_{11ABi} (posterior estimate) | H_{ABi} | Relative risk r_{Ai} (focus area) | Relative risk r_{Bi} (focus area) | Indices of areas in locality (other than focus) | Modelled relative risk R_{Ai} across locality | Modelled relative risk R_{Bi} across locality | Pr(R_{Ai} >1) (elevated locality risk) | Pr(R_{Bi} >1) (elevated locality risk) | SMR across locality | Self-harm SHR across locality |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 333 | 0.87 | 0.99 | 1.69 | 1.77 | 330, 334, 336, 340 | 1.40 | 1.38 | 1.00 | 1.00 | 1.64 | 1.39 |
| 594 | 0.82 | 1.00 | 1.98 | 2.13 | 590, 592, 593, 595, 597, 599 | 1.71 | 1.72 | 1.00 | 1.00 | 2.15 | 1.73 |
| 197 | 0.79 | 1.00 | 1.87 | 4.21 | 190, 195, 196, 201 | 1.31 | 1.81 | 0.98 | 1.00 | 0.98 | 1.82 |
| 315 | 0.76 | 0.96 | 1.42 | 1.95 | 310, 311, 312, 316, 318, 323, 327 | 1.34 | 1.63 | 1.00 | 1.00 | 1.48 | 1.64 |
| 33 | 0.74 | 0.95 | 1.43 | 1.54 | 26, 29, 32, 174, 176 | 1.30 | 1.35 | 0.99 | 1.00 | 1.23 | 1.36 |
| 313 | 0.73 | 0.97 | 1.48 | 2.70 | 306, 308, 311, 314, 317, 318 | 1.25 | 1.73 | 0.97 | 1.00 | 1.14 | 1.74 |
| 29 | 0.72 | 0.93 | 1.37 | 1.22 | 22, 25, 26, 27, 32, 33 | 1.36 | 1.24 | 1.00 | 1.00 | 1.53 | 1.25 |
| 772 | 0.71 | 0.99 | 1.86 | 2.24 | 767, 768, 773, 777 | 1.29 | 1.60 | 0.95 | 1.00 | 1.42 | 1.61 |
| 312 | 0.69 | 1.00 | 1.83 | 2.58 | 309, 310, 311, 315 | 1.33 | 1.60 | 0.99 | 1.00 | 1.30 | 1.60 |
| 76 | 0.69 | 0.91 | 1.32 | 1.24 | 72, 73, 74, 78, 79, 81 | 1.30 | 1.16 | 0.99 | 1.00 | 1.33 | 1.17 |
| 733 | 0.68 | 0.92 | 1.36 | 1.37 | 729, 731, 734, 735, 736 | 1.37 | 1.33 | 0.99 | 1.00 | 1.43 | 1.33 |
| 573 | 0.68 | 0.90 | 1.37 | 1.40 | 569, 572, 574, 577 | 1.48 | 1.74 | 1.00 | 1.00 | 1.70 | 1.75 |
| 731 | 0.67 | 0.92 | 1.41 | 1.29 | 729, 732, 733, 736 | 1.43 | 1.27 | 0.99 | 1.00 | 1.79 | 1.26 |
| 492 | 0.67 | 0.97 | 1.56 | 3.20 | 489, 494, 495, 501 | 1.26 | 2.14 | 0.95 | 1.00 | 1.34 | 2.15 |

**Table 4:** Bivariate risk, areas with highest probabilities for cluster centres.

Inferences regarding the rarer outcome, both for the focus area and the locality, become stronger when there is pooling between the two outcomes (**Table 4**, lower panel). The pooling model is in fact supported by the data, since the Deviance Information Criterion [18] is reduced from 11311 to 11221, and the posterior estimate (with 95% CrI) for ρ_{AB} is 0.75 (0.64, 0.84). Moran spatial correlation indices for s_{Ai} and s_{Bi} are obtained as 0.45 (0.39, 0.52) and 0.42 (0.41, 0.45) respectively.
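For readers unfamiliar with the Moran index quoted above, the following is a minimal sketch of the standard Moran's I statistic for a single vector of area effects under binary adjacency weights. It assumes that the posterior summaries in the text come from evaluating such an index at each MCMC draw, which the text does not state explicitly. The function name `morans_i` and the four-area chain are hypothetical.

```python
from typing import Sequence


def morans_i(values: Sequence[float], neighbours: Sequence[Sequence[int]]) -> float:
    """Moran's I with binary adjacency weights (w_ij = 1 if j adjoins i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: sum of all weights, counting each adjacent pair in both directions.
    s0 = sum(len(nbrs) for nbrs in neighbours)
    # Cross-products of deviations over all adjacent (i, j) pairs.
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    sum_sq = sum(d * d for d in dev)
    return (n / s0) * (cross / sum_sq)


# Toy example: four areas in a chain with steadily increasing effects.
chain_neighbours = [[1], [0, 2], [1, 3], [2]]
toy_effects = [1.0, 2.0, 3.0, 4.0]
toy_moran = morans_i(toy_effects, chain_neighbours)
```

Applying the function to each sampled vector of spatial effects and summarising the resulting values would give a posterior mean and credible interval of the kind reported for s_{Ai} and s_{Bi}.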

There are in fact now 28 areas with π_{11ABi} exceeding 0.60, but **Table 4** contains the same number of cluster centres under the two options in order to facilitate comparison. The locality with the highest π_{11ABi} under the pooling model consists of five MSOAs in Wigan (areas 330, 333, 334, 336, and 340), and has 28 suicide deaths (against 17 expected) and 627 self-harm hospitalisations (against 450.6 expected). Other MSOAs in Wigan with elevated and clustered bivariate risk are apparent in **Table 4** (the 4^{th}, 6^{th} and 9^{th} focus areas in the lower panel). **Figures 4** and **5** show modelled relative risks for the two outcomes in MSOAs in Wigan (MSOAs in centre), and in the adjacent St Helens and Bolton districts. Both figures show that high suicide and self-harm rates occur widely through these three districts, but that elevated levels of both self-harm and suicide together are apparent in areas coded 330, 333, 334, 336, and 340 (in the centre of the southern boundary), and also in a north-west aligned band of Wigan MSOAs in the central part of the map.
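As a quick check on the locality figures just quoted, the crude standardised ratios are simply observed events divided by expected events. The sketch below uses only the counts stated in the text; the helper name `standardised_ratio` is hypothetical.

```python
def standardised_ratio(observed: float, expected: float) -> float:
    """Observed-to-expected ratio (SMR for deaths, SHR for hospitalisations)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Wigan locality (areas 330, 333, 334, 336, 340), counts as quoted in the text.
locality_smr = standardised_ratio(28, 17)       # suicide deaths
locality_shr = standardised_ratio(627, 450.6)   # self-harm hospitalisations
```

The self-harm ratio rounds to 1.39, matching the SHR reported for focus area 333 in Table 4. The suicide ratio from the rounded expected count is about 1.65, close to the tabulated 1.64, with the small difference presumably reflecting rounding of the expected deaths to 17.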

The lower panel of **Table 4** shows two new cluster centres (197, 492) compared with the upper panel. These are areas where self-harm risk (both observed and modelled) is high, and where estimated suicide risk is pulled towards the risk for the more common outcome under the bivariate spatial prior. Using extra information about risk patterns provided by a more frequent outcome (or by intercorrelation between outcomes in general) is generally regarded as beneficial. This is a form of borrowing strength [19], enabling stronger inferences for an infrequent outcome. However, an analysis such as that here, of potential impacts on inferences about clustering, may provide an additional facet for assessing sensitivity to alternative spatial priors.

Considering the results of both the univariate and bivariate clustering analyses together, one may set out some criteria for choosing focus areas for high risk localities. The choice of cut-off for (or for bivariate outcomes) should be based on the profile of their ranked values, in conjunction with information about risk variation (e.g. the profile of H_{i} and R_{i}). The necessary interconnection with H_{i} follows from the relation .

Health outcome data for small areas vary considerably in the extent to which significant variations in area relative risk (and hence locality clustering) can be detected and this affects cut-off choice. For example, for the relatively rare suicide outcome, there are only 18 areas with Pr(b_{i} =1| y) = H_{i} exceeding 0.95, and a cut-off > 0.7 was used, with a minimum H_{i} of 0.87 among the 13 areas above this cut-off. A slightly lower cut-off could be entertained, though the 17^{th} ranked area in terms of (with = 0.69 ) has a relatively low H_{i} of 0.79, below the threshold of H_{i}=0.8 for elevated risk suggested by Richardson et al. [3]. The probabilities Pr(R_{i} >1| y) that the locality wide modelled SMRs exceed 1 are also relevant, provided the R_{i} are obtained for localities where both the focus and surrounding areas have elevated risk. All 13 areas with > 0.7 have Pr(R_{i} >1| y) exceeding 0.95.
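The screening logic in this paragraph can be sketched as a simple filter over per-area posterior summaries. This is an illustrative sketch only: the field names (`cluster_prob`, `h`, `pr_locality`), the function `select_cluster_centres` and the four example areas are hypothetical, and the default thresholds are the illustrative values mentioned in the text (a clustering cut-off of 0.7, the H_{i} threshold of 0.8 attributed to Richardson et al. [3], and locality exceedance probability of 0.95).

```python
from typing import List, NamedTuple, Sequence


class AreaSummary(NamedTuple):
    area: int            # area index
    cluster_prob: float  # posterior probability of high-risk clustering
    h: float             # exceedance probability for the focus area, H_i
    pr_locality: float   # Pr(R_i > 1 | y) for the locality


def select_cluster_centres(
    summaries: Sequence[AreaSummary],
    cluster_cut: float = 0.7,
    h_min: float = 0.8,
    pr_min: float = 0.95,
) -> List[int]:
    """Rank areas by clustering probability and keep those passing all cut-offs."""
    ranked = sorted(summaries, key=lambda s: s.cluster_prob, reverse=True)
    return [
        s.area
        for s in ranked
        if s.cluster_prob > cluster_cut and s.h >= h_min and s.pr_locality >= pr_min
    ]


# Made-up summaries for four areas (not taken from the paper's results).
toy_summaries = [
    AreaSummary(1, 0.82, 0.97, 0.99),
    AreaSummary(2, 0.69, 0.79, 0.96),
    AreaSummary(3, 0.74, 0.87, 0.96),
    AreaSummary(4, 0.75, 0.78, 0.97),
]
strict_centres = select_cluster_centres(toy_summaries)
relaxed_centres = select_cluster_centres(toy_summaries, cluster_cut=0.65, h_min=0.75)
```

Comparing the strict and relaxed selections shows how a slightly lower cut-off admits areas whose own exceedance probability is marginal, which is the trade-off discussed above.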

Whereas suicide is a rare outcome, self-harm is around 25 times more frequent. When a univariate clustering analysis (comparable to that carried out for completed suicide and reported on above) is carried out for self-harm, there are 275 MSOAs with H_{i} exceeding 0.95, and 16 MSOAs with > 0.9, so a higher cut-off point could be used to detect high risk clusters for this outcome.

For the bivariate outcome analysis (suicide and self-harm) without borrowing of strength between outcomes (e.g., **Table 4** upper panel), there are 14 areas with exceeding 0.95, and a relatively low cut-off of was used. The implications of using a slightly lower cut-off point could be considered, since even in this analysis, the locality relative risks R_{Ai} and R_{Bi} are significantly elevated (above 1) at lower values of than the illustrative cutoff taken.

It follows from the above discussion that there are no simple rules for a low threshold or below which clustering is implausible. This depends on the profile of H_{i} and R_{i} as well as on the profile of . Also relevant is the relative size of and , the latter being the probability of a high risk area surrounded by low risk areas. Where an area has below 0.9, or H_{i} below 0.75, or clearly exceeding , then high risk clustering becomes considerably less likely.

Small area disease models often use exceedance probabilities for each individual area to make inferences about risk patterns. However, elevated risk in an area may not necessarily extend to the surrounding locality. This paper has sought to identify areas where elevated risk extends to the broader locality using local join-count statistics. These statistics can identify local outliers as well as high risk cluster centres, and can be applied to assess high risk clustering in more than one health outcome.

The procedure here can be used in conjunction with a disease model where risk status is unknown, so enabling the clustering implications of contrasting likelihood and prior assumptions (e.g. regarding pooling between areas, and outcomes) to be assessed. In particular, inferences about clustering patterns in two outcomes considered jointly may well be influenced by alternative assumptions, particularly when a spatial prior borrows strength over outcomes as well as areas. Sensitivity of clustering inferences to alternative priors for spatial effects, such as the approach of Leroux et al. [20] in contrast to the convolution prior, also provides an additional area of research.

1. Holowaty EJ, Norwood TA, Wanigaratne S, Abellan JJ, Beale L (2010) Feasibility and utility of mapping disease risk at the neighbourhood level within a Canadian public health unit: an ecological study. Int J Health Geogr 10: 21.
2. Knorr-Held L, Becker N (2000) Bayesian modelling of spatial heterogeneity in disease maps with application to German cancer mortality data. Allgemeines Statistisches Archiv 84: 121-140.
3. Richardson S, Thomson A, Best N, Elliott P (2004) Interpreting posterior relative risk estimates in disease-mapping studies. Environ Health Perspect 112: 1016-1025.
4. Besag J, York J, Mollie A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43: 1-20.
5. Earnest A, Morgan G, Mengersen K, Ryan L, Summerhayes R, et al. (2007) Evaluating the effect of neighbourhood weight matrices on smoothing properties of Conditional Autoregressive (CAR) models. Int J Health Geogr 6: 54.
6. Bell N, Schuurman N, Hameed SM (2008) Are injuries spatially related? Join-count spatial autocorrelation for small-area injury analysis. Inj Prev 14: 346-353.
7. Schabenberger O, Gotway CA (2009) Statistical Methods for Spatial Data Analysis. Chapman & Hall/CRC.
8. Lee S (2001) Developing a bivariate spatial association measure: an integration of Pearson's r and Moran's I. J Geogr Syst 3: 369-385.
9. Anselin L, Lozano N, Koschinsky J (2006) Rate Transformations and Smoothing, GeoDa Center Research Report.
10. Riggan WB, Manton KG, Creason JP, Woodbury MA, Stallard E (1991) Assessment of spatial variation of risks in small populations. Environ Health Perspect 96: 223-238.
11. Besag J, Green P, Higdon D, Mengersen K (1995) Bayesian computation and stochastic systems. Statistical Science 10: 3-66.
12. Higdon D (2007) A Primer on Space-Time Modeling From a Bayesian Perspective. In: Statistical Methods for Spatio-Temporal Systems. Finkenstädt B, Held L, Isham V, (eds.). Chapman & Hall, Boca Raton, FL 217-279.
13. Brooks SP, Gelman A (1998) General methods for monitoring convergence of iterative simulations. J Comput Graph Stat 7: 434-455.
14. Kulldorff M (1997) A spatial scan statistic. Communications in Statistics 26: 1481-1496.
15. Tango T, Takahashi K (2005) A flexibly shaped spatial scan statistic for detecting clusters. Int J Health Geogr 4: 11.
16. Wieckowska B, Materna-Kiryluk A, Kossowski T, Moczko J, Wisniewska K, et al. (2012) Location of cleft lip with or without cleft palate prevalence clusters using Kulldorff scan statistics. Computational Methods in Science and Technology 18.
17. Mardia KV (1988) Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. J Multivar Anal 24: 265-284.
18. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc Series B Stat Methodol 64: 583-639.
19. Kirkham JJ, Riley RD, Williamson PR (2012) A multivariate meta-analysis approach for reducing the impact of outcome reporting bias in systematic reviews. Stat Med 31: 2179-2195.
20. Leroux B, Lei X, Breslow N (1999) Estimation of disease rates in small areas: a new mixed model for spatial dependence. In: Statistical Models in Epidemiology, the Environment and Clinical Trials, Halloran M, Berry D, (eds.). Springer, New York.
