Models and Methods for Spatial Data: Detecting Outliers and Handling Zero-Inflated Counts
Spatial modelling is useful in a variety of settings, including disease mapping and species abundance studies. Hierarchical spatial models provide a flexible framework for modelling complex, spatially correlated data. In disease mapping, spatial analyses have helped to pinpoint potential causes of mortality and to guide the effective allocation of health funding. The identification of extreme-risk areas is often of particular interest. These areas may arise in proximity to one another within a smooth spatial surface, or they may arise as isolated 'hot spots' or 'low spots' that are quite distinct from neighbouring sites. Such 'spatial outliers' are not accommodated by standard hierarchical spatial models. Similarly, zero-inflated data, which contain more zeros than a standard count model predicts, are not uncommon and are not handled well by standard models. These zeros may also be of particular interest: in species abundance studies, for instance, they may provide important clues to physical characteristics associated with habitat suitability or individual immunity.
Hierarchical spatial models used in disease mapping studies have seen little use at Vital Statistics agencies because of the complexity of spatial analyses. In Chapter 2 we consider whether approximate methods of inference are reliable for mapping studies, especially in providing accurate estimates of relative risks, ranks of regions, and standard errors of risks.
The main focus is on assessing how close penalized quasi-likelihood (PQL) estimates are to target values. This is done by comparison with Bayesian Markov chain Monte Carlo (MCMC) methods. The quantities of prime interest are small-area relative risks and their estimated ranks, which are often used to order the regions. The results show that PQL is a reasonably accurate method of inference and can be recommended as a simple yet quite precise method for initial exploratory studies.
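The comparison described above can be summarised with a few agreement measures between an approximate set of relative-risk estimates and a reference set. The sketch below is illustrative only: it assumes the PQL and MCMC estimates have already been computed elsewhere (fitting either model is out of scope), and the function names `ranks`, `spearman` and `compare`, as well as the five-region numbers, are hypothetical. Standard errors could be compared in the same way as the risks.

```python
import math


def ranks(values):
    """Return 1-based ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def compare(approx, target):
    """Summarise how closely approximate relative risks track the target ones."""
    mean_abs_rel = sum(abs(a - t) / t for a, t in zip(approx, target)) / len(target)
    rank_shift = max(abs(a - b) for a, b in zip(ranks(approx), ranks(target)))
    return {
        "mean_abs_relative_diff": mean_abs_rel,
        "spearman_rank_corr": spearman(approx, target),
        "max_rank_shift": rank_shift,
    }


# Hypothetical relative risks for five regions (not results from the thesis).
pql = [0.82, 1.10, 1.35, 0.95, 1.60]
mcmc = [0.80, 1.12, 1.30, 0.97, 1.66]
print(compare(pql, mcmc))
```

Rank agreement is reported separately from the size of the risk differences because, for ordering regions, a method can be slightly biased in level yet still reproduce the ranking exactly, as in this toy example.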
In Chapter 3, spatial methods are developed that allow extreme-risk areas either to arise in proximity to one another within a smooth spatial surface or to arise as isolated 'hot spots' or 'low spots'. The former are modelled by a spatially smooth surface using a conditional autoregressive (CAR) model; the latter are addressed by adding a discrete clustering component, which can accommodate extreme isolated risks because it is not constrained by spatial smoothness. Both types of extreme risk are important; however, isolated extremes may indicate areas with the potential to become centres of future spatially correlated extreme risk, which makes them important for surveillance. A Bayesian approach to inference is employed, graphical techniques for isolating extremes are illustrated, and the models are assessed via cross-validation posterior predictive checks.
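To see why a CAR surface alone smooths away isolated extremes, it helps to look at the conditional distribution a CAR prior places on each area. The sketch below assumes the intrinsic CAR variant, in which the spatial effect of area i, given all other areas, is normal with mean equal to the average of its neighbours' effects and variance sigma2 divided by its number of neighbours; the thesis's exact CAR specification is not stated here. The helper `smoothness_departure` is a hypothetical diagnostic, not the discrete clustering component developed in the thesis, and the five-area map and effect values are made up.

```python
import math


def icar_full_conditional(i, phi, neighbours, sigma2):
    """Mean and variance of phi[i] given all other phi[j] under an intrinsic CAR prior."""
    nbrs = neighbours[i]
    if not nbrs:
        raise ValueError(f"area {i} has no neighbours; the ICAR conditional is undefined")
    mean = sum(phi[j] for j in nbrs) / len(nbrs)
    var = sigma2 / len(nbrs)
    return mean, var


def smoothness_departure(i, phi, neighbours, sigma2):
    """Hypothetical diagnostic: standardised distance of phi[i] from its ICAR conditional mean."""
    mean, var = icar_full_conditional(i, phi, neighbours, sigma2)
    return (phi[i] - mean) / math.sqrt(var)


# Hypothetical map: areas 0-1-2-3 form a chain and area 4 also borders area 2.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}
# Hypothetical log-relative-risk effects; area 2 is an isolated "hot spot".
phi = [0.10, 0.05, 0.90, 0.00, -0.05]
sigma2 = 0.03

for i in range(len(phi)):
    print(i, round(smoothness_departure(i, phi, neighbours, sigma2), 2))
```

Area 2 sits many standard deviations above what its neighbours imply, so a smoothing prior would pull it strongly towards them. Note that areas 1 and 3 also show large departures, only because they border the hot spot; this kind of spill-over is one reason a model-based component that lets a single area escape the smoothness constraint is preferable to a simple neighbour comparison.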
In Chapter 4 we review the literature on overdispersion and zero inflation, particularly models that accommodate correlated data. The focus of this chapter is the development of a series of zero-inflated spatial models, which are compared using data on white pine weevil infestations in spruce trees. All of the zero-inflated spatial models fit the data well; however, each highlights different features of the data. The models use a variety of structures for the probability of belonging to the zero component, so the probability of 'resistance' differs across models. In particular, one model focuses on individually resistant trees located among infested trees, while another focuses on clusters of resistant trees that are likely located within protective habitats. We discuss the unique features highlighted by each of the zero-inflated spatial models and make recommendations regarding application.
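The "probability of belonging to the zero component" can be made concrete with a zero-inflated Poisson (ZIP) model: with probability pi a unit is a structural zero (here, a 'resistant' tree), and otherwise its count follows a Poisson distribution with mean lam. The sketch assumes, for illustration, a Poisson count response and a logistic link with a spatial random effect, logit(pi_i) = beta0 + u_i; the thesis's actual response type and link structures may differ, and `beta0`, `u`, `lam` and the helper names are hypothetical.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson: with probability pi the unit is a
    structural zero, otherwise Y ~ Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def prob_structural_zero(lam, pi):
    """Probability that an observed zero came from the zero component
    (e.g. a 'resistant' tree rather than a susceptible tree that escaped infestation)."""
    p0 = pi + (1 - pi) * math.exp(-lam)
    return pi / p0


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical spatial structure: logit(pi_i) = beta0 + u_i, where u_i is a
# spatial random effect (e.g. CAR-distributed) shared by nearby trees.
beta0 = -1.0
u = {"isolated tree": 0.0, "tree in protective cluster": 2.0}
lam = 3.0  # hypothetical mean count for susceptible trees

for label, u_i in u.items():
    pi_i = logistic(beta0 + u_i)
    print(label, round(pi_i, 3), round(prob_structural_zero(lam, pi_i), 3))
```

A positive spatial effect raises the prior probability of resistance for every tree in the cluster, which in turn raises the chance that an observed zero there reflects resistance rather than luck. Models that place the structure on u_i differently (independent per tree versus spatially smooth) correspond to the individual-versus-cluster views of resistance described above.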
The final chapter discusses future research ideas which have been motivated by this thesis.
This type of interdisciplinary work is a hallmark of our program in Applied Statistics at Simon Fraser University. For more information, please contact Laurie Ainsworth (firstname.lastname@example.org) or her supervisor Charmaine Dean (email@example.com), Department of Statistics and Actuarial Science.