autonlab.org

Fast Spatial Scan Statistics

*What is SSS?

Our Spatial Scan Statistics (SSS) system automatically detects anomalous spatial clusters. Given a massive set of spatial or space-time count data (e.g. the number of reported disease cases in each zip code on each day), SSS searches the dataset for spatial regions with higher than expected counts. The process has two steps: SSS first infers the expected count for each spatial location using time series analysis, then applies an expectation-based spatial scan statistic to find spatial regions where the counts are significantly higher than expected. Randomization testing computes the statistical significance of each discovered cluster, enabling us to distinguish true clusters from those due to chance.
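
The second step can be illustrated with a minimal sketch. It assumes the counts and expected counts (baselines) have already been aggregated to a uniform grid, restricts regions to axis-aligned rectangles, and scores each region with the expectation-based Poisson log-likelihood ratio, one standard choice of expectation-based score. The function names (`ebp_score`, `naive_scan`, etc.) are hypothetical. This is the exhaustive "standard approach" that a fast search would accelerate, not the SSS implementation itself; 2-D cumulative sums make each rectangle's totals available in constant time.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio for one region.

    H0: counts ~ Poisson(baseline); H1: counts ~ Poisson(q * baseline), q > 1,
    with q set to its maximum-likelihood value count / baseline.
    Returns 0 when the region is not elevated.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def prefix_sums(grid: Grid) -> Grid:
    """2-D cumulative sums: p[i][j] is the total of grid rows < i, columns < j."""
    rows, cols = len(grid), len(grid[0])
    p = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = grid[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p: Grid, r0: int, c0: int, r1: int, c1: int) -> float:
    """Total of cells in rows r0..r1-1 and columns c0..c1-1, in O(1)."""
    return p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]


def naive_scan(counts: Grid, baselines: Grid) -> Tuple[float, Tuple[int, int, int, int]]:
    """Score every rectangle; return the best score and its half-open bounds."""
    rows, cols = len(counts), len(counts[0])
    pc, pb = prefix_sums(counts), prefix_sums(baselines)
    best = (0.0, (0, 0, 0, 0))
    for r0 in range(rows):
        for r1 in range(r0 + 1, rows + 1):
            for c0 in range(cols):
                for c1 in range(c0 + 1, cols + 1):
                    c = rect_sum(pc, r0, c0, r1, c1)
                    b = rect_sum(pb, r0, c0, r1, c1)
                    s = ebp_score(c, b)
                    if s > best[0]:
                        best = (s, (r0, c0, r1, c1))
    return best


if __name__ == "__main__":
    counts = [[4, 4, 4], [4, 12, 12], [4, 12, 12]]
    baselines = [[4.0] * 3 for _ in range(3)]
    score, bounds = naive_scan(counts, baselines)
    print(bounds)  # (1, 1, 3, 3): the 2x2 block of elevated cells
```

Checking all O(N^4) rectangles of an N x N grid is what makes the exhaustive scan slow on large grids, which is the cost a faster search strategy aims to avoid.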

*What's special about SSS?

The spatial scan statistic is a powerful statistical test for detecting significant spatial clusters. Unlike many other cluster detection methods, it can be used both to determine whether any statistically significant clusters exist and to pinpoint the size and location of those clusters. Because the statistic scans over a huge number of regions of variable shape and size (each containing anywhere from one to many locations), it has high power to detect clusters regardless of whether they affect a small or large spatial area. Our test correctly adjusts for the multiplicity of tests performed, keeping the false positive rate low while maintaining high power to detect any significant clusters that do occur.
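
One common way to adjust for the many regions tested is a randomization test on the maximum region score: generate replica datasets under the null hypothesis, rescan each one, and compare the observed maximum against the replica maxima. The sketch below assumes replica counts are drawn as Poisson(baseline) for each cell and, to stay short, searches single cells only; `max_cell_score` is a hypothetical stand-in for a full region search, and all names are illustrative rather than the SSS API.

```python
import math
import random
from typing import Callable, List

Grid = List[List[float]]


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio (0 if not elevated)."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def max_cell_score(counts: Grid, baselines: Grid) -> float:
    """Hypothetical stand-in for a region search: best single-cell score."""
    return max(
        ebp_score(c, b)
        for crow, brow in zip(counts, baselines)
        for c, b in zip(crow, brow)
    )


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def randomization_p_value(
    observed_max: float,
    baselines: Grid,
    max_score: Callable[[Grid, Grid], float],
    replicas: int = 999,
    seed: int = 0,
) -> float:
    """Monte Carlo p-value of the observed maximum score."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(replicas):
        # Replica generated under H0: each count ~ Poisson(baseline).
        fake = [[float(sample_poisson(b, rng)) for b in row] for row in baselines]
        if max_score(fake, baselines) >= observed_max:
            exceed += 1
    # Counting the observed data as one more draw keeps the p-value valid.
    return (exceed + 1) / (replicas + 1)


if __name__ == "__main__":
    baselines = [[4.0] * 3 for _ in range(3)]
    counts = [[4, 4, 4], [4, 20, 4], [4, 4, 4]]
    observed = max_cell_score(counts, baselines)
    print(randomization_p_value(observed, baselines, max_cell_score))
```

Because each replica is scored by its single best region, the resulting p-value already accounts for having searched every region. With 999 replicas the smallest attainable p-value is 1/1000.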

Our new implementation of spatial scan statistics has two main advantages over standard spatial scan approaches (e.g. SaTScan). First, we use novel spatial statistical methods to adjust for spatial and temporal variation in the baseline counts, correctly accounting for day-of-week effects, seasonality, and other trends. This improves detection power, enabling more timely detection of emerging clusters with fewer false positives. Second, we have developed a new computational method, the “fast spatial scan.” This multi-resolution search computes the spatial scan hundreds to thousands of times faster than the standard approach, so we can obtain results in minutes rather than hours or days, even for massive datasets containing millions of records.
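
The baseline methods themselves are not described here. As a simple stand-in that shows what "accounting for day of week" means, the sketch below forecasts a location's next-day count as a recent level (mean of the last week) multiplied by a day-of-week factor (mean count on the same weekday divided by the overall mean, over the last eight weeks). The function name and both window lengths are assumptions for illustration, not the SSS method.

```python
from statistics import fmean
from typing import Sequence


def dow_adjusted_baseline(
    history: Sequence[float], level_days: int = 7, dow_days: int = 56
) -> float:
    """Hypothetical multiplicative baseline: recent level x day-of-week factor.

    history holds one location's daily counts, ending the day before the
    day being forecast. Both windows must be whole weeks so every weekday
    is represented equally.
    """
    t = len(history)  # index of the day being forecast
    if level_days % 7 or dow_days % 7 or not 0 < level_days <= dow_days <= t:
        raise ValueError("windows must be whole weeks and fit in the history")
    level = fmean(history[t - level_days:])
    overall = fmean(history[t - dow_days:])
    # Past days falling on the same weekday as day t.
    same_weekday = [history[i] for i in range(t - dow_days, t) if (t - i) % 7 == 0]
    factor = fmean(same_weekday) / overall if overall > 0 else 1.0
    return level * factor


if __name__ == "__main__":
    # Hypothetical location: 10 visits on weekdays, 20 on weekend days.
    history = ([10.0] * 5 + [20.0] * 2) * 8
    print(dow_adjusted_baseline(history))  # forecast for a weekday
```

Separating the level from the weekday factor lets the forecast follow recent trends while still expecting, say, lower counts on weekends; a seasonal term could be added in the same multiplicative way.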

*What type of problem can be solved by SSS?

SSS can find anomalous spatial clusters in spatial or space-time data sets. In particular, given a large set of spatial locations (e.g. zip codes), where each location has an associated time series of counts, SSS can detect any spatial regions where the most recent counts are significantly higher than expected, given the historical baseline data. For example, if we are given the number of emergency department visits in each zip code on each day, SSS can find areas where the recent number of cases is abnormally high, which may be indicative of an emerging outbreak of disease.
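
Detecting regions where the "most recent counts" are elevated can be sketched by also scanning over the length of the recent time window. The example below assumes per-location daily counts and baselines are already available, fixes one candidate region (a list of location keys), and scores windows covering the last w days for w = 1..`max_window` with the expectation-based Poisson score. The location keys and function names are hypothetical; a full space-time search would repeat this over every candidate region.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio (0 if not elevated)."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def best_recent_window(
    counts: Dict[str, List[float]],
    baselines: Dict[str, List[float]],
    region: Sequence[str],
    max_window: int = 7,
) -> Tuple[float, int]:
    """Best score over windows ending today; returns (score, window length).

    A window length of 0 means no window was elevated.
    """
    best = (0.0, 0)
    for w in range(1, max_window + 1):
        # Aggregate counts and expected counts over the region and last w days.
        c = sum(sum(counts[z][-w:]) for z in region)
        b = sum(sum(baselines[z][-w:]) for z in region)
        s = ebp_score(c, b)
        if s > best[0]:
            best = (s, w)
    return best


if __name__ == "__main__":
    # Hypothetical zip codes whose counts tripled over the last two days.
    counts = {z: [5.0] * 8 + [15.0] * 2 for z in ("15213", "15217")}
    baselines = {z: [5.0] * 10 for z in ("15213", "15217")}
    print(best_recent_window(counts, baselines, ["15213", "15217"]))
```

Short windows catch sudden spikes, while longer windows accumulate evidence for slowly emerging increases; taking the maximum over window lengths lets one scan handle both.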

*SSS in action

<Retrospective analysis of Walkerton outbreak>

In May 2000, contamination of the water supply with E. coli bacteria caused an outbreak of gastroenteritis in Walkerton, Ontario. Over 2,000 people developed severe gastrointestinal symptoms; 65 were hospitalized and 6 died. We used the SSS software to perform a retrospective analysis of emergency department visits in Walkerton and the surrounding Grey-Bruce region of Ontario between 1999 and 2001. Operating at a false positive rate of only two per year, SSS detected the outbreak on May 19, 2000, two days before the first public health response and one day before the other surveillance methods tested.
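
A fixed false positive rate such as "two per year" corresponds to an alarm threshold on the daily maximum score. One common way to set it, sketched below, is to rank the daily maximum scores from historical days assumed to be outbreak-free and pick the threshold exceeded on at most the allowed number of days. The function name and the outbreak-free assumption are illustrative; this is not necessarily how the Walkerton analysis was calibrated.

```python
from typing import List


def alarm_threshold(
    daily_max_scores: List[float],
    false_alarms_per_year: float = 2.0,
    days_per_year: float = 365.0,
) -> float:
    """Hypothetical calibration: alarm when a day's max score exceeds this value.

    daily_max_scores are assumed to come from outbreak-free historical days,
    so every alarm on them would be a false positive.
    """
    n = len(daily_max_scores)
    allowed = int(n * false_alarms_per_year / days_per_year)
    if n == 0 or allowed >= n:
        raise ValueError("need more history than the allowed number of alarms")
    ranked = sorted(daily_max_scores, reverse=True)
    # At most `allowed` days score strictly above ranked[allowed].
    return ranked[allowed]


if __name__ == "__main__":
    history = [float(s) for s in range(730)]  # two years of synthetic scores
    threshold = alarm_threshold(history)
    print(threshold, sum(s > threshold for s in history))
```

With two years of history the calibration allows four alarms, so the threshold is set just below the fifth-highest historical score.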

<Nationwide monitoring of over-the-counter drug sales>

We are currently using SSS to perform daily monitoring of over-the-counter medication sales from the National Retail Data Monitor (NRDM). Our system receives daily counts of the number of units sold in 18 different product categories (cough remedies, nasal decongestants, etc.) from over 20,000 retail stores and pharmacies nationwide. It then uses our new spatial cluster detection methods to find areas where the sales are significantly higher than expected, and makes these results available to state and local public health officials via a web-based graphical interface.

*Links to representative publications

1. D.B. Neill and A.W. Moore. Methods for detecting spatial and spatio-temporal clusters. In M. Wagner, A. Moore, and R. Aryel, eds., Handbook of Biosurveillance, 2006.

2. D.B. Neill and A.W. Moore. Efficient scan statistic computations. In A. Lawson and K. Kleinman, eds., Spatial and Syndromic Surveillance for Public Health, 2005.

3. D.B. Neill, A.W. Moore, and G.F. Cooper. A Bayesian spatial scan statistic. In Advances in Neural Information Processing Systems 18, 2006, in press.

4. D.B. Neill, A.W. Moore, and G.F. Cooper. A Bayesian scan statistic for spatial cluster detection. Proceedings of the National Syndromic Surveillance Conference, 2005. Received “Best Research Presentation” award.

5. D.B. Neill, A.W. Moore, M.R. Sabhnani, and K. Daniel. Detection of emerging space-time clusters. Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005.

6. D.B. Neill and A.W. Moore. Anomalous spatial cluster detection. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005.

7. M.R. Sabhnani, D.B. Neill, A.W. Moore, F.-C. Tsui, M.M. Wagner, and J.U. Espino. Detecting anomalous patterns in pharmacy retail data. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005.

8. D.B. Neill, A.W. Moore, F. Pereira, and T. Mitchell. Detecting significant multidimensional spatial clusters. In L.K. Saul, et al., eds., Advances in Neural Information Processing Systems 17, 969-976, 2005.

9. D.B. Neill and A.W. Moore. Rapid detection of significant spatial clusters. Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 256-265, 2004.

Copyright 2008, Carnegie Mellon University, Auton Lab. All Rights Reserved.