Fast Spatial Scan
What is Fast Spatial Scan?
Fast Spatial Scan automatically detects anomalous spatial clusters, both efficiently and effectively. Given a massive set of spatial or space-time count data (e.g. the number of reported disease cases in each zip code on each day), it searches through the dataset to find spatial regions with higher than expected counts. This process has two steps: first, it infers the expected count for each spatial location using time series analysis; then, it uses an expectation-based spatial scan statistic to find spatial regions where the counts are significantly higher than expected. Randomization testing is performed to compute the statistical significance of each discovered cluster, enabling us to distinguish true clusters from those due to chance.
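As a rough illustration of the second step, the sketch below scores every axis-aligned rectangle on a small grid by comparing its total count C with its total expected count B. It assumes a Poisson model, for which a commonly used expectation-based score is C log(C/B) + B - C when C > B, and 0 otherwise. The Poisson form, the grid layout, and the names ebp_score and scan_rectangles are assumptions made for illustration; this is an exhaustive search, not the fast search used by the software.

```python
import math


def ebp_score(count, baseline):
    """Expectation-based Poisson score for one region (assumed model).

    Returns 0 when the count does not exceed the baseline, and also
    when the baseline is not positive (a guard for this sketch only).
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def scan_rectangles(counts, baselines):
    """Score every axis-aligned rectangle on a grid, keep the best one.

    counts and baselines are equal-sized lists of rows.
    Returns (best_score, (row0, col0, row1, col1)) with inclusive
    corners, or (0.0, None) if no region exceeds its baseline.
    """
    rows, cols = len(counts), len(counts[0])
    best_score, best_region = 0.0, None
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    cells = [(r, c)
                             for r in range(r0, r1 + 1)
                             for c in range(c0, c1 + 1)]
                    total_count = sum(counts[r][c] for r, c in cells)
                    total_base = sum(baselines[r][c] for r, c in cells)
                    score = ebp_score(total_count, total_base)
                    if score > best_score:
                        best_score = score
                        best_region = (r0, c0, r1, c1)
    return best_score, best_region
```

Calling scan_rectangles on a grid where a single cell has three times its expected count returns that cell as the top-scoring region, since adding unaffected neighbors dilutes the excess.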
What's special about Fast Spatial Scan?
Spatial scan statistics are a powerful statistical test for detection of significant spatial clusters. Unlike many other cluster detection methods, they can be used both to determine whether any statistically significant clusters exist and to precisely pinpoint the size and location of clusters. Because the statistic scans over a huge number of regions of variable shape and size (and each region can contain between one and many locations), it has high power to detect clusters regardless of whether they affect a small or large spatial area. Our statistical test correctly adjusts for the multiplicity of tests performed, enabling us to ensure a low false positive rate while maintaining high power to detect any significant clusters that do occur.
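One standard way to adjust for the many regions searched is to compare the highest observed region score against the highest score found in datasets simulated under the null hypothesis. The sketch below shows this idea under stated assumptions: counts are drawn as Poisson with means equal to the baselines, regions are grid rectangles, and the function names are hypothetical. It is a minimal example of randomization testing, not the software's implementation.

```python
import numpy as np


def ebp_score(count, baseline):
    """Expectation-based Poisson score (assumed model for this sketch)."""
    if count > baseline > 0:
        return float(count * np.log(count / baseline) + baseline - count)
    return 0.0


def max_region_score(counts, baselines):
    """Highest score over all axis-aligned rectangles of the grid."""
    rows, cols = counts.shape
    best = 0.0
    for r0 in range(rows):
        for r1 in range(r0 + 1, rows + 1):
            for c0 in range(cols):
                for c1 in range(c0 + 1, cols + 1):
                    c = counts[r0:r1, c0:c1].sum()
                    b = baselines[r0:r1, c0:c1].sum()
                    best = max(best, ebp_score(c, b))
    return best


def randomization_p_value(counts, baselines, n_replicas=99, seed=0):
    """Estimate the p-value of the best region by simulation.

    Each replica is drawn under the null (no cluster), and only its
    maximum score is kept, so the comparison already accounts for
    having searched every region.
    """
    rng = np.random.default_rng(seed)
    observed = max_region_score(counts, baselines)
    exceed = 0
    for _ in range(n_replicas):
        replica = rng.poisson(baselines)
        if max_region_score(replica, baselines) >= observed:
            exceed += 1
    return (exceed + 1) / (n_replicas + 1)
```

With 99 replicas the smallest attainable p-value is 0.01, so the number of replicas bounds how strong a significance claim can be made.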
Our new implementation of spatial scan statistics has several advantages over standard spatial scan approaches (e.g. SaTScan). First, we use novel spatial statistical methods to adjust for spatial and temporal variation in the baseline counts, allowing us to account correctly for day of week, seasonality, and other trends. This improves detection power, allowing more timely detection of emerging clusters with fewer false positives. Second, we have developed a new computational method, the “fast spatial scan.” This fast multi-resolution search approach allows us to compute the spatial scan hundreds to thousands of times faster than the standard approach. Thus we can obtain results in minutes rather than hours or days, even for massive datasets containing millions of records.
Speed Results:
What type of problem can be solved by Fast Spatial Scan?
Fast Spatial Scan can find anomalous spatial clusters in spatial or space-time data sets. In particular, given a large set of spatial locations (e.g. zip codes), where each location has an associated time series of counts, it can detect any spatial regions where the most recent counts are significantly higher than expected, given the historical baseline data. For example, if we are given the number of emergency department visits in each zip code on each day, it can find areas where the recent number of cases is abnormally high, which may be indicative of an emerging outbreak of disease.
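To make the notion of a historical baseline concrete, the sketch below estimates today's expected count for each location as the mean of the counts seen on the same weekday over the previous few weeks. This is a deliberately simple stand-in chosen for illustration, not the time series method used by the software; the function name weekday_baseline and the n_weeks parameter are hypothetical.

```python
import numpy as np


def weekday_baseline(history, n_weeks=4):
    """Expected count per location for the day after `history` ends.

    history: array-like of shape (n_days, n_locations), oldest day first.
    Returns the mean of the counts on the same weekday as the predicted
    day, taken over the most recent n_weeks weeks.
    """
    history = np.asarray(history, dtype=float)
    n_days = history.shape[0]
    if n_days < 7 * n_weeks:
        raise ValueError("need at least 7 * n_weeks days of history")
    # The predicted day would have index n_days, so the matching
    # weekdays are 7, 14, ... days before it.
    same_weekday = [n_days - 7 * k for k in range(1, n_weeks + 1)]
    return history[same_weekday].mean(axis=0)
```

Averaging only over matching weekdays keeps a regular weekly pattern (for example, fewer visits on weekends) from being mistaken for an anomaly, though it does not by itself handle seasonality or longer trends.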
Fast Spatial Scan in action
Retrospective analysis of Walkerton outbreak
In May 2000, an outbreak of gastroenteritis in Walkerton, Ontario resulted from contamination of the water supply with E. coli bacteria. Over 2000 individuals were affected by severe gastrointestinal symptoms, including 65 hospitalizations and 6 deaths. We used the fast spatial scan software to perform a retrospective analysis of emergency department visits in Walkerton and the surrounding Grey-Bruce region of Ontario between 1999 and 2001. At a rate of only two false positives per year, the software was able to detect the outbreak on May 19, 2000, two days before the first public health response and one day before the other surveillance methods tested.
Nationwide monitoring of over-the-counter drug sales
We are currently using the fast spatial scan tool to perform daily monitoring of over-the-counter medication sales from the National Retail Data Monitor (NRDM). Our system receives daily counts of the number of units sold in 18 different product categories (cough remedies, nasal decongestants, etc.) from over 20,000 retail stores and pharmacies nationwide. It then uses our new spatial cluster detection methods to find areas where sales are significantly higher than expected, and makes these results available to state and local public health officials via a web-based graphical interface.
Representative Publications
Methods for detecting spatial and spatio-temporal clusters. In M. Wagner, A. Moore, and R. Aryel, eds., Handbook of Biosurveillance, 2006
Efficient scan statistic computations. In A. Lawson and K. Kleinman, eds., Spatial and Syndromic Surveillance for Public Health, 2005
A Bayesian spatial scan statistic. In Advances in Neural Information Processing Systems 18, 2006, in press
A Bayesian scan statistic for spatial cluster detection. Proceedings of the National Syndromic Surveillance Conference, 2005. Received “Best Research Presentation” award
Detection of emerging space-time clusters (KDD 2005)
Anomalous spatial cluster detection (KDD 2005)
Detecting anomalous patterns in pharmacy retail data (KDD 2005)
Detecting significant multidimensional spatial clusters (NIPS 2004)
Rapid detection of significant spatial clusters (KDD 2004)
