Fast Spatial Scan automatically detects anomalous spatial clusters, and does so efficiently. Given a massive set of spatial or space-time count data (e.g. the number of reported disease cases in each zip code on each day), it searches through the dataset to find spatial regions with higher than expected counts. This process has two steps. First, it infers the expected count for each spatial location using time series analysis. Second, it uses an expectation-based spatial scan statistic to find spatial regions where the counts are significantly higher than expected. Randomization testing is then performed to compute the statistical significance of each discovered cluster, enabling us to distinguish true clusters from those due to chance.
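The two steps above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated on this page: the expected count is a plain moving average, the score is the expectation-based Poisson log-likelihood ratio, and the regions are axis-aligned rectangles on a tiny grid searched exhaustively (not the fast multi-resolution search). The function names and the toy data are hypothetical.

```python
import math

def moving_average_baseline(history, window=7):
    """Step 1 (assumed form): expected count = mean of the last `window` days."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def ebp_score(count, baseline):
    """Step 2 (assumed form): expectation-based Poisson log-likelihood ratio.
    Returns 0 when the count does not exceed the expected count."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count

def best_rectangle(counts, baselines):
    """Exhaustively score every axis-aligned rectangle of a small grid."""
    rows, cols = len(counts), len(counts[0])
    best_score, best_region = 0.0, None
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    cells = [(r, c) for r in range(r1, r2 + 1)
                                    for c in range(c1, c2 + 1)]
                    total = sum(counts[r][c] for r, c in cells)
                    expected = sum(baselines[r][c] for r, c in cells)
                    score = ebp_score(total, expected)
                    if score > best_score:
                        best_score, best_region = score, (r1, c1, r2, c2)
    return best_score, best_region

# Toy data: every cell has a flat history of 10 cases per day.
history = [[[10] * 14 for _ in range(3)] for _ in range(3)]
baselines = [[moving_average_baseline(h) for h in row] for row in history]
# Today's counts: two adjacent cells in the middle row are elevated.
today = [[10, 10, 10],
         [10, 25, 22],
         [10, 10, 10]]
score, region = best_rectangle(today, baselines)
print(region, round(score, 2))
```

In this toy example the highest-scoring rectangle covers exactly the two elevated cells, since adding unaffected cells dilutes the ratio of observed to expected counts.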
The spatial scan statistic is a powerful statistical test for detecting significant spatial clusters. Unlike many other cluster detection methods, it can be used both to determine whether any statistically significant clusters exist and to pinpoint the size and location of those clusters. Because the statistic scans over a huge number of regions of variable shape and size (and each region can contain between one and many locations), it has high power to detect clusters regardless of whether they affect a small or large spatial area. Our statistical test correctly adjusts for the multiplicity of tests performed, enabling us to ensure a low false positive rate while maintaining high power to detect any significant clusters that do occur.
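One common way to adjust for the many regions searched is randomization testing on the maximum score: replica datasets are generated under the null hypothesis, and the observed maximum is compared with the maximum found in each replica. The sketch below is an illustration under assumptions, not the implementation described here. It assumes Poisson counts, an expectation-based Poisson score, and a one-dimensional row of locations whose regions are contiguous intervals; all function names are hypothetical.

```python
import math
import random

def ebp_score(count, baseline):
    """Assumed score: expectation-based Poisson log-likelihood ratio."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count

def max_score(counts, baselines):
    """Maximum score over all contiguous intervals of a row of locations."""
    best = 0.0
    for i in range(len(counts)):
        total = expected = 0.0
        for j in range(i, len(counts)):
            total += counts[j]
            expected += baselines[j]
            best = max(best, ebp_score(total, expected))
    return best

def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def randomization_p_value(counts, baselines, replicas=999, seed=0):
    """p = (replicas whose max score >= observed max, plus 1) / (replicas + 1)."""
    rng = random.Random(seed)
    observed = max_score(counts, baselines)
    beat = 0
    for _ in range(replicas):
        replica = [poisson_sample(rng, b) for b in baselines]
        if max_score(replica, baselines) >= observed:
            beat += 1
    return observed, (beat + 1) / (replicas + 1)

baselines = [10.0] * 8
counts = [9, 11, 10, 30, 28, 12, 10, 9]   # two adjacent elevated locations
observed, p_value = randomization_p_value(counts, baselines)
print(round(observed, 2), p_value)
```

Because each replica contributes only its single best region, the resulting p-value already accounts for every region that was searched, so no separate multiple-testing correction is needed in this scheme.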
Our new implementation of spatial scan statistics has several advantages over standard spatial scan approaches (e.g. SaTScan). First, we use novel spatial statistical methods to adjust for spatial and temporal variation in the baseline counts, allowing us to account correctly for day of week, seasonality, and other trends. This improves detection power, allowing more timely detection of emerging clusters with fewer false positives. Second, we have developed a new computational method, the “fast spatial scan.” This fast multi-resolution search approach allows us to compute the spatial scan hundreds to thousands of times faster than the standard approach. Thus we can obtain results in minutes rather than hours or days, even for massive datasets containing millions of records.
[Figure: speed/performance results]
Spatial Scan can find anomalous spatial clusters in spatial or space-time data sets. In particular, given a large set of spatial locations (e.g. zip codes), where each location has an associated time series of counts, it can detect any spatial regions where the most recent counts are significantly higher than expected, given the historical baseline data. For example, if we are given the number of emergency department visits in each zip code on each day, it can find areas where the recent number of cases is abnormally high, which may be indicative of an emerging outbreak of disease.
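As a hedged illustration of how an expected count might be derived from the historical baseline data, the sketch below averages the counts observed on the same weekday over the preceding weeks, which is one simple way to account for day-of-week effects. This is an assumed stand-in for the time series analysis, not the method used by the software; the function name, parameters, and synthetic data are hypothetical.

```python
from datetime import date, timedelta

def day_of_week_baseline(daily_counts, start_date, target_date, weeks=4):
    """Expected count for `target_date`: mean of the counts on the same
    weekday over the previous `weeks` weeks. `daily_counts[i]` is the count
    on `start_date + i days`."""
    values = []
    for w in range(1, weeks + 1):
        idx = (target_date - timedelta(weeks=w) - start_date).days
        if 0 <= idx < len(daily_counts):
            values.append(daily_counts[idx])
    if not values:
        raise ValueError("not enough history for the requested date")
    return sum(values) / len(values)

# Synthetic history: 28 days with 20 visits on weekdays and 8 on weekends.
start = date(2024, 1, 1)  # a Monday
visits = [8 if (start + timedelta(days=i)).weekday() >= 5 else 20
          for i in range(28)]

monday = start + timedelta(days=28)
saturday = start + timedelta(days=33)
monday_expected = day_of_week_baseline(visits, start, monday)
saturday_expected = day_of_week_baseline(visits, start, saturday)
overall_mean = sum(visits) / len(visits)
print(monday_expected, saturday_expected, round(overall_mean, 2))
```

With this synthetic history, a single overall mean would overstate the expected weekend count and understate the weekday count, which is the kind of baseline error that day-of-week adjustment is meant to avoid.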
In May 2000, an outbreak of gastroenteritis in Walkerton, Ontario resulted from contamination of the water supply with E. coli bacteria. Over 2000 individuals were affected by severe gastrointestinal symptoms, including 65 hospitalizations and 6 deaths. We used the fast spatial scan software to perform a retrospective analysis of emergency department visits in Walkerton and the surrounding Grey-Bruce region of Ontario between 1999 and 2001. At a rate of only two false positives per year, the software was able to detect the outbreak on May 19, 2000, two days before the first public health response and one day before the other surveillance methods tested.
We are currently using the fast spatial scan tool to perform daily monitoring of over-the-counter medication sales from the National Retail Data Monitor (NRDM). Our system receives daily counts of the number of units sold in 18 different product categories (cough remedies, nasal decongestants, etc.) from over 20,000 retail stores and pharmacies nationwide. It then uses our new spatial cluster detection methods to find areas where the sales are significantly higher than expected, and makes these results available to state and local public health officials via a web-based graphical interface.
[Figure: interfaces of the National Retail Data Monitor.] See also the 2005 KDD paper for details.
Methods for detecting spatial and spatio-temporal clusters. In M. Wagner, A. Moore, and R. Aryel, eds., Handbook of
Efficient scan statistic computations. In A. Lawson and K. Kleinman, eds., Spatial and Syndromic Surveillance for Public Health, 2005
A Bayesian spatial scan statistic (NIPS 2005)
A Bayesian scan statistic for spatial cluster detection (National Syndromic Surveillance Conference 2005)
Detection of emerging space-time clusters (KDD 2005)
Anomalous spatial cluster detection (KDD 2005)
Detecting anomalous patterns in pharmacy retail data (KDD 2005)
Detecting significant multidimensional spatial clusters (NIPS 2004)
Rapid detection of significant spatial clusters (KDD 2004)