autonlab.org

WSARE

What is WSARE?

WSARE tries to answer the question "What's Strange About Recent Events?" The algorithm looks at data containing discrete observations of a set of attributes over time, for example, emergency room visit logs. By comparing today's data against baseline values from previous days, the algorithm tries to decide whether anything 'strange' is happening.
WSARE finds 'strange' activity by forming rules from different subsets of the attributes in the data. In the hospital log example, a rule could be of the form "Gender = Male AND Home Location = NW" (i.e., records corresponding to males living in the NW geographic area). It then compares the number of records that fit a given rule today to the number of records that fit that rule in the baseline (past) data. WSARE considers all possible rules and reports any that show a statistically significant change in the number of records fitting that rule. These are the 'strange' rules, or anomalies, and might be worthy of further investigation.
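
The rule search described above can be sketched in Python. This is a minimal illustration, not the WSARE implementation: it scores only rules with one or two attributes, it assumes a two-sided Fisher's exact test on a 2x2 table as the significance test, and it omits any correction for testing many rules at once. The record layout (one dict per record) and the function names fisher_two_sided and score_rules are hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x first-row records fall in column 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


def score_rules(today, baseline, max_attrs=2):
    """Return (p_value, rule) pairs, most surprising rule first.

    `today` and `baseline` are non-empty lists of dicts with the same keys.
    """
    attrs = sorted(today[0].keys())
    results = []
    for k in range(1, max_attrs + 1):
        for names in combinations(attrs, k):
            # Candidate rules are the value combinations seen in either data set.
            seen = {tuple(r[n] for n in names) for r in today + baseline}
            for vals in seen:
                rule = dict(zip(names, vals))
                t = sum(all(r[n] == v for n, v in rule.items()) for r in today)
                b = sum(all(r[n] == v for n, v in rule.items()) for r in baseline)
                p = fisher_two_sided(t, len(today) - t, b, len(baseline) - b)
                results.append((p, rule))
    results.sort(key=lambda pair: pair[0])
    return results
```

Calling score_rules(today_records, baseline_records) returns every candidate rule with its p-value; a real deployment would also need to account for the large number of rules tested before raising an alarm.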

What's special about WSARE?

The advantage of the WSARE algorithms over conventional methods is their ability to identify the combination of attributes that characterizes the most anomalous groups in the data, rather than relying on a user to specify beforehand which combination of attributes to monitor. WSARE 3.0 has a further advantage: it can account for trends over time when producing the baseline distribution, whereas WSARE 2.0 can be thrown off by these trends because it uses raw historical data for the baseline.

What types of problems can be solved with WSARE?

WSARE can find anomalous patterns in multivariate categorical data sets.  In particular, WSARE can determine if a group of records in the data, having a specific combination of attributes, has an increase or decrease from its "usual" pattern, where "usual" is inferred from the defined baseline.

WSARE in action

Winter Olympics

The 2002 Winter Olympic and Paralympic Games were held in Salt Lake City between February 8 and March 16, 2002. Following the terrorist attacks on September 11, 2001, and the anthrax release in October 2001, the need for bioterrorism surveillance during the Games was paramount. WSARE was deployed as part of an early warning system to help detect likely bioterrorism threats and disease outbreaks. The case features analyzed by WSARE included syndrome category, age, gender, and geographical information. The current count of patients with specific features was compared with the counts on the same day of the week during recent weeks. For details refer to this article.
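
The same-day-of-week comparison can be illustrated with a short Python sketch. The source does not say how many recent weeks were used, so the default of four weeks is an assumption, as are the function names and the "date" field on each record.

```python
from datetime import date, timedelta


def same_weekday_baseline_dates(today, weeks_back=4):
    """Dates on the same weekday as `today` in each of the previous weeks.

    weeks_back=4 is an illustrative assumption, not a documented WSARE setting.
    """
    return [today - timedelta(weeks=w) for w in range(1, weeks_back + 1)]


def baseline_records(records, today, weeks_back=4):
    """Select historical records whose visit date is one of the baseline dates."""
    wanted = set(same_weekday_baseline_dates(today, weeks_back))
    return [r for r in records if r["date"] in wanted]
```

Restricting the baseline to matching weekdays keeps ordinary weekly cycles in visit volume from being reported as anomalies.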

Monitoring public health records in hospitals in the state of Pennsylvania

The algorithm was also used experimentally to monitor public health records in the state of Pennsylvania, where the performance of WSARE 3.0 was evaluated on actual Emergency Department data from a major US city. This database contains almost seven years' worth of data, with attributes including date of admission, coded hospital ID, age decile, gender, syndrome information, discretized home latitude, discretized home longitude, discretized work latitude, discretized work longitude, and both home location and work location on a coarse latitude-longitude grid. WSARE operates on data from the year 2001 and is allowed to use over five full years' worth of training data, from the start of 1996 up to the current day. The environmental attributes used are month, day of week, and the number of cases from the previous day with respiratory problems. The last environmental attribute is intended to approximate the flu levels in the city. WSARE identified anomalous patterns in sub-groups of records with symptoms involving dizziness, fever, and sore throat. This was suspected to be related to an influenza strain that winter.

Retrospective evaluation in Israel influenza type B outbreak and Walkerton outbreak

The Israel Center for Disease Control evaluated WSARE retrospectively using an unusual outbreak of influenza type B that occurred in an elementary school in central Israel. WSARE was applied to patient visits to community clinics between May 24, 2004 and June 11, 2004. The attributes in this data set include the visit date, area code, ICD-9 code, age category, and day of week. The day of week was used as the only environmental attribute. WSARE detected the outbreak on the second day after its onset. The pattern that WSARE found consisted of three attributes: ICD-9 code, area code, and age category. It indicated an anomalous pattern involving children aged 6-14 with viral symptoms within a specific geographic area. In another retrospective analysis, of the Walkerton outbreak, WSARE was able to detect the outbreak one day before a boil-water advisory was released when its alarm threshold was set to a level that permitted two false positives per year. For the Israel case details refer to this report.
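
One simple way to tie an alarm threshold to a false positive budget such as "two per year" is to calibrate it on daily scores from a period believed to be free of outbreaks. The Python sketch below illustrates that idea only. The source does not describe how the threshold was actually chosen, and the function name, the use of daily p-values as scores, and the strict "p < threshold" alarm rule are all assumptions.

```python
def threshold_for_false_alarm_rate(quiet_day_pvalues, alarms_per_year=2,
                                   days_per_year=365):
    """P-value cutoff that would have raised at most the allowed number of alarms.

    `quiet_day_pvalues` holds one score per day from an outbreak-free period
    (an assumed input). An alarm is raised on days where p < threshold.
    """
    n = len(quiet_day_pvalues)
    # Scale the yearly budget to the length of the calibration period.
    allowed = int(alarms_per_year * n / days_per_year)
    if allowed >= n:
        return 1.0
    ranked = sorted(quiet_day_pvalues)
    # At most `allowed` values are strictly smaller than ranked[allowed].
    return ranked[allowed]
```

A looser budget raises the cutoff and tends to detect outbreaks earlier, at the cost of more alarms that need to be investigated.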

Representative Publications

What's Strange About Recent Events (WSARE): An Algorithm for the Early Detection of Disease Outbreaks (2005)
Efficient Analytics for Effective Monitoring of Biomedical Security (2005)
Thesis: Data mining for early disease outbreak detection (2004)
Bayesian Network Anomaly Pattern Detection for Disease Outbreaks (2003)
Rule-based Anomaly Pattern Detection for Detecting Disease Outbreaks (2002)

Software and Sample Data

WSARE software
WSARE sample dataset

It is exciting to see WSARE applied to data from domains other than bio-surveillance. If you think you have data and a problem that WSARE is designed to solve, feel free to try the software out and let us know if you encounter any problems or have any new discoveries to share with us.

Copyright 2008, Carnegie Mellon University, Auton Lab. All Rights Reserved.