Research Thrust
Rapid Detection of Emerging Patterns
Massive Data Mining
Social Network Analysis/Link Analysis/Group Detection
Life Science Data Mining
Rapid Detection of Emerging Patterns
Data mining algorithms at the Auton Lab have successfully detected new emerging patterns in various domains: health services, agriculture, manufacturing, and oil companies. Our algorithms are 10-1000 times faster than traditional techniques. The results demonstrate significantly higher detection power with much lower false-positive rates. We have applied these algorithms in semi-automated and fully automated modes, in supervised and unsupervised environments, and for both retrospective and prospective surveillance. Algorithms for rapid detection of emerging patterns include WSARE, Ultra Fast SSS, and TipMon.
Massive Data Mining
The Auton Lab has over 10 years of experience with data mining on massive data streams. We have expertise both with established techniques and in the development of new algorithms to provide robust and efficient solutions for massive data sets. Our work has addressed problems in a range of fields, including bio-surveillance, large-scale astronomy, the intelligence community, robotics, life sciences, and a variety of industrial applications. This work includes both a large number of successful software deployments and a range of available general-purpose software.
Our work in massive-scale data mining allows users to tractably process large data sets, addressing such problems as:
- Discovering (previously unknown) structure or patterns in the data - What can we say about the underlying structure of the data? Our work on this problem focuses on learning underlying probabilistic models. In particular, we have significant experience in efficiently learning large Bayesian networks, which provide a powerful and readable description of the underlying model.
- Finding anomalous or interesting data points buried within the data - Given a large set of data points, can we identify any as anomalous? Our work on this problem has been used to find new, interesting objects in such data sets as the Sloan Digital Sky Survey.
- Accurately classifying new data points - Can we accurately classify a new observation given a historical set of data points? Our work on this problem has touched a variety of applications and includes developing new, more efficient methods for such techniques as nearest neighbor classification and logistic regression.
- Intelligently choosing the best action to perform - Given a noisy view of the current world state, how do we best choose the next action to perform? Our work on this problem includes both traditional questions in robotics and the question of active learning. Active learning asks which data point we should sample next so as to get the most useful information, allowing us to minimize the number of potentially expensive experiments.
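As a rough illustration of the active learning question in the last item above, the sketch below uses uncertainty sampling, one common textbook strategy: query the unlabeled point whose predicted class probability is closest to 0.5. This is not a description of the Auton Lab's own algorithms. The logistic model and the names `predict_proba` and `most_uncertain` are assumptions introduced only for this example.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of the positive class under a simple logistic model (assumed)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def most_uncertain(weights, bias, pool):
    """Return the index of the unlabeled point the model is least sure about.

    The point whose predicted probability is closest to 0.5 is treated as the
    most informative one to label next (uncertainty sampling).
    """
    return min(
        range(len(pool)),
        key=lambda i: abs(predict_proba(weights, bias, pool[i]) - 0.5),
    )


# Hypothetical current model and pool of unlabeled candidates.
weights, bias = [1.0, -1.0], 0.0
pool = [(3.0, 0.0), (0.2, 0.1), (-2.0, 1.0)]
next_index = most_uncertain(weights, bias, pool)
```

Here `next_index` selects the candidate nearest the model's decision boundary, so only that point would be sent for a potentially expensive experiment.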
Our primary specialty is in developing novel ways to exploit structure within both the data and the problem itself to make our approaches significantly faster. In particular, we have developed a range of efficient data structures and search algorithms that focus the computation on the important aspects of the problem. Our work thus enables experts in other fields to accurately and tractably mine massive data streams in their area of interest.
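To make the idea of a data structure that focuses computation more concrete, here is a minimal sketch of a k-d tree, a standard spatial index chosen here purely as a generic illustration and not as a statement of which structures the lab uses. The nearest neighbor search skips any subtree that cannot contain a closer point than the best one found so far. All function names are assumptions made for this example.

```python
def build_kdtree(points, depth=0):
    """Recursively split the points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Return the stored point closest to `query`, pruning hopeless subtrees."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    # Search the side of the split that contains the query first.
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # current best match; otherwise that whole subtree is skipped.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

The pruning test is what saves work: on well-spread data most subtrees are never visited, whereas a brute-force scan compares the query against every point.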