Diff report
General
| | Version 10 | Version 11 |
|---|---|---|
| Variant | main - en | main - en |
| Document Name | Research Thrust | (not changed) |
| Creation time | 7/18/06 3:10:41 PM | 7/18/06 3:16:54 PM |
| Created by | Karen(Lujie) Chen | Karen(Lujie) Chen |
| State | publish | publish |
Changes to parts
Part Content has changed
| | Version 10 | Version 11 |
|---|---|---|
| Mime type | text/xml | (not changed) |
| File name | | (not changed) |
| Size (bytes) | 7620 | 249 |
Content diff
<html>
<body>
<ul>
<li>
<p>Rapid Detection of Emerging Patterns</p>
</li>
<li>
<p>Massive Data Mining</p>
</li>
<li>
<p>Social Network Analysis/Link Analysis/Group Detection</p>
</li>
<li>
<p>Life Science Data Mining</p>
</li>
</ul>
<h2>Rapid Detection of Emerging Patterns</h2>
<p><tt>Data mining algorithms at the Auton Lab have successfully detected new
emerging patterns in various domains: health services, agriculture,
manufacturing, and oil companies. Our algorithms are 10-1000
times faster than traditional techniques. The results
demonstrate significantly higher detection power with much smaller false
positive rates. We have applied these algorithms in semi- and fully-automated modes,
in supervised and unsupervised environments, and for retrospective and prospective
surveillance. A few algorithms for rapid detection of emerging patterns are:
WSARE, Ultra Fast SSS, and TipMon.</tt></p>
<h2>Massive Data Mining</h2>
<p><tt>The Auton Lab has over 10 years of experience with data mining on massive
data streams. We have expertise both with established techniques and in the
development of new algorithms to provide robust and efficient solutions for
massive data sets. Our work has previously addressed problems in a range of
fields, including bio-surveillance, large-scale astronomy, the intelligence
community, robotics, life sciences, and a variety of industrial applications.
This work includes both a large number of successful software deployments and a
range of available general-purpose software.</tt></p>
<p><tt>Our work in massive-scale data mining allows users to tractably process
large data sets, addressing such problems as:</tt></p>
<ul>
<li><tt>Discovering (previously unknown) structure or patterns in the data -
What can we say about the underlying structure of the data? Our work on this
problem focuses on learning underlying probabilistic models. In particular, we
have significant experience in efficiently learning large Bayesian networks,
which provide a powerful and readable description of the underlying model.</tt>
</li>
<li><tt>Finding anomalous or interesting data points buried within the data -
Given a large set of data points, can we identify any as anomalous? Our work on
this problem has been used to find new, interesting objects in such data sets as
the Sloan Digital Sky Survey.</tt></li>
<li><tt>Accurately classifying new data points - Can we accurately classify a
new observation given a historical set of data points? Our work on this problem
has touched a variety of applications and includes developing new, more efficient
methods for such techniques as nearest neighbor classification and logistic
regression.</tt></li>
<li><tt>Intelligently choosing the best action to perform - Given a noisy view
of the current world state, how do we best choose the next action to perform?
Our work on this problem includes both traditional questions in robotics and the
question of active learning. Active learning asks which data point we should
sample next so as to get the most useful information, allowing us to minimize the
number of potentially expensive experiments.</tt></li>
</ul>
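<p><tt>The active learning question in the last item can be illustrated with
uncertainty sampling, a common textbook heuristic. This is a minimal sketch, not
a description of the Auton Lab's own methods: the function name
pick_next_query and the probability values are hypothetical, and the model
that produces the probabilities is assumed to exist elsewhere.</tt></p>

```python
def pick_next_query(pool_probabilities):
    """Return the index of the unlabeled point the model is least sure about.

    pool_probabilities[i] is an assumed model estimate of the probability
    that unlabeled candidate i belongs to the positive class. The candidate
    closest to 0.5 is the one whose label we can predict least well.
    """
    return min(
        range(len(pool_probabilities)),
        key=lambda i: abs(pool_probabilities[i] - 0.5),
    )


# Hypothetical model outputs for five unlabeled candidates.
probs = [0.95, 0.10, 0.55, 0.80, 0.30]
next_index = pick_next_query(probs)
print(next_index)
```

<p><tt>Here the third candidate (index 2) would be chosen for the next
experiment, because its predicted probability is the closest to 0.5.</tt></p>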
<p><tt>Our primary specialty is in developing novel ways to exploit structure
within both the data and the problem itself to make our approaches significantly
faster. In particular, we have developed a range of efficient data structures
and search algorithms that effectively target the algorithms, focusing the
computation on the important aspects of the problem. Thus our work enables
experts in other fields to accurately and tractably mine massive data streams in
their area of interest.</tt></p>
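<p><tt>As one illustration of how a data structure can focus computation on
the relevant part of a problem, the sketch below uses a k-d tree for nearest
neighbor search. The k-d tree is a standard technique chosen here for
illustration; the text above does not say which structures the lab uses, and
the names build_kdtree and nearest are hypothetical.</tt></p>

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested dicts (None for an empty set)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or best[0] > d:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    if diff > 0:
        near, far = node["right"], node["left"]
    else:
        near, far = node["left"], node["right"]
    best = nearest(near, query, best)
    # Pruning step: the far subtree can only hold a closer point if the
    # splitting plane is nearer to the query than the best match so far.
    if best[0] > abs(diff):
        best = nearest(far, query, best)
    return best


# Hypothetical 2-D data set.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
distance, neighbor = nearest(tree, (9, 2))
print(neighbor, distance)
```

<p><tt>For the query (9, 2) the search returns the point (8, 1). The pruning
test is what saves work: whole subtrees are skipped whenever they cannot
contain a closer point than the best one found so far.</tt></p>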
<h2>Social Network Analysis/Link Analysis/Group Detection</h2>
<p><tt>Social Network Analysis/Link Analysis/Group Detection seek to discover
interesting relationships and patterns among people or other entities, for
example:</tt></p>
<ul>
<li><tt>Who communicates with whom? And who appears to avoid communicating with
whom?</tt></li>
<li><tt>Are there cliques of people who mostly communicate among themselves and
rarely with others, or is communication more evenly distributed?</tt></li>
<li><tt>Are there "stars" who are linked with a very large number of people,
and/or isolated people who are only linked with one or two others?</tt></li>
<li><tt>Might there be aliases? That is, if we see two people with essentially
the same link patterns, but who are never linked with each other, might they in
fact be the same person?</tt></li>
<li><tt>How do patterns of association among entities evolve over
time?</tt></li>
<li><tt>Can we identify groups of entities, based on link data and/or
demographic properties?</tt></li>
<li><tt>If we know that a communication took place, but
we don't know the identity of one of the participants, can we infer who that
entity was?</tt></li>
</ul>
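<p><tt>Two of the questions above, finding "stars" and spotting possible
aliases, can be sketched on a small link list. This is an illustrative
heuristic only, not the lab's MNOP or GDA algorithms: it treats links as
undirected, measures link-pattern similarity with the Jaccard index, and uses
hypothetical names, data, and thresholds.</tt></p>

```python
from collections import defaultdict
from itertools import combinations


def build_graph(links):
    """Build an undirected adjacency map from (person, person) pairs."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def stars(neighbors, min_degree):
    """People linked with at least min_degree others, in sorted order."""
    return sorted(p for p, ns in neighbors.items() if len(ns) >= min_degree)


def alias_candidates(neighbors, min_similarity):
    """Pairs never linked with each other whose link patterns nearly match."""
    found = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # people who communicate directly are not aliases
        shared = neighbors[a].intersection(neighbors[b])
        combined = neighbors[a].union(neighbors[b])
        similarity = len(shared) / len(combined)  # Jaccard index
        if similarity >= min_similarity:
            found.append((a, b, similarity))
    return found


# Hypothetical communication records.
links = [
    ("alice", "bob"), ("alice", "cat"), ("alice", "dan"),
    ("al", "bob"), ("al", "cat"), ("al", "dan"),
    ("bob", "cat"), ("cat", "dan"), ("bob", "dan"), ("dan", "eve"),
]
graph = build_graph(links)
print(stars(graph, 5))
print(alias_candidates(graph, 0.8))
```

<p><tt>In this toy data "dan" is the only person with five links, and "al"
and "alice" are flagged as a possible alias pair because they share exactly the
same contacts yet are never linked with each other. Comparing every pair of
people is quadratic in the number of entities, so this sketch does not scale to
the large datasets discussed here.</tt></p>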
<p><tt>Auton Lab researchers have developed, and continue to develop, many
algorithms and associated software packages for investigating these kinds of
questions. As usual at the Auton Lab, these technologies place great emphasis on
efficient analysis of large datasets. Representative software packages
in this thrust include:</tt></p>
<p>
<tt><a href="http://www.autonlab.org/autonweb/software/10416.html?branch=1&language=2">AFDL</a>
- Activity From Demographics and Links<br/>
<a href="http://www.autonlab.org/autonweb/software/10385.html">Bayes Net
Learner<br/>
SBNS</a> - Screen-based Bayes Net Structure search<br/>
<a href="http://www.autonlab.org/autonweb/software/10506.html">GDA/k-groups</a>
- Group Detection Algorithm<br/>
<a href="http://www.autonlab.org/autonweb/software/10514.html">MNOP</a> - Many
Names, One Person alias detection<br/>
<a href="http://www.autonlab.org/autonweb/software/10542.html">XGDA</a> - A
fast group detection algorithm</tt></p>
<h2>Life Science Data Mining</h2>
<p><tt>Life sciences is a collective term encompassing biochemistry, genetics,
ecology, pharmacology, medicine, and many other sciences concerned with living
organisms. The Auton Lab has diverse experience in data mining applications for
these disciplines, from core areas like drug discovery and drug classification,
to big-picture problems in epidemiology and pathogen detection. </tt></p>
<p><tt>Medicinal drugs are typically created through a process similar to
Edison's work on the light bulb: very smart scientists think very hard about the
desired effect of a drug, then work very hard to limit their ideas to those few
they can afford to test carefully. An alternative methodology is High
Throughput Screening (HTS), where truly enormous libraries of drug candidates
are tested for efficacy in robotic chemistry labs. Modern HTS labs might make
only 1 mistake in 1,000 experiments, but this leads to hundreds of mistakes on a
small HTS library, roughly the same order of magnitude as the number of useful
chemicals in the library.</tt></p>
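<p><tt>The arithmetic behind this claim is simple to check. In the sketch
below, the error rate of 1 in 1,000 comes from the text, while the library size
of 300,000 compounds is an assumed, purely illustrative figure and the function
name expected_errors is hypothetical.</tt></p>

```python
def expected_errors(library_size, experiments_per_error):
    """Expected number of erroneous results if each compound is tested once."""
    return library_size / experiments_per_error


experiments_per_error = 1000   # from the text: 1 mistake in 1,000 experiments
library_size = 300_000         # assumption: illustrative library size only

errors = expected_errors(library_size, experiments_per_error)
print(errors)
```

<p><tt>With these assumed numbers the screen would contain about 300 erroneous
results, which is why even a very low per-experiment error rate matters when the
number of truly useful chemicals is of a similar size.</tt></p>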
<p><tt>Detecting mistakes in HTS data can save hundreds or thousands of hours of
expensive wet lab time, as well as recover wrongly disqualified candidates for
further testing. This is a job for fast, robust, and correct statistics. When
traditional statistical software packages failed to scale up to the demands, the
Auton Lab developed new algorithms that met the challenge. Beyond the research,
the Auton Lab delivered custom software libraries and user interfaces to our
collaborators, to help them make use of our algorithmic innovations.</tt></p>
</body>
</html>