autonlab.org

Research Thrust

Life Science Data Mining

Life sciences is a collective term encompassing biochemistry, genetics, ecology, pharmacology, medicine, and many other sciences concerned with living organisms. The Auton Lab has diverse experience in data mining applications for these disciplines, from core areas like drug discovery and drug classification to big-picture problems in epidemiology and pathogen detection.

Medicinal drugs are typically created through a process similar to Edison's work on the light bulb: very smart scientists think very hard about the desired effect of a drug, then work very hard to limit their ideas to those few they can afford to test carefully. An alternative methodology is High Throughput Screening (HTS), where truly enormous libraries of drug candidates are tested for efficacy in robotic chemistry labs. Modern HTS labs might make only 1 mistake in 1,000 experiments, but this still leads to hundreds of mistakes on a small HTS library, roughly the same order of magnitude as the number of useful chemicals in the library.
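The arithmetic behind that claim can be sketched in a few lines. The 1-in-1,000 error rate comes from the text above; the library sizes are illustrative assumptions only, chosen because "hundreds of mistakes" at that rate implies a library of a few hundred thousand experiments. The function name is hypothetical.

```python
def expected_errors(library_size: int, errors_per_thousand: float = 1.0) -> float:
    """Expected number of erroneous experiments in one screening run.

    errors_per_thousand = 1.0 is the rate quoted in the text;
    library_size is an assumed, illustrative value.
    """
    return library_size * errors_per_thousand / 1000.0


if __name__ == "__main__":
    # Assumed library sizes, not figures from the Auton Lab's collaborators.
    for library_size in (100_000, 200_000, 500_000):
        errors = expected_errors(library_size)
        print(f"{library_size:>7} experiments -> about {errors:.0f} expected mistakes")
```

If only a few hundred compounds in such a library are truly useful, the expected mistakes and the true hits are of comparable number, which is why error detection matters so much here.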

Detecting mistakes in HTS data can save hundreds or thousands of hours of expensive wet lab time, and can recover wrongly disqualified candidates for further testing. This is a job for fast, robust, and correct statistics. When traditional statistical software packages failed to scale up to these demands, the Auton Lab developed new algorithms that met the challenge. Beyond the research, the Auton Lab delivered custom software libraries and user interfaces to our collaborators to help them make use of our algorithmic innovations.

Papers

Name: Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs
Summary: Although popular and extremely well established in mainstream statistical data analysis, logistic regression is strangely absent in the field of data mining. There are two possible explanations of this phenomenon. First, there might be an assumption that any tool which can only produce linear cl...

Name: High-Dimensional Probabilistic Classification for Drug Discovery
Summary: Automated high-throughput drug screening constitutes a critical emerging approach in modern pharmaceutical research. The statistical task of interest is that of discriminating active versus inactive molecules given a target molecule, in order to rank potential drug candidates for further testing...


Copyright 2008, Carnegie Mellon University, Auton Lab. All Rights Reserved.