
High-Dimensional Probabilistic Classification for Drug Discovery (2004)

Alexander Gray, Paul Komarek, Ting Liu, Andrew Moore

Tags

Applications, Efficient Statistical Algorithms, Optimization

Abstract

Automated high-throughput drug screening constitutes a critical emerging approach in modern pharmaceutical research. The statistical task of interest is that of discriminating active versus inactive molecules given a target molecule, in order to rank potential drug candidates for further testing. Because the core problem is one of ranking, our approach concentrates on accurate estimation of unknown class probabilities, in contrast to popular non-probabilistic methods, which simply estimate decision boundaries. While this motivates nonparametric density estimation, we are faced with the fact that the molecular descriptors used in practice typically contain thousands of binary features. In this paper we attempt to improve the extent to which kernel density estimation can work well in high-dimensional classification settings. We present a synthesis of techniques (SLAMDUNK: Sphere, Learn A Metric, Discriminate Using Nonisotropic Kernels) which yields favorable performance in comparison to previously published approaches to drug screening, as tested on a large proprietary pharmaceutical dataset.
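
The general idea of ranking by estimated class probability, rather than by a decision boundary, can be illustrated with a minimal sketch. This is not the SLAMDUNK method: it omits sphering, metric learning, and nonisotropic kernels, and simply assumes an isotropic Gaussian kernel with a fixed bandwidth over binary feature vectors. All function names, the bandwidth value, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def kde_log_density(x, points, bandwidth):
    """Log of an isotropic Gaussian kernel density estimate at x."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        log_terms.append(-0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp keeps the sum stable when many dimensions shrink the kernels.
    m = max(log_terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return log_norm + log_sum - math.log(len(points))


def prob_active(x, actives, inactives, bandwidth=1.0):
    """Estimate P(active | x) by Bayes' rule from per-class density estimates.

    Class priors are taken from the training class frequencies (an assumption).
    """
    n_a, n_i = len(actives), len(inactives)
    log_a = math.log(n_a / (n_a + n_i)) + kde_log_density(x, actives, bandwidth)
    log_i = math.log(n_i / (n_a + n_i)) + kde_log_density(x, inactives, bandwidth)
    m = max(log_a, log_i)
    num = math.exp(log_a - m)
    return num / (num + math.exp(log_i - m))


def rank_candidates(candidates, actives, inactives, bandwidth=1.0):
    """Return (probability, candidate) pairs, most probably active first."""
    scored = [(prob_active(c, actives, inactives, bandwidth), c) for c in candidates]
    return sorted(scored, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    # Toy binary descriptors; real descriptors have thousands of features.
    actives = [(1, 1, 0, 0), (1, 1, 1, 0)]
    inactives = [(0, 0, 1, 1), (0, 0, 0, 1)]
    candidates = [(0, 0, 1, 1), (1, 1, 0, 1), (1, 0, 0, 0)]
    for prob, mol in rank_candidates(candidates, actives, inactives):
        print(mol, round(prob, 3))
```

Because the output is a probability rather than a class label, candidates can be sorted directly for follow-up testing. A fixed isotropic bandwidth is the simplest choice here and is exactly the kind of assumption that tends to degrade in high dimensions.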

Approximate BibTeX Entry

@inproceedings{Gray:compstat2004,
    Year = {2004},
    Booktitle = {Proceedings of the Computational Statistics},
    Editor = {J. Antoch et al.},
    Author = {Alexander Gray and Paul Komarek and Ting Liu and Andrew Moore},
    Title = {High-Dimensional Probabilistic Classification for Drug Discovery}
}

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.