Alexander Gray
Biography
Alex's fascinations in early grade school were Legos, breaking ciphers, and drawing human anatomy. After studying Applied Math and Computer Science at Berkeley, he resisted a job offer to do Hollywood special effects and ended up working at NASA's Jet Propulsion Laboratory for six years developing machine learning algorithms for interesting and hard scientific problems (as well as trading options on the side). He finally realized that having non-trivial ideas is effectively not allowed without a PhD, so he went to CMU to get one. His current fascinations are still building (systems that really solve hard problems that people really want solved), deciphering (things that seem complicated), and creating (new and inspiring ways of looking at things).
Research Interests
Large-scale learning algorithms. Unsupervised learning. Time series and control. Automatic derivation of parametric learning algorithms. Nonparametric methods. Recursive statistical models. Data structures. Fundamental extensions of divide-and-conquer. Computational geometry. Challenge problems of numerical analysis and operations research.
Tags
Astrostatistics, Auton Fast Classifiers, Bayesian Networks, Cached Sufficient Statistics, Clustering, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Kernel Density Estimation, K Nearest Neighbor, Life Science Data Mining, Locally Weighted Learning, Memory-based Learning, Mixture Models, Optimization, Statistical Data Mining for Astrophysics
Papers
An Investigation of Practical Approximate Nearest Neighbor Algorithms
(2004)
How can variations on classic exact nearest-neighbor data structures deliver faster answers if you are prepared to accept approximation?
High-Dimensional Probabilistic Classification for Drug Discovery
(2004)
Discriminative probabilistic classifiers have been used successfully on large life-sciences datasets, but high dimensionality has prohibited the use of nonparametric class probability estimation. This paper explores a method (SLAMDUNK) which addresses …
Rapid Evaluation of Multiple Density Models
(2003)
A way to quickly evaluate and compare multiple nonparametric density estimates.
Efficient Exact k-NN and Nonparametric Classification in High Dimensions
(2003)
Can we do exact k-NN classification without actually finding the k nearest neighbors?
N-Body Problems in Statistical Learning
(2001)
A way to use multiple trees simultaneously to solve a large class of statistical problems efficiently.
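To make the trade-off behind "An Investigation of Practical Approximate Nearest Neighbor Algorithms" concrete, here is a minimal sketch of one well-known way to relax an exact kd-tree search: skip a subtree unless it could improve the current best distance by more than a factor (1 + eps). This standard pruning rule is an assumption chosen for illustration, not necessarily one of the variations studied in the paper, and the names `KDNode`, `build_kdtree`, and `approx_nearest` are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    """kd-tree node that splits its subtree on one coordinate axis."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a kd-tree by cycling through axes and splitting at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def approx_nearest(root: Optional[KDNode], query: Point,
                   eps: float = 0.0) -> Tuple[Optional[Point], float]:
    """Return (point, distance) whose distance is at most (1 + eps) times the
    true nearest-neighbor distance; eps = 0 gives exact search."""
    best: List = [None, math.inf]  # [best point so far, its distance]

    def search(node: Optional[KDNode]) -> None:
        if node is None:
            return
        d = math.dist(node.point, query)
        if d < best[1]:
            best[0], best[1] = node.point, d
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        search(near)
        # Every point in `far` lies at least |diff| from the query, so skipping
        # it unless (1 + eps) * |diff| < best costs at most a factor (1 + eps).
        if (1.0 + eps) * abs(diff) < best[1]:
            search(far)

    search(root)
    return best[0], best[1]


# Example: exact vs. approximate answers on random 3-D points.
random.seed(0)
points = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
tree = build_kdtree(points)
query = (0.5, 0.5, 0.5)
print(approx_nearest(tree, query, eps=0.0)[1], approx_nearest(tree, query, eps=0.5)[1])
```

With eps = 0 the pruning test reduces to the usual exact one; larger eps prunes more subtrees, which is where any speedup would come from.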
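The question posed by "Efficient Exact k-NN and Nonparametric Classification in High Dimensions" can be made concrete for two classes. Assuming all distances from the query are distinct, at least t of the query's k nearest neighbors are positive exactly when at most k - t negatives lie closer than the query's t-th nearest positive, because that positive then ranks at most k overall. The brute-force sketch below checks this condition and stops counting negatives as soon as the answer is settled. It only illustrates why the neighbor list itself is not needed; it is not the paper's tree-based algorithm, and the function name `at_least_t_positive_neighbors` is hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def at_least_t_positive_neighbors(query: Point, positives: Sequence[Point],
                                  negatives: Sequence[Point], k: int, t: int) -> bool:
    """Decide whether >= t of the query's k nearest neighbors are positive.

    Assumes all distances from the query are distinct (no ties).
    Never sorts the negatives or builds the k-NN list.
    """
    if t <= 0:
        return True
    if t > k or t > len(positives):
        return False
    # r = distance from the query to its t-th nearest positive point.
    r = sorted(math.dist(query, p) for p in positives)[t - 1]
    # The t-th positive ranks t + m overall, where m = number of negatives
    # closer than r, so it lies in the k-NN exactly when m <= k - t.
    budget = k - t
    closer = 0
    for x in negatives:
        if math.dist(query, x) < r:
            closer += 1
            if closer > budget:
                return False  # stop counting as soon as the answer is known
    return True


# Example: majority vote (t = k // 2 + 1) for k = 5 on random 2-D data.
random.seed(0)
pos = [tuple(random.random() for _ in range(2)) for _ in range(50)]
neg = [tuple(random.random() for _ in range(2)) for _ in range(200)]
print(at_least_t_positive_neighbors((0.5, 0.5), pos, neg, k=5, t=3))
```

With odd k and t = k // 2 + 1 this becomes a majority-vote k-NN classifier. A practical version would presumably replace the linear scans with a tree that can count points within radius r and stop early.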
Software
npt
N-point Spatial Statistics.
Cuevas CFF Clustering
Cuevas uses the 2-step CFF algorithm to perform clustering against a noisy background.
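Both "N-Body Problems in Statistical Learning" and npt involve traversing more than one tree at once. As a rough illustration only, the sketch below counts pairs (x, y), with x from one point set and y from another, whose distance is at most r; such pair counts are the basic quantity behind 2-point spatial statistics. It recurses on pairs of kd-tree nodes and uses bounding-box distance bounds either to discard a node pair or to count all of its pairs in bulk. The class `Node`, the function `count_pairs`, and the leaf size of 8 are hypothetical choices for this example; this is not the npt code or the paper's exact algorithm.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """kd-tree node that stores its points and their bounding box."""

    def __init__(self, points: List[Point], leaf_size: int = 8) -> None:
        self.points = points
        dims = range(len(points[0]))
        self.lo = [min(p[d] for p in points) for d in dims]
        self.hi = [max(p[d] for p in points) for d in dims]
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        if len(points) > leaf_size:
            # Split at the median of the widest bounding-box dimension.
            axis = max(dims, key=lambda i: self.hi[i] - self.lo[i])
            ordered = sorted(points, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.left = Node(ordered[:mid], leaf_size)
            self.right = Node(ordered[mid:], leaf_size)


def min_max_dist(a: Node, b: Node) -> Tuple[float, float]:
    """Smallest and largest possible distance between points of two boxes."""
    lo_sq = hi_sq = 0.0
    for d in range(len(a.lo)):
        gap = max(a.lo[d] - b.hi[d], b.lo[d] - a.hi[d], 0.0)
        span = max(a.hi[d] - b.lo[d], b.hi[d] - a.lo[d])
        lo_sq += gap * gap
        hi_sq += span * span
    return math.sqrt(lo_sq), math.sqrt(hi_sq)


def count_pairs(a: Node, b: Node, r: float) -> int:
    """Number of pairs (x in a, y in b) with dist(x, y) <= r."""
    dmin, dmax = min_max_dist(a, b)
    if dmin > r:
        return 0  # every pair is too far apart: prune the node pair
    if dmax <= r:
        return len(a.points) * len(b.points)  # every pair is close: count in bulk
    if a.left is None and b.left is None:
        # Two leaves: fall back to direct comparison.
        return sum(1 for x in a.points for y in b.points if math.dist(x, y) <= r)
    # Otherwise split the larger node (or the only one that is not a leaf).
    if b.left is None or (a.left is not None and len(a.points) >= len(b.points)):
        return count_pairs(a.left, b, r) + count_pairs(a.right, b, r)
    return count_pairs(a, b.left, r) + count_pairs(a, b.right, r)


# Example: pairs within r = 0.1 between two random 2-D point sets.
random.seed(1)
A = [tuple(random.random() for _ in range(2)) for _ in range(300)]
B = [tuple(random.random() for _ in range(2)) for _ in range(300)]
print(count_pairs(Node(A), Node(B), 0.1))
```

The bulk-count and prune cases are what let a dual-tree traversal avoid touching most individual pairs; only node pairs whose bounds straddle r are refined.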