autonlab.org

N-Body Problems in Statistical Learning (2001)

Alexander Gray, Andrew Moore

Tags

Astrostatistics, Cached Sufficient Statistics, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Memory-based Learning, Spatial Statistics, Statistical Data Mining for Astrophysics

Abstract

We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation. These include any problem which abstractly requires a comparison of each of the N points in a dataset with each other point and would naively be solved using N^2 distance computations. In practice N is often large enough to make this infeasible. We present a suite of new geometric techniques which are applicable in principle to any `N-body' computation including large-scale mixtures of Gaussians, RBF neural networks, and HMM's. Our algorithms exhibit favorable asymptotic scaling and are empirically several orders of magnitude faster than the naive computation, even for small datasets. We are aware of no exact algorithms for these problems which are more efficient either empirically or theoretically. In addition, our framework yields simple and elegant algorithms. It also permits two important generalizations beyond the standard all-point-pairs problems, which are more difficult. These are represented by our final examples, the multiple two-point correlation and the notorious n-point correlation.

Full text

Download (application/pdf, 272.5 kB)

Approximate BibTeX Entry

@inproceedings{gray-nbody,
    Year = {2001},
    Publisher = {MIT Press},
    Booktitle = {Advances in Neural Information Processing Systems},
    Editor = {Leen, Todd K. and Dietterich, Thomas G.},
    Author = { Alexander Gray, Andrew Moore },
    Title = {N-Body Problems in Statistical Learning}
}

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.