The Anchors Hierarchy: Using the Triangle Inequality to Survive High-Dimensional Data (2000)

Andrew Moore


Cached Sufficient Statistics, Clustering, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Mixture Models, Statistical Data Mining for Astrophysics


This paper is about metric data structures in high-dimensional or non-Euclidean space that permit cached-sufficient-statistics accelerations of learning algorithms. It has recently been shown that, below about 10 dimensions, decorating kd-trees with additional "cached sufficient statistics" such as first and second moments and contingency tables can provide satisfying acceleration for a very wide range of statistical learning tasks, such as kernel regression, locally weighted regression, k-means clustering, mixture modeling, and Bayes net learning. In this paper, we begin by defining the anchors hierarchy: a fast data structure and algorithm for localizing data based only on a triangle-inequality-obeying distance metric. We show how this, in its own right, gives a fast and effective clustering of data. But more importantly, we show how it can produce a well-balanced structure similar to a ball-tree (Omohundro, 1991) or a kind of metric tree (Uhlmann, 1991; Ciaccia, Patella, & Zezula, 1997) in a way that is neither "top-down" nor "bottom-up" but instead "middle-out". We then show how this structure, decorated with cached sufficient statistics, allows a wide variety of statistical learning algorithms to be accelerated even in thousands of dimensions.
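The core idea of the anchor construction can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: each anchor owns its points sorted by decreasing distance to the anchor's pivot, the next pivot is the point farthest from its current owner, and the triangle inequality lets an existing anchor skip distance computations for points provably closer to it than to the new pivot. All names (build_anchors, dist, etc.) are my own, and the sketch assumes a plain Euclidean metric, though any triangle-inequality-obeying metric would do.

```python
import math
import random

def dist(p, q):
    # Euclidean distance; any metric obeying the triangle inequality works.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def build_anchors(points, num_anchors):
    # Each anchor owns a list of (distance_to_pivot, point) pairs,
    # kept sorted by decreasing distance to the anchor's pivot.
    pivot0 = points[0]
    anchors = [{"pivot": pivot0,
                "owned": sorted(((dist(pivot0, p), p) for p in points),
                                key=lambda t: t[0], reverse=True)}]
    while len(anchors) < num_anchors:
        # The next pivot is the point farthest from its current owner.
        donor = max(anchors,
                    key=lambda a: a["owned"][0][0] if a["owned"] else -1.0)
        pivot = donor["owned"][0][1]
        new = {"pivot": pivot, "owned": []}
        for a in anchors:
            d_pivots = dist(a["pivot"], pivot)
            kept = []
            for i, (d, p) in enumerate(a["owned"]):
                if d < d_pivots / 2.0:
                    # Triangle inequality: dist(p, pivot) >= d_pivots - d > d,
                    # so p (and every later, closer point) stays put
                    # without any further distance computations.
                    kept.extend(a["owned"][i:])
                    break
                d_new = dist(pivot, p)
                if d_new < d:
                    new["owned"].append((d_new, p))
                else:
                    kept.append((d, p))
            a["owned"] = kept
        new["owned"].sort(key=lambda t: t[0], reverse=True)
        anchors.append(new)
    return anchors
```

Because a point only moves when it is strictly closer to the new pivot, the construction maintains the invariant that every point is owned by its nearest anchor, which is what makes the resulting partition usable as a clustering and as raw material for the middle-out ball-tree agglomeration described in the paper.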

Full text

Download (application/pdf, 203.2 kB)

Approximate BibTeX Entry

@inproceedings{moore2000anchors,
    Year = {2000},
    Pages = {397--405},
    Publisher = {AAAI Press},
    Booktitle = {Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence},
    Author = {Andrew Moore},
    Title = {The Anchors Hierarchy: Using the Triangle Inequality to Survive High-Dimensional Data}
}

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.