Very Fast EM-based Mixture Model Clustering Using Multiresolution KD-trees (1999)
Tags
Bayesian Networks, Clustering, Kd-trees and Ball-trees, Mixture Models, Statistical Data Mining for Astrophysics
Abstract
Clustering is important in many fields including manufacturing, biology, finance, and astronomy. Mixture models are a popular approach due to their statistical foundations, and EM is a very popular method for finding mixture models. EM, however, requires many accesses of the data, and thus has been dismissed as impractical (e.g. (Zhang, Ramakrishnan, & Livny, 1996)) for data mining of enormous datasets. We present a new algorithm, based on the multiresolution kd-trees of (Moore, Schneider, & Deng, 1997), which dramatically reduces the cost of EM-based clustering, with savings rising linearly with the number of datapoints. Although presented here for maximum likelihood estimation of Gaussian mixture models, it is also applicable to nonGaussian models (provided class densities are monotonic in Mahalanobis distance), mixed categorical/numeric clusters, and Bayesian methods such as Autoclass (Cheeseman & Oldford, 1994).
Full text
Download (application/pdf, 297.6 kB)
Approximate BibTeX Entry
@inproceedings{moore-veryfast,
Month = {April},
Year = {1999},
Pages = {543--549},
Publisher = {Morgan Kaufman},
Address = {340 Pine Street, 6th Fl., San Francisco, CA 94104},
Booktitle = {Advances in Neural Information Processing Systems},
Editor = {M. Kearns and D. Cohn},
Author = {Andrew Moore},
Title = {Very Fast EM-based Mixture Model Clustering Using Multiresolution KD-trees}
}