autonlab.org

Learning Filaments (2000)

Geoff Gordon, Andrew Moore

Tags

Applications, Astrostatistics, Cached Sufficient Statistics, Clustering, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Mixture Models, Statistical Data Mining for Astrophysics

Abstract

This paper presents new statistics and new efficient algorithms for a form of mixture model that learns filamentary structures. Such models are important in several areas of scientific data analysis, but in this paper our main example is the identification of large-scale structure among galaxies. We describe software that can extract the positions of spherical and line-shaped clusters from data about the locations of objects such as galaxies. We do so by fitting a particular type of Gaussian mixture model to the galaxy locations. The most interesting feature of our model is that it directly represents line segments in the distribution, unlike standard Gaussian mixture models, which can only handle ellipses. Because we fit the line segments directly, we do not need any post-processing to extract their locations. We use a modification of the k-means algorithm to find model parameters. Since our software needs to deal with large data sets, it is important to accelerate model-fitting as much as possible. We therefore store the galaxy locations in a multi-resolution kd-tree, and we introduce new pruning algorithms that allow us to skip over large parts of the tree in each k-means step. We provide evaluations on both synthetic and real data sets.
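To make the idea of a k-means variant with line-segment clusters concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the function names (`point_segment_distance`, `assign`, `refit_segment_2d`) are hypothetical, the update rule (principal axis plus projected extent, 2-D only) is an assumption chosen for illustration, and the kd-tree pruning described in the abstract is omitted entirely. A spherical cluster is represented here as a zero-length segment.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        # Zero-length segment: behaves like an ordinary k-means point center.
        t = 0.0
    else:
        # Project p onto the line through a and b, then clamp to the segment.
        t = sum(x * y for x, y in zip(ap, ab)) / denom
        t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(p, closest)))


def assign(points, segments):
    """Assignment step: index of the nearest segment for each point."""
    return [
        min(range(len(segments)),
            key=lambda k: point_segment_distance(p, segments[k][0], segments[k][1]))
        for p in points
    ]


def refit_segment_2d(points):
    """Illustrative update step (assumption, 2-D only): place the segment along
    the principal axis of the points, spanning their projected extent."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Orientation of the leading eigenvector of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [(x - mx) * ux + (y - my) * uy for x, y in points]
    lo, hi = min(proj), max(proj)
    return ((mx + lo * ux, my + lo * uy), (mx + hi * ux, my + hi * uy))
```

One iteration under these assumptions would call `assign` on all points and then `refit_segment_2d` on each cluster's members, repeating until the assignments stop changing. The brute-force assignment costs a distance evaluation for every point and cluster pair, which is the cost the paper's kd-tree pruning is meant to reduce.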

Approximate BibTeX Entry

@inproceedings{gordon-filaments,
    Year = {2000},
    Pages = {335--342},
    Publisher = {Morgan Kaufmann},
    Address = {340 Pine Street, 6th Fl., San Francisco, CA 94104},
    Booktitle = {Proceedings of the International Conference on Machine Learning},
    Author = {Geoff Gordon and Andrew Moore},
    Title = {Learning Filaments}
}

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.