autonlab.org

Bayesian Networks for Lossless Dataset Compression (1999)

Scott Davies, Andrew Moore

Tags

Bayesian Networks

Abstract

The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has focused primarily on their use in diagnostics, prediction, and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless compression of large datasets. We present algorithms for automatically learning Bayesian networks and new structures called "Huffman networks" that model statistical relationships in the datasets, and algorithms for using these models to then compress the datasets. These algorithms often achieve significantly better compression ratios than those achieved with common dictionary-based algorithms such as those used by programs like ZIP.
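
To illustrate why modeling statistical relationships between columns can help compression, here is a minimal sketch. It is not the paper's algorithm: it only compares ideal (Shannon) code lengths for a toy dataset under two assumed models, one that codes every column independently and one that codes each column given the column before it, as a chain-shaped network would. The function names and the toy data are hypothetical, and the cost of storing the model itself is ignored.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Total ideal code length, in bits, when each item is coded
    with -log2 of its empirical frequency."""
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())


def independent_bits(rows):
    """Ideal bits when every column is coded on its own (no edges)."""
    n_cols = len(rows[0])
    return sum(entropy_bits(Counter(r[j] for r in rows)) for j in range(n_cols))


def chain_bits(rows):
    """Ideal bits when column j is coded given column j-1.
    Assumption: a fixed chain structure, not a learned network."""
    bits = entropy_bits(Counter(r[0] for r in rows))
    for j in range(1, len(rows[0])):
        by_parent = {}
        for r in rows:
            # Count child values separately for each parent value.
            by_parent.setdefault(r[j - 1], Counter())[r[j]] += 1
        bits += sum(entropy_bits(c) for c in by_parent.values())
    return bits


# Toy data (hypothetical): the second column is fully determined by the first.
rows = [("rain", "wet"), ("rain", "wet"), ("sun", "dry"), ("sun", "dry")]
print(independent_bits(rows), chain_bits(rows))
```

In this toy case the second column carries no extra information once the first is known, so the chain model needs half as many bits as the independent model. A real compressor would also have to encode the model and use a practical coder (for example Huffman or arithmetic coding) to approach these ideal lengths.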

Approximate BibTeX Entry

@inproceedings{davies-bayesian,
    Year = {1999},
    Pages = {387--391},
    Publisher = {AAAI Press},
    Booktitle = {Proceedings of the Fifth International Conference on Knowledge Discovery in Databases},
    Author = {Scott Davies and Andrew Moore},
    Title = {Bayesian Networks for Lossless Dataset Compression}
}