Learning Predictive Models from Small Sets of Dirty Data (2005)

Ashwin Tengli, Artur Dubrawski and Lujie Chen

Abstract

This paper introduces robust predictive rule lists, new structures that combine ideas from decision lists and model trees with robust statistics. A predictive rule list is an ordered if-then-else list of rules whose consequents are robust predictive models and whose antecedents are the conditions under which those models apply. We illustrate the utility of the concept on a selection of problems typically approached with multiple linear regression. Empirical results obtained so far reveal features that may be especially appealing in practical applications: identified outliers can be avoided without forcing the elimination of potentially valid information; small sets of dirty data can be addressed effectively; and the resulting models are intuitive and easy to interpret. The approach tends to be beneficial when relatively high-dimensional data are in short supply and contain a substantial amount of non-ignorable measurement error.
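
The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how rules are learned or which robust estimator is used. The sketch assumes a single input variable, hand-written conditions, and a Theil-Sen line (median of pairwise slopes) as a stand-in robust model. The names RobustLine, Rule and predict are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import median
from typing import Callable, List, Sequence


@dataclass
class RobustLine:
    """Univariate Theil-Sen fit, y ~ slope * x + intercept (stand-in robust model)."""
    slope: float
    intercept: float

    @classmethod
    def fit(cls, xs: Sequence[float], ys: Sequence[float]) -> "RobustLine":
        # Median of all pairwise slopes: a few gross outliers cannot move it far.
        slopes = [
            (ys[j] - ys[i]) / (xs[j] - xs[i])
            for i, j in combinations(range(len(xs)), 2)
            if xs[j] != xs[i]
        ]
        slope = median(slopes)
        # Median residual gives a robust intercept.
        intercept = median(y - slope * x for x, y in zip(xs, ys))
        return cls(slope, intercept)

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


@dataclass
class Rule:
    """One 'if condition then model' entry of the list."""
    condition: Callable[[float], bool]  # antecedent: when the model applies
    model: RobustLine                   # consequent: robust predictive model


def predict(rules: List[Rule], default: RobustLine, x: float) -> float:
    """Walk the ordered list; the first rule whose condition holds makes the prediction."""
    for rule in rules:
        if rule.condition(x):
            return rule.model.predict(x)
    return default.predict(x)  # final 'else' branch


# Made-up data: two regimes, with one gross outlier (y = 100) in the first.
low = RobustLine.fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
high = RobustLine.fit([10, 11, 12, 13], [50, 49, 48, 47])

rules = [Rule(condition=lambda x: x >= 10, model=high)]

print(low)                      # the outlier is kept in the data but does not dominate the fit
print(predict(rules, low, 2.5))
print(predict(rules, low, 12))
```

The outlier stays in the training data here. The median-based fit limits its influence, which is one simple way to avoid an outlier without deleting the record.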

Approximate BibTeX Entry

@inproceedings{predictive_rule_lists,
    Booktitle = {International Conference on Information and Automation},
    Address = {Colombo, Sri Lanka},
    Month = {December},
    Year = {2005},
    Pages = {6},
    Organization = {IEEE},
    Author = {Ashwin Tengli and Artur Dubrawski and Lujie Chen},
    Title = {Learning Predictive Models from Small Sets of Dirty Data}
}

Copyright 2006, Carnegie Mellon University, Auton Lab