autonlab.org

Optimal Reinsertion Datasets

The following datasets were used in Moore and Wong (2003), Optimal Reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning, ICML 2003.

They are provided here in preprocessed form so that other researchers can run experiments on the same datasets with identical preprocessing, including the discretization levels of real-valued attributes and the compensation for missing values.

If you use these datasets, please acknowledge their original sources.

  • adult.csv (49K records, 15 attributes, average arity 7.7), Contributed to UCI by Ron Kohavi
  • alarm.csv (20K records, 37 attributes, average arity 2.8), Data generated from a standard Bayes Net benchmark (Beinlich et al., 1989)
  • The medical informatics dataset cannot be released. Apologies.
  • covtype.csv (150K records, 39 attributes, average arity 2.8), Contributed to UCI by Jock Blackard.
  • connect4.csv (67K records, 43 attributes, average arity 3.0), Contributed to UCI by John Tromp
  • edsgc.csv (300K records, 24 attributes, average arity 2.0), Data on 300,000 galaxies from the Edinburgh-Durham Sky Survey (Nichol et al., 2000)
  • synth2.csv (25K records, 36 attributes, average arity 2.0), Synthetic dataset described in the above paper.
  • synth3.csv (25K records, 36 attributes, average arity 2.0), Synthetic dataset described in the above paper.
  • synth4.csv (25K records, 36 attributes, average arity 2.0), Synthetic dataset described in the above paper.
  • nursery.csv (13K records, 9 attributes, average arity 3.6), Contributed to UCI by Marko Bohanec and Blaz Zupan
  • letters.csv (20K records, 17 attributes, average arity 3.4), Contributed to UCI by David Slate
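
After downloading a file, the summary figures listed above (records, attributes, average arity) can be checked with a short script. The following is a minimal Python sketch, not the authors' tooling. It assumes that each file is comma-separated, that the first line may be a header row (controlled by the hypothetical `has_header` parameter), and that "arity" means the number of distinct values observed in a column. The function names `summarize` and `summarize_csv` are illustrative only.

```python
import csv
import io


def summarize(rows):
    """Return (n_records, n_attributes, average_arity) for an iterable of rows.

    Arity is taken here to be the number of distinct values observed in a
    column; this is an assumption about how the listed figures were computed.
    """
    distinct = []  # one set of observed values per column
    n_records = 0
    for row in rows:
        if not row:
            continue  # skip blank lines
        if not distinct:
            distinct = [set() for _ in row]
        if len(row) != len(distinct):
            raise ValueError("row %d has %d fields, expected %d"
                             % (n_records + 1, len(row), len(distinct)))
        for values, cell in zip(distinct, row):
            values.add(cell)
        n_records += 1
    n_attributes = len(distinct)
    if n_attributes == 0:
        return 0, 0, 0.0
    average_arity = sum(len(v) for v in distinct) / n_attributes
    return n_records, n_attributes, average_arity


def summarize_csv(path, has_header=True):
    """Summarize a CSV file on disk; has_header is an assumption to adjust."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # discard the header row
        return summarize(reader)


# Tiny in-memory example: column "a" takes 2 values, column "b" takes 3.
sample = io.StringIO("a,b\n0,x\n1,y\n0,z\n")
reader = csv.reader(sample)
next(reader)  # skip the header
print(summarize(reader))  # (3, 2, 2.5)
```

For a downloaded file, `summarize_csv("nursery.csv")` would return the corresponding triple. Observed arity can be lower than the declared arity if some value never occurs in the data, so small differences from the listed averages are possible.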

The UCI Machine Learning Repository should be cited as:

Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Please feel free to contact Andrew Moore with questions or comments.

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.