Alias Detection Datasets
The following datasets were used in "Alias Detection in Link Data Sets" by Paul Hsiung, Andrew Moore, Daniel Neill and Jeff Schneider (Hsiung et al., 2005), Proceedings of the International Conference on Intelligence Analysis, 2005.
The datasets can serve as example inputs to Paul Hsiung's Many Names One Person software.
They are provided here in this form so that other researchers can run experiments on the same datasets with identical preprocessing, including the discretization levels of real-valued attributes and the handling of missing values.
- Readme file
- Spam Archive Data
  - archivespam.alias: Training aliases
  - archivespam.names: Names file
  - archivespam.links: Links file
- Paul Hsiung's Spam Data
  - hsiungspam.alias: Training aliases
  - hsiungspam.names: Names file
  - hsiungspam.links: Links file
- News Article Entities Data
  - news_entities.alias: Training aliases
  - news_entities.names: Names file
  - news_entities.links: Links file
Please feel free to contact Paul Hsiung or Andrew Moore with questions or comments.