This directory contains three separate link data sets:

1. news_entities - generated manually from publicly available webpages
   concerning entities in articles related to terrorism
2. hsiungspam - created from parsed spam files collected from
   Paul Hsiung's mailbox
3. archivespam - created from parsed spam files from www.spamarchive.org

Each link data set contains three files:

*.names - contains all the names in the link data set
*.links - contains all the links in the link data set
*.alias - contains a subset of the ground-truth aliases

For example, names can contain,
------------------
osama_bin_laden
ayman_al_zawahri
al_qaeda
bin_laden
...
------------------

links can contain,
------------------
link0, al_qaeda, osama_bin_laden, ayman_al_zawahri
link1, osama_bin_laden, bin_laden
...
------------------

alias can contain,
------------------
osama_bin_laden bin_laden the_emir
bush the_president george_bush
usa united_states_of_america
------------------

For further information, please contact Paul Hsiung.
To form my email address:
1. Concatenate my first and last name with no space in between,
   with a lowercase p and h.
2. Append _at_ gmail.com
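
The links example above can be read with a few lines of Python. This is only a minimal sketch, assuming each line of a *.links file holds a link identifier followed by comma-separated names, as in the example; the function names parse_links and cooccurrence are hypothetical and not part of the data set. The sample lines are copied from the example rather than read from disk.

```python
from collections import defaultdict


def parse_links(lines):
    """Parse lines such as 'link0, a, b' into {link_id: [names]}.

    Assumes the comma-separated layout shown in the README example.
    """
    links = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank lines and lines without any names
        links[fields[0]] = fields[1:]
    return links


def cooccurrence(links):
    """Map each name to the set of other names it shares a link with."""
    neighbors = defaultdict(set)
    for names in links.values():
        for name in names:
            neighbors[name].update(n for n in names if n != name)
    return neighbors


# Sample lines taken from the README example.
sample = [
    "link0, al_qaeda, osama_bin_laden, ayman_al_zawahri",
    "link1, osama_bin_laden, bin_laden",
]
links = parse_links(sample)
neighbors = cooccurrence(links)
print(sorted(neighbors["osama_bin_laden"]))
```

To use this on a real file, the same function could be called as parse_links(open(path)), where path points at one of the *.links files.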