SBNS Datasets
Dataset Descriptions
Institute Data
A set of records of collaborations between professors and students, collected from publicly available web pages listed on the Carnegie Mellon University Robotics Institute’s web site.
NIPS Data
A set containing co-authorship information from the Neural Information Processing Systems (NIPS) conference, taken from proceedings 1-12 (the pre-electronic submission era).
Medline Data
A sample of the co-authorship information of the publicly available medical publication database Medline.
IOBDB Data
Contains information about 55 years of Off-Broadway shows. Each link is a play, and the entities are the actors, directors, and writers involved in that play.
IMDB Data
A collection of casts of actors who appeared in movies released between 1900 and 1960 (the small subset) and from 2000 onward (the big subset), extracted from the Internet Movie Database.
Citeseer Data
A set of co-publication records from the Citeseer online library and index of computer science publications. Since the entities are represented by first initial and last name, a single name might correspond to several people.
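The name ambiguity described above can be illustrated with a minimal sketch. The author names and the `initial_key` helper are hypothetical, chosen only for illustration; they are not taken from the Citeseer data, and the dataset's actual name format may differ.

```python
from collections import defaultdict


def initial_key(full_name: str) -> str:
    """Reduce a full name to 'first initial + last name' (hypothetical helper)."""
    parts = full_name.split()
    return f"{parts[0][0].upper()}. {parts[-1]}"


# Hypothetical authors, not taken from the dataset.
authors = ["John Smith", "Jane Smith", "Alice Jones"]

# Group the full names by their abbreviated form.
by_key = defaultdict(list)
for name in authors:
    by_key[initial_key(name)].append(name)

# Keys that map to more than one person are ambiguous entities.
ambiguous = {key: names for key, names in by_key.items() if len(names) > 1}
```

Here two distinct people collapse into the single entity `J. Smith`, so any per-person statistic computed on such data may merge the records of several individuals.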
Dataset sizes and download links
| Dataset | People | Records | Avg people/record | Avg records/person | Link | Size |
|---|---|---|---|---|---|---|
| Institute | 3,342 | 5,152 | 2.77 | 4.24 | | 63 k |
| NIPS | 2,037 | 1,740 | 2.29 | 1.96 | | 24 k |
| Medline-s | 19,499 | 6,217 | 3.6 | 1.15 | | 169 k |
| Medline-m | 88,244 | 186,150 | 4.4 | 2.1 | | 3.1 M |
| Medline-b | 3,228,008 | 8,008,134 | 3.86 | 9.57 | | 213 M |
| IOBDB | 29,446 | 3,686 | 30.4 | 2.55 | | 626 k |
| IMDB-s | 198,571 | 58,642 | 7.73 | 2.3 | | 4.4 M |
| IMDB-b | 1,232,030 | 419,661 | 13.34 | 4.54 | | 52 M |
| Citeseer-s | 104,515 | 180,395 | 2.83 | 4.88 | | 2.8 M |
| Citeseer-b | 304,490 | 385,923 | 3.1 | 3.9 | | 11 M |
Table 1: Datasets and their sizes. The suffixes ’-s’, ’-m’, and ’-b’ denote the small, medium, and large subsets of the same data.
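As a rough illustration of what the table columns measure, the sketch below computes the same four statistics for a toy set of records. It assumes each record (link) is a set of entity identifiers and that the two averages are the total number of memberships divided by the number of records and by the number of people, respectively. The `summarize` function and the sample records are hypothetical; the actual file format of the datasets is not described here.

```python
def summarize(records):
    """Compute table-style statistics for a non-empty list of records.

    Each record is assumed to be a set of entity identifiers.
    """
    people = set().union(*records)              # distinct entities across all records
    memberships = sum(len(r) for r in records)  # total (person, record) pairs
    return {
        "people": len(people),
        "records": len(records),
        "avg_people_per_record": memberships / len(records),
        "avg_records_per_person": memberships / len(people),
    }


# Hypothetical records, e.g. three papers with their author sets.
records = [{"a", "b", "c"}, {"a", "d"}, {"b", "c", "d", "e"}]
stats = summarize(records)
```

Under these definitions, people × avg records/person and records × avg people/record both equal the total number of memberships, which gives a quick consistency check on a loaded dataset.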