KDD Cup 2003: Datasets

Sept 4, 2003: The datasets available for public download have been finalized.

I. Citation Prediction Task

Available for contestants:
II. Data Cleaning Task

For this task the LaTeX sources of the hep-ph papers as of March 1, 2003 are available for download. A random paper id between 1 and 100,000 has been assigned to each paper. A small subset of papers was converted from PDF/PS and appears only as plain text. There are over 35,000 hep-ph papers totaling 1.8 GB of data, so the download has been split into 10 gzipped tar archives of 50MB each, plus 1 extra tarball containing the plain text papers.

- hep-ph part 0
- hep-ph part 1
- hep-ph part 2
- hep-ph part 3
- hep-ph part 4
- hep-ph part 5
- hep-ph part 6
- hep-ph part 7
- hep-ph part 8
- hep-ph part 9
- hep-ph part 10 (plain text papers)

Sept 4, 2003: The corresponding citation graph for hep-ph used as the evaluation criterion is now available here.

III. Download Estimation Task

Available for this task are the same datasets as for Task 1, plus:
IV. Open Task

Contestants can use any of the hep-th data from Tasks 1 or 3.
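
For the Data Cleaning Task, once the hep-ph archives have been downloaded into a single directory, they could be unpacked along the following lines. This is only a minimal sketch: the function name `extract_archives`, the directory arguments, and the `.tar.gz` file suffix are assumptions made for illustration, since this page does not state the archive file names or their internal layout.

```python
import tarfile
from pathlib import Path


def _is_within(base: Path, target: Path) -> bool:
    """Return True if target resolves to a location inside base."""
    try:
        target.resolve().relative_to(base.resolve())
        return True
    except ValueError:
        return False


def extract_archives(archive_dir: str, out_dir: str) -> int:
    """Extract every *.tar.gz archive in archive_dir into out_dir.

    Returns the number of regular files extracted. The *.tar.gz
    suffix is an assumption about how the downloads are named.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for archive in sorted(Path(archive_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Keep regular files only, and skip any member whose
            # path would land outside the output directory.
            members = [
                m for m in tar.getmembers()
                if m.isfile() and _is_within(out, out / m.name)
            ]
            tar.extractall(out, members=members)
            count += len(members)
    return count
```

Calling `extract_archives("downloads", "hep-ph")` (both directory names hypothetical) would unpack all archives found in `downloads` and return the number of files written, which can be compared against the paper count quoted above as a rough sanity check.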