UNM Computer Science

KDD Cup 2006 Rules


Eligibility

The contest is open to any party planning to attend KDD 2006. Each of the three tasks will be evaluated separately; you can enter as many tasks (or as few tasks) as you like. A person can participate in only one group per task.

Registration

Each participating group must register with the competition in order to gain access to the training data. The registration must indicate a single "group lead" who will be the point of contact for the group. Each registered group lead will be subscribed to the KDD Cup 2006 mail list. The mail list will be used for contact with the participating groups and to announce rule clarifications, availability of additional data, etc. Groups are also encouraged to use this list to post questions and hold discussions.

Participation in tasks

This year's KDD Cup consists of three different tasks. A group may choose to submit to any or all of these tasks. Performance in one task will not positively or negatively impact the evaluation of performance in a different task. If a group chooses to participate in Tasks 1 or 2, they must submit results for all of the sub-tasks. The decision to participate in a given task will be made during results submission; a group does not have to commit to any particular tasks when registering to receive the training data.

Test data

The same testing data set will be shared among all three tasks. Groups will simply provide different labelings of that test data, depending on which task they are submitting to.

The testing data is sequestered and will be made available nearer the end of the competition. Details on the submission and evaluation process will be posted and announced soon.

Data format

The training data will consist of a single data file plus a file containing field (feature) names. Each line of the file represents a single candidate and comprises a number of whitespace-separated fields:

Field 0    Patient ID (unique integer per patient)
Field 1    Label: 0 for negative (this candidate is not a PE);
           >0 for positive (this candidate is a PE)
Field 2+   Additional features

Semantic information on additional features may become available during the contest.

The testing data will be in the same format as the training data file, except that Field 1 (label) will be -1, denoting unknown.
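As an illustration of the format described above, a single line of the data file could be read roughly as follows. This is only a sketch and not part of the official rules: the names Candidate, parse_candidate, and is_pe are hypothetical, and the code assumes that the label is written as an integer and that every additional feature is numeric, neither of which is guaranteed by this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    patient_id: int        # Field 0: unique integer per patient
    label: int             # Field 1: 0 negative, >0 positive, -1 unknown (test data)
    features: List[float]  # Field 2+: assumed numeric here (an assumption)

    def is_pe(self) -> Optional[bool]:
        """Return True/False for labeled candidates, None when the label is unknown."""
        if self.label < 0:
            return None
        return self.label > 0


def parse_candidate(line: str) -> Candidate:
    """Parse one whitespace-separated line of the data file."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least a patient ID and a label")
    return Candidate(
        patient_id=int(fields[0]),
        label=int(fields[1]),
        features=[float(value) for value in fields[2:]],
    )
```

Calling parse_candidate on each line of the training file yields labeled candidates; on the testing file, is_pe() returns None for every candidate because Field 1 is -1.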

Evaluation

Each submission will be evaluated according to the criteria set forth under each task (see the full PDF for details). The winner for each task will be the group with the best score according to the specified metric for that task. In the event of a tie, multiple winners may be awarded or, at the chair's option, a tie-breaking metric may be employed. Results of the competition will be announced individually to participants in advance of KDD; public announcement of results will be during the opening ceremony of KDD.

Timeline

May 15 Release of KDD Cup Specification v. 1.0; availability of training data.
July 10 Submission mechanism open. [Later revised to July 11]
July 17 End of results submission phase. [Later revised to July 21]
Aug 1 Results announced
Aug 23-26 KDD

Last Update: August 2, 2006