The contest is open to any party planning to attend KDD 2006. Each of the three tasks will be evaluated separately; you can enter as many tasks (or as few tasks) as you like. A person can participate in only one group per task.
Each participating group must register with the competition in order to gain access to the training data. The registration must indicate a single "group lead" who will be the point of contact for the group. Each registered group lead will be subscribed to the KDD Cup 2006 mail list. The mail list will be used for contact with the participating groups and to announce rule clarifications, availability of additional data, etc. Groups are also encouraged to use this list to post questions and hold discussions.
This year's KDD Cup consists of three different tasks. A group may choose to submit to any or all of these tasks. Performance in one task will not positively or negatively impact the evaluation of performance in a different task. If a group chooses to participate in Task 1 or Task 2, it must submit results for all of that task's sub-tasks. The decision to participate in a given task will be made during results submission; a group does not have to commit to any particular tasks when registering to receive the training data.
Test data
The same testing data set will be shared among all three tasks. Groups will simply provide different labelings of that test data, depending on which task they are submitting to.
The testing data is sequestered and will be made available nearer the end of the competition. Details on the submission and evaluation process will be posted and announced soon.
The training data will consist of a single data file plus a file containing field (feature) names. Each line of the file represents a single candidate and comprises a number of whitespace-separated fields:
| Field 0 | Patient ID (unique integer per patient) |
| Field 1 | Label: 0 for negative (this candidate is not a PE); >0 for positive (this candidate is a PE) |
| Field 2+ | Additional features |
Semantic information on additional features may become available during the contest.
The testing data will use the same file format, except that Field 1 (label) will be -1, denoting unknown.
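As an illustration of the file layout described above, the following is a minimal Python sketch of how a group might read candidate lines. It is not part of the official specification. The names `Candidate`, `parse_candidate`, and `parse_candidates` are hypothetical, and the sketch assumes that the label and every additional feature can be parsed as a number, which the specification does not state explicitly.

```python
from typing import Iterable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One line of the data file (hypothetical container, not from the spec)."""
    patient_id: int           # Field 0
    is_pe: Optional[bool]     # Field 1: False if 0, True if >0, None if -1 (unknown)
    features: List[float]     # Field 2+ (assumed numeric)


def parse_candidate(line: str) -> Candidate:
    """Parse one whitespace-separated candidate line."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least a patient ID and a label")

    patient_id = int(fields[0])
    label = float(fields[1])

    if label > 0:
        is_pe: Optional[bool] = True      # positive: this candidate is a PE
    elif label == 0:
        is_pe = False                     # negative: this candidate is not a PE
    elif label == -1:
        is_pe = None                      # unknown, as in the testing data
    else:
        raise ValueError(f"unexpected label value: {fields[1]}")

    # Assumption: all remaining fields are numeric feature values.
    features = [float(value) for value in fields[2:]]
    return Candidate(patient_id, is_pe, features)


def parse_candidates(lines: Iterable[str]) -> List[Candidate]:
    """Parse every non-blank line of a data file."""
    return [parse_candidate(line) for line in lines if line.strip()]
```

Because the training and testing files share one format, the same reader can serve both; a testing file simply yields `is_pe` equal to `None` for every candidate. Grouping the parsed candidates by `patient_id` is a natural next step, since Field 0 identifies the patient each candidate belongs to.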
| May 15 | Release of KDD Cup Specification v. 1.0; availability of training data. |
| July 10 | Submission mechanism open. [Later revised to July 11] |
| July 17 | End of results submission phase. [Later revised to July 21] |
| Aug 1 | Results announced |
| Aug 23-26 | KDD |
Last Update: August 2, 2006