In the "real world", a PE detector would be an embedded software running in a medical imaging unit in a hospital. This scenario is inconvenient (to say the least) to simulate in a data mining competition. Still, in an effort to try to capture some aspects of it (e.g., to prevent people from completely engineering the test data set), each team will only have a limited time to work on the test data.
The results submission process will open at 00:00:01 MDT on July 11, 2006, and close at 23:59:59 MDT on July 21, 2006. (MDT is the United States Mountain time zone -- "Denver" time -- GMT-0600.)
During the submission phase, a submission web site will be available; its URL will be announced shortly before the phase opens. On that site, each team leader will be able to register and receive the test data set.
Once a team has received the test data, they will have 24 hours to submit their results.
A team may re-submit results as many times as they like during their 24-hour period, but only the final submission will be evaluated. The team's results will be returned to them only after their 24-hour submission period has ended (though not necessarily immediately after).
Results for the complete competition will be announced after all submissions are received and the submission period is complete.
The web results submission site will support submission of a separate results set for each task and sub-task. (E.g., teams will be able to submit separate files for Tasks 1a, 1b, and 1c.) A team may choose to participate in any or all of the tasks, but must submit answers for all sub-tasks within each task it enters. For example, suppose that team UNM decides to submit only in Tasks 1 and 3. Then UNM must submit answers for 1a, 1b, 1c, and 3.
A team may submit the same results file for all sub-tasks if they wish, or they may submit different results for each sub-task.
The submitted data files must contain exactly one label per line and one line per candidate in the test data set. The labels must be in the same order as the candidates in the test data file. Each label must be the character "0" or "1", where "0" indicates "not a PE" and "1" indicates "PE". It is not necessary to specify which PE each candidate is associated with. For example, if one PE has four associated candidates, the classifier may label any or all of them "1" -- it is not necessary to indicate that all of those candidates are associated with the same underlying PE. A validator for the data file format will be made available shortly.
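In the meantime, a minimal sketch of such a format check (in Python; the file name and candidate count below are placeholders, not official values) might look like:

    import sys

    def validate_labels(path, expected_count):
        # One line per test candidate, each containing exactly "0" or "1".
        with open(path) as f:
            lines = f.read().splitlines()
        if len(lines) != expected_count:
            sys.exit("expected %d lines, found %d" % (expected_count, len(lines)))
        for i, line in enumerate(lines, start=1):
            if line not in ("0", "1"):
                sys.exit("line %d: label must be '0' or '1', got %r" % (i, line))
        print("format OK")

    # Hypothetical usage -- substitute the real file name and candidate count:
    # validate_labels("task1a_labels.txt", 10000)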
Each task will be scored separately. For Tasks 1 and 2, the total score will be the average of performance across the three sub-tasks. As stated in the task description, any classifier that exceeds the allowable FP rate on any sub-task will be disqualified for that task.
For Tasks 1 and 2, the score for each sub-task will be the sensitivity of the classifier. For Task 3, the score will be the negative predictive value (NPV).
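For reference, both metrics follow directly from the confusion-matrix counts; the sketch below (our own naming, not the official scoring code) shows the standard definitions:

    def sensitivity(tp, fn):
        # True positive rate: fraction of actual PE candidates labeled "1".
        return tp / float(tp + fn)

    def npv(tn, fn):
        # Negative predictive value: fraction of "0" labels that are truly not PEs.
        return tn / float(tn + fn)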
Competitors will be evaluated according to their raw score on the complete test set, as well as on a bootstrap estimate of their mean performance and a confidence interval on that mean.
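As a rough illustration of how such a bootstrap estimate can be computed (the resampling scheme, iteration count, and confidence level below are assumptions, not the official procedure):

    import random

    def bootstrap_ci(score_fn, labels, predictions, n_boot=1000, alpha=0.05):
        # Resample cases with replacement, recompute the score each time,
        # and take percentiles for a (1 - alpha) confidence interval.
        n = len(labels)
        stats = []
        for _ in range(n_boot):
            idx = [random.randrange(n) for _ in range(n)]
            stats.append(score_fn([labels[i] for i in idx],
                                  [predictions[i] for i in idx]))
        stats.sort()
        lo = stats[int((alpha / 2) * n_boot)]
        hi = stats[int((1 - alpha / 2) * n_boot)]
        return sum(stats) / n_boot, (lo, hi)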
The three tasks will be scored separately, and a winner and runner-up will be announced in each task. In addition, there may be a prize for "best net submission" across all three tasks.
Each team will also be required to submit a short (at most two pages) description of their problem formulation and approach. This description is due by July 26. A web submission process for this document will be made available after the results submission phase is complete. The submitted document must be a PDF, formatted according to KDD submission standards.