Rollout: The SpamBGon Suite

The second and final stage of the project is the rollout of the completed SpamBGon suite. The deliverables for this stage are:

BSFTrain.java and BSFTest.java
The two primary programs of the SpamBGon suite.
Other Java source files
Any other supporting code files necessary to compile, load, and use the BSFTrain and BSFTest programs. Note: if these programs depend on external library code other than the Java JDK or the gnu.getopt suite, the submission tarball MUST either include the library whole or provide easy and explicit instructions on how and where to access such libraries. This documentation MUST be provided in the README.TXT file. The designer is responsible for ensuring that all copyright and distribution conditions are adhered to.
README.TXT
This file MUST describe how to compile, configure, and install the SpamBGon suite. It MUST also list any dependencies on additional software support libraries.
Internal documentation
The handin MUST also include the full, compiled JavaDoc documentation for all Java source files in the submission tarball. This documentation MUST include full descriptions of every public or protected method, field, sub-class, enclosed class, or constructor employed by the code. This documentation hierarchy MUST be included in a sub-directory named documentation/ within the submission tarball package.
User documentation
The handin submission MUST include complete user-level documentation for the SpamBGon suite. This documentation MUST include instructions on how to use both BSFTrain and BSFTest, including the functionality of all command-line options. The documentation MUST also describe the function and use of any additional programs included in the submission. User documentation MUST include information on the expected inputs and outputs of all programs, how to read and interpret the output, and information on all status and error messages that the programs could produce. This documentation MUST also include at least one example of how to run each program and how to interpret the output. This document MUST be named USERDOC.extension, but it MAY be a plain text, HTML, PDF, or PostScript document (with the appropriate extension). It MUST NOT be a Microsoft Word or other nonportable format document.
Performance documentation
The handin submission MUST include a document describing the performance of the SpamBGon suite, including its ability to differentiate SPAM from NORMAL email under different amounts of TRAINING data and under different tokenizers (including a small range of reasonable parameters for each parameterized tokenizer). This document MUST also include the designer's assessment of which tokenizer is superior and why or, if different tokenizers are superior under different conditions, what conditions are important to the success of each. The designer MAY choose any tests that she or he desires to establish the performance of her or his SpamBGon suite, but MUST describe all tests and why they lead to the stated conclusions about performance. Finally, this document MUST include the designer's assessment of how to improve the performance of the system (e.g., what other kind of tokenizer might be helpful, how to change the probability equations to improve accuracy, etc.). This document MUST be named PERFORMANCE.extension, but it MAY be a plain text, HTML, PDF, or PostScript document (with the appropriate extension). It MUST NOT be a Microsoft Word or other nonportable format document.
Test cases
The submission tarball MUST include a subdirectory named tests/ that includes all of the test data used to demonstrate the performance of the SpamBGon suite.
CVS log file(s)
For each Java source file, the submission tarball MUST include a corresponding .log file containing the CVS log for that source file.
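
As an illustration of the kind of test the performance documentation might report, the sketch below tallies classifier decisions into a confusion matrix and derives accuracy, false-positive rate, and false-negative rate. It is only one possible approach, since the designer is free to choose other tests. The class name SpamEval, the nested Counts class, and the use of parallel boolean arrays are assumptions made for this example; how the decisions are actually obtained from BSFTest is not specified here.

```java
/** Hypothetical helper for illustration; not part of the SpamBGon specification. */
public class SpamEval {

    /** Outcome counts, treating SPAM as the positive class. */
    public static final class Counts {
        public int spamAsSpam;      // SPAM correctly flagged
        public int spamAsNormal;    // SPAM missed (false negative)
        public int normalAsSpam;    // NORMAL wrongly flagged (false positive)
        public int normalAsNormal;  // NORMAL correctly passed

        public int total() {
            return spamAsSpam + spamAsNormal + normalAsSpam + normalAsNormal;
        }

        /** Fraction of all messages that were classified correctly. */
        public double accuracy() {
            return ratio(spamAsSpam + normalAsNormal, total());
        }

        /** Fraction of NORMAL messages wrongly flagged as SPAM. */
        public double falsePositiveRate() {
            return ratio(normalAsSpam, normalAsSpam + normalAsNormal);
        }

        /** Fraction of SPAM messages that slipped through as NORMAL. */
        public double falseNegativeRate() {
            return ratio(spamAsNormal, spamAsSpam + spamAsNormal);
        }

        // Returns 0.0 for an empty denominator rather than NaN.
        private static double ratio(int num, int den) {
            return den == 0 ? 0.0 : (double) num / den;
        }
    }

    /**
     * Tallies one test run. actualSpam[i] is the true label of message i,
     * predictedSpam[i] is the classifier's decision for the same message.
     */
    public static Counts tally(boolean[] actualSpam, boolean[] predictedSpam) {
        if (actualSpam.length != predictedSpam.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        Counts c = new Counts();
        for (int i = 0; i < actualSpam.length; i++) {
            if (actualSpam[i]) {
                if (predictedSpam[i]) c.spamAsSpam++; else c.spamAsNormal++;
            } else {
                if (predictedSpam[i]) c.normalAsSpam++; else c.normalAsNormal++;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Made-up labels purely to exercise the code; not real results.
        boolean[] actual    = {true, true, true,  false, false, false, false, false};
        boolean[] predicted = {true, true, false, false, false, false, false, true};
        Counts c = tally(actual, predicted);
        System.out.printf("accuracy=%.3f fpr=%.3f fnr=%.3f%n",
                c.accuracy(), c.falsePositiveRate(), c.falseNegativeRate());
    }
}
```

Repeating such a tally for each training-set size and each tokenizer configuration would give one row per condition, which may make the comparison between tokenizers easier to present and justify.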

At the programmer's option, this submission MAY also include:

BUGS.TXT
This file documents any known outstanding bugs, missing features, performance problems, or failures to meet specifications of the submission. Note that the penalty for such problems will be smaller if they're fully documented here than if the instructors discover them independently.
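
The file-level requirements listed above lend themselves to a mechanical check before the tarball is built. The following is a minimal sketch of such a check, not an official tool. The class name HandinCheck is hypothetical, and the sketch makes two assumptions that the text above does not state: that BSFTrain.java, BSFTest.java, README.TXT, and the USERDOC and PERFORMANCE documents sit at the top level of the unpacked directory, and that the CVS log for Foo.java is a file named Foo.log in the same directory. It checks only that files are present; it does not judge their content or format, and it ignores the optional BUGS.TXT.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-submission layout check; assumptions are noted inline. */
public class HandinCheck {

    /** Returns a human-readable list of missing items under root. */
    public static List<String> findProblems(File root) {
        List<String> problems = new ArrayList<>();

        // Assumption: these files live at the top level of the tarball.
        for (String name : new String[] {"BSFTrain.java", "BSFTest.java", "README.TXT"}) {
            if (!new File(root, name).isFile()) {
                problems.add("missing file: " + name);
            }
        }
        // Required sub-directories named in the deliverables list.
        for (String dir : new String[] {"documentation", "tests"}) {
            if (!new File(root, dir).isDirectory()) {
                problems.add("missing directory: " + dir + "/");
            }
        }
        // USERDOC.* and PERFORMANCE.* may carry any portable extension.
        for (String stem : new String[] {"USERDOC", "PERFORMANCE"}) {
            if (!hasFileWithStem(root, stem)) {
                problems.add("missing document: " + stem + ".<extension>");
            }
        }
        checkLogs(root, problems);
        return problems;
    }

    // True if dir directly contains a regular file named "<stem>.<anything>".
    private static boolean hasFileWithStem(File dir, String stem) {
        File[] entries = dir.listFiles();
        if (entries == null) return false;
        for (File f : entries) {
            if (f.isFile() && f.getName().startsWith(stem + ".")) return true;
        }
        return false;
    }

    // Recursively requires a sibling Foo.log for every Foo.java (assumed naming).
    private static void checkLogs(File dir, List<String> problems) {
        File[] entries = dir.listFiles();
        if (entries == null) return;
        for (File f : entries) {
            if (f.isDirectory()) {
                checkLogs(f, problems);
            } else if (f.getName().endsWith(".java")) {
                String name = f.getName();
                String stem = name.substring(0, name.length() - ".java".length());
                if (!new File(dir, stem + ".log").isFile()) {
                    problems.add("missing CVS log for: " + f.getPath());
                }
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<String> problems = findProblems(root);
        for (String p : problems) {
            System.out.println(p);
        }
        System.out.println(problems.isEmpty()
                ? "layout looks complete"
                : problems.size() + " problem(s) found");
    }
}
```

Run against the unpacked submission directory, the program lists anything it could not find. A clean result means only that the expected names exist, not that the submission meets the content requirements.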

Note that if the MondoHashTable code is not fully functional for Milestone 1, a revised version MAY be submitted in this handin. If MondoHashTable has been revised for this version, this submission tarball MUST include the necessary supporting documentation described under Milestone 1, as well as notes describing the added functionality/improvements between Milestone 1 and this handin.

Terran Lane 2004-01-26