The fundamental unit of statistical analysis for the Bayesian spam filter is the statistic for an individual token, which reduces to a count of the occurrences of each distinct token seen in the TRAINING data (see Appendix A for details). To track the mapping between tokens and their counts, the SpamBGon suite will use a hash table: specifically, a from-scratch implementation of the java.util.Map interface, as documented in the Java 1.4.1 API specification. This module will be named MondoHashTable.java and MUST support the complete java.util.Map interface and contract specification. The MondoHashTable implementation MUST NOT use, access, refer to, or rely on AbstractMap or any other existing implementation of the Map interface. The MondoHashTable implementation MAY employ the java.util.AbstractSet implementation to support the Map.keySet() and/or Map.values() operations.
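As a rough illustration of the kind of structure involved, the following is a minimal sketch of a separate-chaining hash table. It is not the required MondoHashTable: the class name ChainedTableSketch, the initial capacity of 16, and the 0.75 load factor are all assumptions made for the example, and the sketch covers only size, get, put, and remove rather than the complete java.util.Map contract (no keySet(), values(), entrySet(), equals(), or hashCode()). Separate chaining is one possible collision strategy; this specification does not mandate it. The code avoids generics to stay close to the Java 1.4.1 API.

```java
// Hypothetical sketch: separate chaining only, NOT the full java.util.Map contract.
class ChainedTableSketch {
    // One node in a bucket's singly linked chain.
    private static final class Node {
        final Object key;
        Object value;
        Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node[] buckets = new Node[16]; // initial capacity is an arbitrary assumption
    private int size = 0;

    // Map a key to a bucket index; a null key always lands in bucket 0.
    private int indexFor(Object key, int capacity) {
        int h = (key == null) ? 0 : key.hashCode();
        return (h & 0x7fffffff) % capacity; // clear the sign bit so the index is non-negative
    }

    // Null-safe key comparison.
    private static boolean sameKey(Object a, Object b) {
        return (a == null) ? (b == null) : a.equals(b);
    }

    public int size() {
        return size;
    }

    // Returns the value mapped to key, or null if there is no such mapping.
    public Object get(Object key) {
        for (Node n = buckets[indexFor(key, buckets.length)]; n != null; n = n.next) {
            if (sameKey(n.key, key)) return n.value;
        }
        return null;
    }

    // Stores the mapping; returns the previous value, or null if there was none.
    public Object put(Object key, Object value) {
        int i = indexFor(key, buckets.length);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (sameKey(n.key, key)) {
                Object old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]); // prepend to the chain
        size++;
        if (size > buckets.length * 3 / 4) resize();   // 0.75 load factor is an assumption
        return null;
    }

    // Removes the mapping; returns the removed value, or null if there was none.
    public Object remove(Object key) {
        int i = indexFor(key, buckets.length);
        Node prev = null;
        for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
            if (sameKey(n.key, key)) {
                if (prev == null) buckets[i] = n.next;
                else prev.next = n.next;
                size--;
                return n.value;
            }
        }
        return null;
    }

    // Doubles the bucket array and relinks every existing node into it.
    private void resize() {
        Node[] old = buckets;
        Node[] fresh = new Node[old.length * 2];
        for (int b = 0; b < old.length; b++) {
            Node n = old[b];
            while (n != null) {
                Node next = n.next;
                int i = indexFor(n.key, fresh.length);
                n.next = fresh[i];
                fresh[i] = n;
                n = next;
            }
        }
        buckets = fresh;
    }
}
```

A conforming MondoHashTable would additionally have to declare that it implements java.util.Map and supply the remaining operations, including the collection views, with the semantics given in the API specification.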
As part of the project deliverables, the developer MUST demonstrate the performance of the MondoHashTable and show that it meets the quantitative requirements given in Section 4. Doing so will probably require additional data members, methods, or subclasses that track quantities such as the number of allocations and reallocations, the number of accesses, and wall-clock time. The choice of which data members, methods, or subclasses to provide is up to the developer, but all such entities MUST be documented in the API documentation (see Section 5.1).
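One way such bookkeeping might look is sketched below. The class name TableStatsSketch and every method on it are hypothetical, chosen only for illustration; the specification leaves the actual design to the developer. The sketch keeps plain counters that a table implementation would update from its own methods, and measures wall-clock time with System.currentTimeMillis(), which is available in the Java 1.4.1 API. Each public member carries a documentation comment, since all such entities must appear in the API documentation.

```java
/** Hypothetical counters that a hash table implementation could update as it runs. */
class TableStatsSketch {
    private long accesses;       // lookups, insertions, and removals
    private long allocations;    // new entry objects created
    private long reallocations;  // times the backing array was regrown
    private long startMillis;    // start of the current timed interval
    private long elapsedMillis;  // wall-clock time accumulated so far

    /** Call once per lookup, insertion, or removal. */
    public void recordAccess() { accesses++; }

    /** Call once each time a new entry object is created. */
    public void recordAllocation() { allocations++; }

    /** Call once each time the backing array is resized. */
    public void recordReallocation() { reallocations++; }

    /** Marks the start of a timed interval. */
    public void startTimer() { startMillis = System.currentTimeMillis(); }

    /** Ends the current timed interval and adds its length to the running total. */
    public void stopTimer() { elapsedMillis += System.currentTimeMillis() - startMillis; }

    /** Returns the number of recorded accesses. */
    public long getAccesses() { return accesses; }

    /** Returns the number of recorded allocations. */
    public long getAllocations() { return allocations; }

    /** Returns the number of recorded reallocations. */
    public long getReallocations() { return reallocations; }

    /** Returns the accumulated wall-clock time in milliseconds. */
    public long getElapsedMillis() { return elapsedMillis; }

    /** Returns all counters as a single line of text suitable for a report. */
    public String toString() {
        return "accesses=" + accesses
            + " allocations=" + allocations
            + " reallocations=" + reallocations
            + " elapsedMillis=" + elapsedMillis;
    }
}
```

A table could hold one such object as a data member and call recordAccess() from get, put, and remove, and recordReallocation() from its resize routine. Keeping the counters in a separate object is only one option; an instrumented subclass of the table would serve the same purpose.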
The MondoHashTable will form the core of the first milestone submission; refer to Section 5.1 for details on the full submission requirements.
Terran Lane 2004-01-26