Quantitative Requirements
This section describes the performance and IP requirements for the
SpamBGon software suite.
- Programs MUST NOT crash, dump core, print a stack
trace, or exit with an uncaught exception on any input.
- In the case of a RECOVERABLE ERROR, a program MUST issue a
warning statement and continue processing. The program MAY
choose to issue the warning statement to standard error or to a
log file. If the warning is issued to a log file, the log file
name and location MUST be a user-specifiable parameter to the
program.
- In the case of an UNRECOVERABLE ERROR, a program MUST issue an
error statement and terminate with a non-zero exit status.
The program MAY use different exit codes to indicate different
error conditions, but such codes MUST be documented in the user
manual. The error message MUST be logged to the same
destination as warning messages (from RECOVERABLE ERRORS).
- In the case of any ERROR, a program MUST NOT
delete, corrupt, or damage existing statistics model files or
any other ``stateful'' files employed by the program suite.
- The MondoHashTable.java module MUST NOT use or reference
the Hashtable, HashMap, AbstractMap,
HashSet, or TreeSet classes, or any of their
subclasses.
- For (substantially) reduced credit, BFSTrain and
BFSTest MAY use the HashMap class in place of
MondoHashTable. Note that this allowance exists only
as an aid in case the programmer has difficulty getting
MondoHashTable to work properly; for full credit the
entire SpamBGon suite MUST employ
MondoHashTable and MUST NOT employ or refer to any of
the classes listed in the previous bullet point.
- The entire program suite MUST NOT employ or refer to the
StreamTokenizer class.
- The programs MAY provide additional output for debugging
purposes, but such output MUST be disabled by
default. Any program MAY provide a command-line switch to
enable debugging output when desired.
- The SpamBGon suite MAY use the
gnu.getopt.Getopt and gnu.getopt.LongOpt
classes to assist in handling command-line options.
- The programmer MAY ask permission of the instructor or the TA to
use any classes outside the JDK that have not already been
mentioned. The final programs MUST NOT use any class outside
the JDK that has not been explicitly allowed.
- The SpamBGon suite MAY assume that all valid input is
standard ASCII text in the range
(char)0-(char)127, inclusive. If a program
encounters a character outside this range, it MAY treat it as
a RECOVERABLE or UNRECOVERABLE ERROR or silently ignore it. If
such characters are treated as RECOVERABLE or ignored, they MUST
NOT disrupt the otherwise normal functioning of the program.
- Programs MUST NOT assume that all input is validly
structured email. If a program encounters non-email input (e.g.,
lacking or corrupted HEADERS, invalid character sets, improper
MIME boundaries, etc.) it MAY produce a RECOVERABLE or
UNRECOVERABLE ERROR, but it MUST NOT crash, corrupt the
statistics files, etc. If a program chooses to RECOVER from an
ill-formed email, it MUST NOT corrupt the statistics tables
with information from the illegal input; it MUST wait for the
next valid input before continuing to update statistics tables.
- Both BFSTrain and BFSTest programs MUST run in
amortized $O(n)$ time for email input of size $n$.
- The MondoHashTable MUST support get(),
put(), remove(), size(), and
isEmpty() in amortized $O(1)$ time. The table MAY
support key/value iteration in time proportional to the
capacity of the table. For extra credit, it MAY support
key/value iteration in time proportional to the number of
keys/values (respectively). To receive the extra credit, the
designer must demonstrate this convincingly in the performance
documentation.
- The MondoHashTable MUST NOT consume more than
$O(n \cdot k)$ memory for $n$ distinct keys, where $k$
represents the combined size of a key/value pair.
- The MondoHashTable MUST support the keySet()
and values() operations with only $O(1)$ space above
that required by the hashtable itself. Specifically, these
operations MUST NOT replicate the underlying hashtable, nor
duplicate any keys or values.
- All user documentation MUST be grammatically correct and use
correct spelling and usage. Notably, Bayes was a real
person, so every term that includes his name MUST capitalize
it, e.g., ``Bayesian spam analysis'' and ``naïve
Bayes''.
- The programmer MUST document any areas in which her or his
software suite does not meet this specification.
WARNING! The grade penalty will be higher if the
instructors discover an undocumented program shortcoming or bug
than if it is documented up front.
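
The RECOVERABLE/UNRECOVERABLE ERROR rules above can be sketched roughly as follows. This is only one possible arrangement, not a required design: the class name ErrorPolicySketch, the RecoverableException type, the blank-line validity check, and the specific exit codes 1 and 2 are all assumptions made for illustration. A real program would have to document its exit codes in the user manual.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the warning/error policy; names are not from the spec. */
class ErrorPolicySketch {

    /** Signals a RECOVERABLE ERROR: the current message is skipped, processing continues. */
    static class RecoverableException extends Exception {
        RecoverableException(String message) { super(message); }
    }

    /** Placeholder validity check (an assumption): require a blank line after the headers. */
    static void checkMessage(String message) throws RecoverableException {
        if (message == null || !message.contains("\n\n")) {
            throw new RecoverableException("input is not a well-formed email; skipped");
        }
    }

    /** Processes all messages and returns the exit status the caller should use. */
    static int run(List<String> messages, PrintStream log) {
        try {
            for (String message : messages) {
                try {
                    checkMessage(message);
                    // A real program would update its statistics here, and only here,
                    // so that rejected input never reaches the statistics tables.
                } catch (RecoverableException e) {
                    log.println("WARNING: " + e.getMessage());
                }
            }
            return 0;
        } catch (RuntimeException e) {
            // Anything unexpected is treated as an UNRECOVERABLE ERROR and is
            // reported to the same destination as the warnings.
            log.println("ERROR: " + e);
            return 1;
        }
    }

    public static void main(String[] args) {
        PrintStream log = System.err;           // default destination
        if (args.length > 0) {                  // optional user-specified log file
            try {
                log = new PrintStream(new FileOutputStream(args[0], true), true);
            } catch (FileNotFoundException e) {
                System.err.println("ERROR: cannot open log file " + args[0]);
                System.exit(2);
            }
        }
        int status = run(Arrays.asList("Subject: hi\n\nbody", "not an email"), log);
        log.flush();
        System.exit(status);
    }
}
```

Keeping System.exit() in main and returning the status from run() means no exception escapes to the user, and the policy can be exercised without terminating the JVM.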
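
The keySet() requirement above, a key view that neither copies the table nor duplicates keys, can be illustrated with a small sketch. TinyTable is a hypothetical stand-in, not the assignment's MondoHashTable: it assumes open addressing with linear probing, omits remove() and null keys, and relies on java.util.AbstractSet, which is not among the prohibited classes listed above. Here the view holds only a reference to its table, and iteration takes time proportional to the table's capacity.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

/** Illustrative open-addressing table (hypothetical); not the assignment's MondoHashTable. */
class TinyTable<K, V> {
    private Object[] keys = new Object[8];   // null marks an empty slot
    private Object[] vals = new Object[8];
    private int size = 0;

    /** Linear probing: returns the slot holding key, or the empty slot where it belongs. */
    private static int slot(Object key, Object[] ks) {
        int i = (key.hashCode() & 0x7fffffff) % ks.length;
        while (ks[i] != null && !ks[i].equals(key)) {
            i = (i + 1) % ks.length;
        }
        return i;
    }

    /** Doubles capacity and reinserts; doubling keeps put() amortized constant time. */
    private void grow() {
        Object[] oldKeys = keys, oldVals = vals;
        keys = new Object[2 * oldKeys.length];
        vals = new Object[2 * oldVals.length];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                int i = slot(oldKeys[j], keys);
                keys[i] = oldKeys[j];
                vals[i] = oldVals[j];
            }
        }
    }

    public void put(K key, V value) {
        if (2 * (size + 1) > keys.length) grow();   // keep the load factor at or below 1/2
        int i = slot(key, keys);
        if (keys[i] == null) { keys[i] = key; size++; }
        vals[i] = value;
    }

    @SuppressWarnings("unchecked")
    public V get(Object key) {
        return (V) vals[slot(key, keys)];           // null when the key is absent
    }

    public int size() { return size; }

    /** A live view: it stores no keys of its own, only a reference to this table. */
    public Set<K> keySet() {
        return new AbstractSet<K>() {
            @Override public int size() { return TinyTable.this.size; }
            @Override public boolean contains(Object o) {
                return o != null && keys[slot(o, keys)] != null;
            }
            @Override public Iterator<K> iterator() {
                return new Iterator<K>() {
                    private int cursor = advance(0);   // index of the next occupied slot
                    private int advance(int from) {
                        while (from < keys.length && keys[from] == null) from++;
                        return from;
                    }
                    @Override public boolean hasNext() { return cursor < keys.length; }
                    @SuppressWarnings("unchecked")
                    @Override public K next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        K k = (K) keys[cursor];
                        cursor = advance(cursor + 1);
                        return k;
                    }
                };
            }
        };
    }
}
```

Because the view reads the table's arrays on every call, keys added after keySet() returns are visible through the same Set object. A values() view could follow the same pattern.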
Terran Lane
2004-01-26