The second milestone is the web spider engine and the
MoogAlyzer program. Required deliverable
components include the following. Note: The WEB DATABASE
that is produced by MSpider MUST NOT be submitted, but the
submission MUST include example report files generated
by MoogAlyzer.
- MSpider.java
- The primary source file for the
Moogle web crawler tool.
- MoogAlyzer.java
- The primary source file for
the MoogAlyzer tool.
- Other Java source files
- Any other supporting code files
necessary to compile, load, and use the MSpider.java
and MoogAlyzer.java programs.
Note: if these programs depend on
external library code other than the Java JDK or the
gnu.getopt suite, the submission tarball MUST either include
the library whole or provide easy and explicit instructions on how and
where to access such libraries. This documentation MUST be provided
in the README.TXT file. The designer is responsible for
ensuring that all copyright and distribution conditions are adhered
to.
- README.TXT
- This file MUST describe how to compile,
configure, and install the MSpider and MoogAlyzer tools. It MUST also list
any dependencies on additional software support libraries. Finally,
it MUST list any updates to Milestone 1 deliverables that are being
included in this delivery.
- Internal documentation
- The handin MUST also include the full,
compiled JavaDoc documentation for all Java source files in the
submission tarball. This documentation MUST include full descriptions
of every public or protected method, field, sub-class, enclosed class,
or constructor employed by the code. This documentation hierarchy
MUST be included in a sub-directory named documentation/
within the submission tarball package.
- User documentation
- The handin submission MUST include complete
user-level documentation for the MSpider engine and the
MoogAlyzer tool. This
documentation MUST include instructions on how to use both programs,
including the functionality of all command-line options. The
documentation MUST also describe the function and use of any
additional programs included in the submission. User documentation
MUST include information on the expected inputs and outputs of all
programs, how to read and interpret the output, and information on all
status and error messages that the programs could produce. This
documentation MUST also include at least one example of how to run
each program and how to interpret the output. This document MUST be
named USERDOC.extension, but it MAY be a plain text, HTML,
PDF, or PostScript document (with the appropriate extension).
It MUST NOT be a Microsoft Word or other nonportable format document.
- Performance documentation
- The handin submission MUST include a
document describing the performance of the MSpider engine,
including demonstrations that each PAGE is accessed only once, that
the REVERSE INDEX is built correctly, that the web graph is fully
examined (up to the max-crawl limit), etc. The designer MAY choose
any tests that she or he desires to establish the performance of
her/his MSpider engine, but MUST describe all tests and why
they lead to the stated conclusions about performance. (Hint:
MoogAlyzer may be useful for this purpose. Instrumenting the
MSpider code with additional statistical counters may also
be helpful.)
This document
MUST be named PERFORMANCE.extension, but it MAY be a plain
text, HTML, PDF, or PostScript document (with the appropriate
extension). It MUST NOT be a Microsoft Word or other
nonportable format document.
- Test cases
- The submission tarball MUST include a subdirectory
named tests/ that includes all of the test data used to
demonstrate the performance of the MSpider engine.
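The hint about statistical counters in the performance documentation item could be approached along the following lines. This is only a minimal sketch, not a required design: the class name FetchCounter and its methods are hypothetical, and it assumes the crawler has a single point where it requests a PAGE, from which recordFetch can be called. It also assumes URLs are compared as plain strings, with no normalization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical instrumentation helper (not part of the assignment):
 * counts how many times each URL is fetched during a crawl.
 */
final class FetchCounter {
    /** Number of fetches recorded per URL (URLs compared as raw strings). */
    private final Map<String, Integer> fetches = new HashMap<>();

    /** Records one fetch of the given URL. */
    void recordFetch(String url) {
        fetches.merge(url, 1, Integer::sum);
    }

    /** Returns the number of distinct URLs fetched. */
    int distinctPages() {
        return fetches.size();
    }

    /** Returns the total number of fetches, duplicates included. */
    int totalFetches() {
        int total = 0;
        for (int n : fetches.values()) {
            total += n;
        }
        return total;
    }

    /** Returns, in sorted order, every URL fetched more than once. */
    List<String> duplicates() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : fetches.entrySet()) {
            if (e.getValue() > 1) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

With a counter like this, an empty duplicates() list at the end of a crawl, together with totalFetches() equal to distinctPages(), would be one piece of evidence that each PAGE was accessed only once. Comparing distinctPages() against the max-crawl limit would help support the claim about how much of the web graph was examined.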
At the designer's option, this submission MAY also include:
- BUGS.TXT
- This file documents any known outstanding
bugs, missing features, performance problems, or failures to meet
specifications of your submission. Note that the penalty for such
problems will be smaller if they're fully documented here than if the
instructors discover them independently.
Terran Lane
2005-09-21