Milestone 2: MSpider

The second milestone is the web spider engine. Required deliverable components include the following. Note: The WEB DATABASE that is produced by MSpider MUST NOT be submitted, but the submission MUST include statistics summary files.

MSpider.java
The primary source file for the Moogle web crawler tool.
Other Java source files
Any other supporting code files necessary to compile, load, and use MSpider.java. Note: if this program depends on external library code other than the Java JDK or the gnu.getopt suite, the submission tarball MUST either include the library whole or provide easy and explicit instructions on how and where to access such libraries. This documentation MUST be provided in the README.TXT file. The designer is responsible for ensuring that all copyright and distribution conditions are adhered to.
README.TXT
This file MUST describe how to compile, configure, and install the MSpider engine. It MUST also list any dependencies on additional software support libraries. Finally, it MUST list any updates to Milestone 1 deliverables that are being included in this delivery.
Internal documentation
The handin MUST also include the full, compiled JavaDoc documentation for all Java source files in the submission tarball. This documentation MUST include full descriptions of every public or protected method, field, sub-class, enclosed class, or constructor employed by the code. This documentation hierarchy MUST be included in a sub-directory named documentation/ within the submission tarball package.
User documentation
The handin submission MUST include complete user-level documentation for the MSpider engine. This documentation MUST include instructions on how to use MSpider, including the functionality of all command-line options. The documentation MUST also describe the function and use of any additional programs included in the submission. User documentation MUST include information on the expected inputs and outputs of all programs, how to read and interpret the output, and information on all status and error messages that the programs could produce. This documentation MUST also include at least one example of how to run each program and how to interpret the output. This document MUST be named USERDOC.extension, but it MAY be a plain text, HTML, PDF, or PostScript document (with the appropriate extension). It MUST NOT be a Microsoft Word or other nonportable format document.
Performance documentation
The handin submission MUST include a document describing the performance of the MSpider engine, including demonstrations that each PAGE is accessed only once, that the REVERSE INDEX is built correctly, that the web graph is fully examined (up to the max-crawl limit), etc. The designer MAY choose any tests that she or he desires to establish the performance of her/his MSpider engine, but MUST describe all tests and why they lead to the stated conclusions about performance. This document MUST be named PERFORMANCE.extension, but it MAY be a plain text, HTML, PDF, or PostScript document (with the appropriate extension). It MUST NOT be a Microsoft Word or other nonportable format document.
Test cases
The submission tarball MUST include a subdirectory named tests/ that includes all of the test data used to demonstrate the performance of the MSpider engine.
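
As an illustration of the kind of test the performance documentation could describe, the sketch below checks the "each PAGE is accessed only once" property. It is only a minimal example under stated assumptions: it assumes the crawler can be made to write an access log with one fetched URL per line, which this specification does not require, and the class name AccessLogCheck and its method names are hypothetical. URLs are compared as exact strings, so two different spellings of the same page would not be detected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not one of the required deliverables.
 * Assumes an access log containing one fetched URL per line.
 */
public class AccessLogCheck {

    /**
     * Counts how often each URL appears and returns only the URLs
     * that appear more than once, mapped to their fetch counts.
     * Blank lines are ignored; URLs are compared as exact strings.
     */
    public static Map<String, Integer> findDuplicates(List<String> fetchedUrls) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String url : fetchedUrls) {
            String key = url.trim();
            if (key.isEmpty()) {
                continue;
            }
            counts.merge(key, 1, Integer::sum);
        }
        Map<String, Integer> duplicates = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.put(entry.getKey(), entry.getValue());
            }
        }
        return duplicates;
    }

    /** Reads the log file named on the command line and reports duplicates. */
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AccessLogCheck <access-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Map<String, Integer> duplicates = findDuplicates(lines);
        if (duplicates.isEmpty()) {
            System.out.println("OK: no URL was fetched more than once");
        } else {
            for (Map.Entry<String, Integer> entry : duplicates.entrySet()) {
                System.out.println(entry.getKey() + " fetched " + entry.getValue() + " times");
            }
        }
    }
}
```

A check like this only supports the stated conclusion if the log records every fetch the crawler performs, so the performance document would still need to explain how the log is produced and which test data in tests/ it was run against.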

At the designer's option, this submission MAY also include:

BUGS.TXT
This file documents any known outstanding bugs, missing features, performance problems, or failures to meet specifications of the submission. Note that the penalty for such problems will be smaller if they are fully documented here than if the instructors discover them independently.

Terran Lane 2005-01-26