Requirements

This section describes the elements that MUST be developed as part of this project. The designer MAY also choose to implement additional Java source files, programs, and/or shell scripts in support of the following items. This section only describes the general performance requirements for each element; for specific deliverable requirements, please refer to Section 7.

The Moogle suite comprises two primary programs, MSpider and Moogle, as well as a stand-alone analysis program, MoogAlyzer, that will be used for testing and validation. The MSpider program is responsible for crawling the web to retrieve PAGEs, parsing them into individual WORDs, and building the underlying WEB DATABASE. Essentially, MSpider caches the web content and precomputes the REVERSE INDEX so that the user interface client can locate relevant pages very quickly without having to make web requests itself. MSpider is also responsible for accumulating the statistics that the Moogle UI client program will use for search and result retrieval. The MSpider program will be discussed in Section 5.2.

The Moogle user interface client program is responsible for providing the user with an ergonomic interface for searching the cached WEB DATABASE (via the REVERSE INDEX) and for presenting the results to the user, sorted in order of relevance. Important components of the Moogle client are the user interface itself, a parser for understanding user queries, and the search/sort mechanism that retrieves and presents the query response. The Moogle client program will be discussed in Section 5.3.
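
As a rough illustration of the search/sort step, the following Java sketch splits a query into words, looks each word up in a WORD-to-PAGE map, and orders the matching pages. The class name QuerySketch, the method search, and the scoring rule (number of distinct query words found on a page, ties broken by URL) are all assumptions made for this example; this section does not define how relevance is computed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a query lookup; not the required Moogle design. */
public class QuerySketch {

    /**
     * Returns the pages matching any query word, best match first.
     * ASSUMPTION: relevance = number of distinct query words on the page.
     */
    public static List<String> search(Map<String, Set<String>> index, String query) {
        // De-duplicate query words so a repeated word is not counted twice.
        Set<String> words = new LinkedHashSet<>();
        for (String w : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        // Count, for each page, how many query words it contains.
        Map<String, Integer> score = new HashMap<>();
        for (String w : words) {
            Set<String> pages = index.getOrDefault(w, Collections.<String>emptySet());
            for (String page : pages) {
                score.merge(page, 1, Integer::sum);
            }
        }

        // Sort by descending score, then by URL for a stable order.
        List<String> result = new ArrayList<>(score.keySet());
        result.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("hash", new LinkedHashSet<>(List.of("http://example.org/a", "http://example.org/b")));
        index.put("map", new LinkedHashSet<>(List.of("http://example.org/b")));
        System.out.println(search(index, "hash map"));
    }
}
```

A real client would presumably draw on the statistics accumulated by MSpider rather than this simple count; the sketch only shows the lookup-then-sort shape of the operation.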

A core component employed by both programs is the WEB DATABASE and, within that database, the REVERSE INDEX. The REVERSE INDEX, in turn, can be built around a hash map (i.e., hash table) to provide high-performance access and retrieval of WORD $\rightarrow$ PAGE relationships. The hash map implementation will be the first deliverable for this project. The hash map will be discussed in Section 5.1.
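
To make the WORD $\rightarrow$ PAGE relationship concrete, here is a minimal Java sketch of a reverse index. It uses java.util.HashMap purely as a stand-in for the hash map deliverable of Section 5.1; the class name ReverseIndexSketch, the methods addPage and lookup, and the tokenization rule (lower-casing and splitting on non-word characters) are assumptions for illustration, not requirements of this specification.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a reverse index: WORD -> set of PAGE URLs. */
public class ReverseIndexSketch {

    // ASSUMPTION: java.util.HashMap stands in for the project's own hash map.
    private final Map<String, Set<String>> wordToPages = new HashMap<>();

    /** Records that every word in the given text occurs on pageUrl. */
    public void addPage(String pageUrl, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            wordToPages
                .computeIfAbsent(word, w -> new LinkedHashSet<>())
                .add(pageUrl);
        }
    }

    /** Returns the pages containing the word, or an empty set if none. */
    public Set<String> lookup(String word) {
        return wordToPages.getOrDefault(
            word.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        ReverseIndexSketch index = new ReverseIndexSketch();
        index.addPage("http://example.org/a", "Hash maps are fast");
        index.addPage("http://example.org/b", "Maps of the web");
        System.out.println(index.lookup("maps"));
    }
}
```

Because each lookup is a single hash map access, the client can find the pages for a word without rescanning any cached content, which is the motivation given above for precomputing the index.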

Finally, the MoogAlyzer program provides a mechanism for repeatable testing and validation of the WEB DATABASE. The job of this program is to load a WEB DATABASE produced by MSpider and write out a series of summaries of the database. This will allow both you and us to ensure that MSpider is working correctly and that the correct information is being stored in the WEB DATABASE. In practice, this would be an internal tool used by MondoSoft and not shipped to the customer; for this project, though, you will turn it in with the rest of the Moogle suite. MoogAlyzer will be discussed in detail in Section 5.4.

The designer MAY choose any package naming convention (including the default package) for this project. If a root package other than the default is chosen, it SHOULD be edu.unm.cs.[yourname].cs351.p1. If the choice of package affects how the programs are invoked, the README.TXT document MUST specify this.



Terran Lane 2005-08-23