This section describes the elements that MUST be developed as part of this project. The designer MAY also choose to implement additional Java source files, programs, and/or shell scripts in support of the following items. This section only describes the general performance requirements for each element; for specific deliverable requirements, please refer to Section 6.
The Moogle suite comprises two primary programs: MSpider and Moogle. The MSpider program is responsible for crawling the web to retrieve PAGEs, parsing them into individual WORDs, and building the underlying web database. Essentially, MSpider caches the web content and precomputes the REVERSE INDEX so that the user interface client can locate relevant pages quickly without having to make web requests itself. MSpider is also responsible for accumulating the statistics that will be used by the Moogle UI client program for search and result retrieval. The MSpider program will be discussed in Section 4.2.
The Moogle user interface client program is responsible for providing the user with an ergonomic interface for searching the cached REVERSE INDEX and for presenting the results, sorted in order of relevance. Important components of the Moogle client are the user interface itself, a parser for understanding user queries, and the search/sort mechanism that retrieves and presents the query response. The Moogle client program will be discussed in Section 4.3.
A core component employed by both programs is the REVERSE INDEX. The REVERSE INDEX, in turn, can be built around a hash map (i.e., hash table) to provide high-performance access and retrieval of WORD-to-PAGE relationships. The hash map implementation will be the first deliverable for this project. The hash map will be discussed in Section 4.1.
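
To make the WORD-to-PAGE relationship concrete, the following is a minimal sketch of a REVERSE INDEX built around a hash map. It is illustrative only: the class name ReverseIndexSketch and the methods addPage and lookup are hypothetical and do not appear in this specification, the tokenization rule (lowercase, split on non-alphanumeric characters) is an assumption, and java.util.HashMap stands in for the project's own hash map deliverable described in Section 4.1.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and tokenization are assumptions, not requirements.
class ReverseIndexSketch {
    // Maps each normalized WORD to the set of PAGE URLs that contain it.
    // java.util.HashMap is a stand-in for the project's own hash map.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Splits the page text into WORDs and records a WORD-to-PAGE entry for each.
    void addPage(String url, String text) {
        String normalized = text.toLowerCase(Locale.ROOT);
        for (String word : normalized.split("[^a-z0-9]+")) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            index.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(url);
        }
    }

    // Returns the PAGEs containing the given WORD, or an empty set if none do.
    Set<String> lookup(String word) {
        Set<String> pages = index.get(word.toLowerCase(Locale.ROOT));
        return pages == null
                ? Collections.emptySet()
                : Collections.unmodifiableSet(pages);
    }
}
```

In this sketch, a crawler in the role of MSpider would call addPage once per retrieved PAGE, and a client in the role of Moogle would call lookup for each WORD in a query. Relevance ordering and the accumulated statistics are deliberately left out.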