Query Parser

The query language supported by the Moogle UI is a simple "AND/OR" query language. Moogle MUST recognize the following syntax:

        QUERY := WORD* |
                 QUERY "AND" QUERY |
                 QUERY "OR" QUERY |
                 "(" QUERY ")"
        WORD := "[a-zA-Z0-9]+"
Note that there are only five kinds of tokens in this language: WORDs, AND, OR, (, and ). Tokens are separated by WHITESPACE, which is otherwise discarded. PUNCTUATION is also discarded. PUNCTUATION falling in the middle of a word (e.g., hyper-cool or TF/IDF) MAY be treated as a token separator (hyper-cool becomes hyper and cool) or MAY be dropped and the token parts conjoined (TF/IDF becomes TFIDF).
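
The tokenization rules above can be illustrated with a minimal Python sketch. It is one possible reading of the rules, not a required design. The function name tokenize and the flag split_on_punctuation are hypothetical. The sketch also makes two assumptions that the text does not mandate: parentheses are recognized even when they touch a word, as in the examples written like (w1 AND w2), and WORDs are lowercased while AND and OR are matched case-sensitively.

```python
import re

CONJUNCTIONS = ("AND", "OR")

def tokenize(query, split_on_punctuation=True):
    """Split a query string into WORD, AND, OR, "(" and ")" tokens.

    split_on_punctuation=True  : hyper-cool -> hyper, cool
    split_on_punctuation=False : TF/IDF     -> tfidf
    (Hypothetical helper; both behaviours are permitted by the text.)
    """
    tokens = []
    for chunk in query.split():  # WHITESPACE separates tokens
        # Assumption: peel parentheses off adjacent words, so "(w1" -> "(", "w1".
        for piece in re.findall(r"[()]|[^()]+", chunk):
            if piece in ("(", ")") or piece in CONJUNCTIONS:
                tokens.append(piece)
            elif split_on_punctuation:
                # Treat every run of PUNCTUATION as a token separator.
                words = re.findall(r"[a-zA-Z0-9]+", piece)
                tokens.extend(w.lower() for w in words)
            else:
                # Drop PUNCTUATION and conjoin the remaining parts.
                word = re.sub(r"[^a-zA-Z0-9]", "", piece)
                if word:
                    tokens.append(word.lower())
    return tokens

print(tokenize("hyper-cool AND (TF/IDF OR Moogle)"))
print(tokenize("TF/IDF", split_on_punctuation=False))
```

Whichever punctuation policy is chosen, the same policy presumably has to be applied when the REVERSE INDEX is built, or queries will not match the indexed words.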

The semantics of this language are reasonably natural:

  1. A query consisting of a single WORD should return the set of documents in the REVERSE INDEX matching that word.
  2. A query consisting of a sequence of WORDs, with no conjunctions specified (e.g., word1 word2 word3 word4) should be treated as if the AND conjunctive were specified (word1 AND word2 AND word3 AND word4).
  3. The AND of two queries should return the conjunction of those queries - the set of documents that match both sub-queries.
  4. The OR of two queries should return the disjunction of those queries - the set of documents that match either sub-query.
  5. Precedence is left-to-right, unless delimited by parentheses. E.g.,
            w1 AND w2 OR w3 AND w4
    
    should parse as
            (((w1 AND w2) OR w3) AND w4)
    
    while
            (w1 AND w2) OR (w3 AND w4)
    
    should parse as written.
  6. A query that returns no documents, including the empty query, is syntactically valid and should display no URLs and/or print a message indicating that no matching documents were found.
  7. Queries SHOULD be treated as case-insensitive but they MAY be treated as case-sensitive. The exceptions are the conjunctives (AND and OR), which SHOULD be treated as case-sensitive. Either way, case (in)sensitivity MUST be documented in the user manual.
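
The semantics listed above can be sketched as a small recursive-descent parser and evaluator in Python. This is an illustration under stated assumptions, not a prescribed implementation. The names parse, show, evaluate and EMPTY are hypothetical, the input is assumed to be an already tokenized list, and the REVERSE INDEX is assumed to be a plain dict mapping each word to a set of URLs.

```python
EMPTY = ("EMPTY",)  # hypothetical marker for the empty query

def parse(tokens):
    """Parse a token list into a nested tuple tree, strictly left to right."""
    tree, pos = _parse_query(tokens, 0)
    if pos != len(tokens):  # only an unmatched ")" can stop the parse early
        raise ValueError("unmatched closing parenthesis")
    return tree

def _parse_query(tokens, pos):
    left = None  # nothing parsed yet
    while pos < len(tokens) and tokens[pos] != ")":
        op = "AND"  # rule 2: adjacent operands are implicitly ANDed
        if tokens[pos] in ("AND", "OR"):
            if left is None:
                raise ValueError("conjunction is missing its left operand")
            op = tokens[pos]
            pos += 1
            if pos >= len(tokens) or tokens[pos] in ("AND", "OR", ")"):
                raise ValueError("conjunction is missing its right operand")
        right, pos = _parse_operand(tokens, pos)
        # Rule 5: fold each new operand into everything parsed so far.
        left = right if left is None else (op, left, right)
    return (EMPTY if left is None else left), pos

def _parse_operand(tokens, pos):
    if tokens[pos] == "(":
        inner, pos = _parse_query(tokens, pos + 1)
        if pos >= len(tokens):
            raise ValueError("missing closing parenthesis")
        return inner, pos + 1  # skip the ")"
    return tokens[pos], pos + 1  # a single WORD

def show(tree):
    """Render a parse tree with every grouping made explicit."""
    if tree == EMPTY:
        return "()"
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"

def evaluate(tree, index):
    """Return the set of documents matching the tree (rules 1, 3, 4 and 6)."""
    if tree == EMPTY:
        return set()
    if isinstance(tree, str):
        return set(index.get(tree, ()))
    op, left, right = tree
    l, r = evaluate(left, index), evaluate(right, index)
    return l & r if op == "AND" else l | r

print(show(parse("w1 AND w2 OR w3 AND w4".split())))
# prints (((w1 AND w2) OR w3) AND w4)
```

Building an explicit tree before evaluating keeps the left-to-right rule visible and easy to test. An implementation could equally well evaluate the sets directly while parsing.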

The designer MAY choose to offer additional query language syntax and functionality. For example, support for a NOT modifier or quoted phrases would be useful extensions.

Terran Lane 2005-01-26