The WEB DATABASE will contain the REVERSE INDEX, but it must also hold information that the REVERSE INDEX alone does not provide. To enforce the max-crawl limit and to detect cycles, the WEB DATABASE must store a list of every PAGE that has already been retrieved (the CLOSED LIST). Finally, to provide durable state and allow an interrupted crawl to be restarted, the WEB DATABASE must also record the outstanding URLs that have not yet been examined (the OPEN LIST).
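As an illustration only, the three pieces of state described above might fit together as in the following minimal Python sketch. The class name `WebDatabase`, its methods (`add_url`, `next_url`, `record_page`, `crawl_limit_reached`, `save`, `load`), the JSON file format, and the first-in-first-out OPEN LIST are all hypothetical choices made for this example; they are not part of the specification.

```python
import json
from collections import deque


class WebDatabase:
    """Hypothetical sketch: a REVERSE INDEX plus the CLOSED and OPEN LISTs."""

    def __init__(self, max_crawl):
        self.max_crawl = max_crawl      # assumed meaning: max number of PAGEs to retrieve
        self.reverse_index = {}         # word -> set of URLs whose PAGE contains it
        self.closed_list = set()        # CLOSED LIST: URLs already retrieved
        self.open_list = deque()        # OPEN LIST: URLs not yet examined (FIFO, assumed)

    def add_url(self, url):
        # Cycle detection: skip URLs already retrieved or already queued.
        # (Linear scan of the deque; a larger crawler would keep a companion set.)
        if url not in self.closed_list and url not in self.open_list:
            self.open_list.append(url)

    def crawl_limit_reached(self):
        # max-crawl check uses the size of the CLOSED LIST.
        return len(self.closed_list) >= self.max_crawl

    def next_url(self):
        # Return the next unexamined URL, or None when the crawl should stop.
        if self.crawl_limit_reached():
            return None
        while self.open_list:
            url = self.open_list.popleft()
            if url not in self.closed_list:
                return url
        return None

    def record_page(self, url, words):
        # Move url onto the CLOSED LIST and index every word found on its PAGE.
        self.closed_list.add(url)
        for word in words:
            self.reverse_index.setdefault(word, set()).add(url)

    def save(self, path):
        # Durable state: write everything needed to restart the crawl later.
        state = {
            "max_crawl": self.max_crawl,
            "reverse_index": {w: sorted(u) for w, u in self.reverse_index.items()},
            "closed_list": sorted(self.closed_list),
            "open_list": list(self.open_list),
        }
        with open(path, "w") as f:
            json.dump(state, f)

    @classmethod
    def load(cls, path):
        # Restartability: rebuild the database exactly as it was saved.
        with open(path) as f:
            state = json.load(f)
        db = cls(state["max_crawl"])
        db.reverse_index = {w: set(u) for w, u in state["reverse_index"].items()}
        db.closed_list = set(state["closed_list"])
        db.open_list = deque(state["open_list"])
        return db
```

In this sketch a crawler would loop on `next_url()`, fetch and parse the PAGE, call `record_page()` with its words, pass each outgoing link to `add_url()`, and call `save()` periodically so that `load()` can resume after a crash. Keeping the OPEN LIST as a queue gives a breadth-first crawl order; that is one reasonable choice, not a requirement of the design above.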
Terran Lane 2005-09-21