UNM Computer Science

Search Technical Reports by ID



The format of a tech report ID number is TR-CS-YYYY-NN, where YYYY is the four-digit year and NN is the report number, including leading zeroes. For the first tech report of 2004, the search would be "TR-CS-2004-01".
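
The ID format described above can be built and checked programmatically. The following is a minimal sketch, not part of the search tool itself: the helper names `format_tr_id` and `parse_tr_id` are hypothetical, and the pattern assumes the report number is zero-padded to at least two digits, as the "TR-CS-2004-01" example suggests.

```python
import re

# Assumed pattern: "TR-CS-", a four-digit year, "-", and a report number
# of at least two digits (so single-digit numbers carry a leading zero).
TR_ID_PATTERN = re.compile(r"TR-CS-([0-9]{4})-([0-9]{2,})")


def format_tr_id(year: int, number: int) -> str:
    """Build an ID string such as 'TR-CS-2004-01' (hypothetical helper)."""
    return f"TR-CS-{year:04d}-{number:02d}"


def parse_tr_id(tr_id: str) -> tuple[int, int]:
    """Split an ID into (year, number); raise ValueError if it is malformed."""
    match = TR_ID_PATTERN.fullmatch(tr_id.strip())
    if match is None:
        raise ValueError(f"not a valid tech report ID: {tr_id!r}")
    return int(match.group(1)), int(match.group(2))
```

For instance, `format_tr_id(2004, 1)` produces the ID from the example above, and `parse_tr_id` rejects an ID whose leading zero is missing, such as "TR-CS-2004-1".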

This searches only by ID. If you'd like, you can also search by researcher or search by keyword.




TR-CS-2001-34

Data Mining using Web Spiders
Carol D. Harrison and George F. Luger

As the volume of available information has grown, the field of data mining has become more important for turning data into usable knowledge. This project's contribution to the field is an analysis of the efficacy of data mining algorithms as applied to categorizing web pages with respect to a search term. Search engines were used as a springboard: links suggested by the search engine were followed to traverse the Web and reach pages that the search engine itself may not find. The approach is intended to provide more selective results, based on the data mining output, through second-level verification and the application of more stringent requirements for internal page consistency. In this project, AQ11 and ID3 were chosen as classic data mining algorithms representative of the inductive learning tradition; their ability to extract information from Web pages was compared with that of the Support Vector Machine, a more recent data mining algorithm.
