UNM Computer Science

Data Mining



Sponsors

Paul Helman

Bob Veroff

The analysis of information is an area of Computer Science that is rapidly growing in importance. Data mining is an umbrella term for a multitude of techniques that extract important, interesting, or unexpected phenomena from massive quantities of information. Because the information of interest takes many different forms, and because the phenomena we seek vary and are often ill defined, many diverse technologies must be developed and applied in novel ways.

Our motivating applications are characterized by problems in which we are presented with a huge quantity of information, of which only a small fraction is of particular interest. Since it is often costly to determine whether an individual item truly is interesting (e.g., by investigating the source of that information), resource limitations require that we prioritize the information so that the few items of interest are the ones we choose to pursue. Because we provide a prioritization rather than simply a classification, the boundary separating the items we pursue from those we do not is dynamic: one can pursue the highest-ranked items until time, money, or interest is exhausted, and the remaining items are implicitly discarded.
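The difference between a ranking and a fixed classification can be illustrated with a minimal sketch. It is not a description of our systems: the `Item` fields, the `score` and `cost` values, and the `pursue_by_priority` function are all hypothetical names chosen for illustration, and the scores are assumed to come from some upstream model.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    score: float  # hypothetical interest score; higher means more likely interesting
    cost: float   # hypothetical cost of investigating this item


def pursue_by_priority(items, budget):
    """Rank items by score and pursue them in order until the budget runs out."""
    ranked = sorted(items, key=lambda it: it.score, reverse=True)
    pursued = []
    remaining = budget
    for item in ranked:
        if item.cost > remaining:
            # Stop at the first item we cannot afford; everything ranked
            # below this point is implicitly discarded.
            break
        pursued.append(item)
        remaining -= item.cost
    return pursued


# Made-up example data, for illustration only.
items = [
    Item("a", 0.10, 1.0),
    Item("b", 0.92, 1.0),
    Item("c", 0.55, 1.0),
    Item("d", 0.80, 1.0),
]

pursued_small = pursue_by_priority(items, budget=2.0)
pursued_large = pursue_by_priority(items, budget=3.0)
```

With the same ranking, a budget of 2.0 pursues items b and d, while a budget of 3.0 also reaches item c. The cutoff moves with the available resources and no threshold on the score has to be fixed in advance.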

We are engaged in an ongoing effort to develop prioritization systems, with our research emphasizing the algorithmic and statistical components of the solution. The value of such prioritization systems keeps increasing amid today's information explosion, where one is often flooded with an overwhelming quantity of information and must make snap decisions under uncertainty. Important application areas include computer security, telecommunications, document retrieval, radio astronomy, and biomedical testing.