UNM Computer Science

Search Technical Reports by ID



The format of a tech report ID number is TR-CS-YYYY-NN, where YYYY is the four-digit year and NN is the report number, including leading zeroes. For example, to find the first tech report of 2004, search for "TR-CS-2004-01".
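As an illustration of the ID format described above, here is a minimal Python sketch that checks and builds IDs of this shape. The function names `parse_tr_id` and `format_tr_id` are hypothetical and not part of the search page. The sketch also assumes that NN is always exactly two digits, which the description suggests but does not state outright.

```python
import re

# Assumed shape: "TR-CS-", a four-digit year, "-", and a two-digit number.
# The two-digit width for NN is an assumption based on the example ID.
TR_ID_PATTERN = re.compile(r"TR-CS-([0-9]{4})-([0-9]{2})")


def parse_tr_id(tr_id):
    """Return (year, number) for a well-formed ID, or None otherwise."""
    match = TR_ID_PATTERN.fullmatch(tr_id.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def format_tr_id(year, number):
    """Build an ID string, zero-padding the report number to two digits."""
    return f"TR-CS-{year:04d}-{number:02d}"
```

Under these assumptions, `format_tr_id(2004, 1)` yields the ID used in the example above, and `parse_tr_id` rejects an ID whose leading zero has been dropped.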

This searches only by ID. You can also search by researcher or by keyword.

Found 1 result.

Listing from newest to oldest



TR-CS-2001-23

The Distribution of Variable-length Phatic Interjectives on the World Wide Web
Dennis Chao and Patrik D'haeseleer

If one uses a commercial internet search engine to search for increasingly long versions of variable-length interjectives on the web (e.g. "whee", "wheee", "wheeee", etc), the number of pages found containing these longer words falls off as a power law. The exponents for the length frequency distributions of different interjectives are not the same, although they may cluster around a few exponents. Surprisingly, the exponents are much larger than the -1 predicted by Zipf's Law. We believe that the restricted domain of variable-length phatic interjectives is an interesting subset of English that can provide an alternative simple model system of word length distributions.
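To make the power-law claim concrete, the sketch below estimates an exponent by fitting a straight line to log(count) against log(word length). This is a common way to estimate such an exponent and is not necessarily the method the authors used. It assumes the page count falls off as a power of the word length. The counts are synthetic, generated from an arbitrarily chosen exponent of -3.0, and are not data from the report. The function name `fit_power_law_exponent` is hypothetical.

```python
import math


def fit_power_law_exponent(lengths, counts):
    """Least-squares slope of log(count) against log(length).

    If count = C * length**k, then log(count) = log(C) + k * log(length),
    so the slope of the fitted line is the exponent k.
    """
    xs = [math.log(length) for length in lengths]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic illustration only: word lengths for "whee", "wheee", and so on,
# with counts generated from an arbitrary exponent (not measured values).
lengths = [4, 5, 6, 7, 8, 9, 10]
assumed_exponent = -3.0
counts = [1e6 * length ** assumed_exponent for length in lengths]

estimated_exponent = fit_power_law_exponent(lengths, counts)
```

Because the synthetic counts follow an exact power law, the fit recovers the assumed exponent up to floating-point error. Real search-engine counts would be noisy, and a plain log-log regression can be biased in that case.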

Available as gzipped PostScript.