Paper: Untangling Text Data Mining

ACL ID P99-1001
Title Untangling Text Data Mining
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1999

The possibilities for data mining from large text collections are virtually untapped. Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. Perhaps for this reason, there has been little work in text data mining to date, and most people who have talked about it have either conflated it with information access or have not made use of text directly to discover heretofore unknown information. In this paper I will first define data mining, information access, and corpus-based computational linguistics, and then discuss the relationship of these to text data mining. The intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists. I describe example...