Paper: Exploring The Use Of Linguistic Features In Domain And Genre Classification

ACL ID E99-1019
Title Exploring The Use Of Linguistic Features In Domain And Genre Classification
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 1999
Authors

The central questions are: How useful is information about part-of-speech frequency for text categorisation? Is it feasible to limit word features to content words for text classifications? This is examined for 5 domain and 4 genre classification tasks using LIMAS, the German equivalent of the Brown corpus. Because LIMAS is too heterogeneous, neither question can be answered reliably for any of the tasks. However, the results suggest that both questions have to be examined separately for each task at hand, because in some cases, the additional information can indeed improve performance.
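
To make the two feature types named in the abstract concrete, the following is a minimal sketch of how part-of-speech frequency features and content-word features might be computed from an already tagged text. It is not the paper's implementation: the tag inventory `TAGS`, the set `CONTENT_TAGS`, and the function names `pos_frequencies` and `content_word_counts` are assumptions made purely for illustration, and the example sentence is invented rather than taken from LIMAS.

```python
from collections import Counter

# Hypothetical coarse tag inventory; the tag set used in the paper is not given here.
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "DET", "PREP", "PRON", "CONJ"]

# Assumption: these tags mark content words; all others are treated as function words.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def pos_frequencies(tagged):
    """Return the relative frequency of each tag in TAGS for one document.

    `tagged` is a list of (word, tag) pairs. Tags outside TAGS still count
    towards the document length but get no feature of their own.
    """
    total = len(tagged)
    if total == 0:
        return [0.0] * len(TAGS)
    counts = Counter(tag for _, tag in tagged)
    return [counts[tag] / total for tag in TAGS]


def content_word_counts(tagged):
    """Return lowercased word counts restricted to content words."""
    return Counter(word.lower() for word, tag in tagged if tag in CONTENT_TAGS)


# Invented example sentence, tagged by hand for illustration.
doc = [("Der", "DET"), ("Hund", "NOUN"), ("bellt", "VERB"), ("laut", "ADV")]

pos_features = pos_frequencies(doc)
word_features = content_word_counts(doc)
```

Either feature set, or their concatenation, could then be passed to a classifier of choice; as the abstract notes, whether the extra information helps has to be checked separately for each task.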