Paper: Extracting Clusters of Specialist Terms from Unstructured Text

ACL ID D14-1149
Title Extracting Clusters of Specialist Terms from Unstructured Text
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

Automatically identifying related specialist terms is a difficult and important task, required to understand the lexical structure of language. This paper develops a corpus-based method of extracting coherent clusters of satellite terminology (terms on the edge of the lexicon) using co-occurrence networks of unstructured text. Term clusters are identified by extracting communities in the co-occurrence graph, after which the largest community is discarded and the remaining words are ranked by centrality within their community. The method is tractable on large corpora and requires no document structure and only minimal normalization. The results suggest that the model is able to extract coherent groups of satellite terms in corpora of varying size, content and structure. The findings also conf...
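The pipeline summarized above (build a co-occurrence graph, extract communities, discard the largest, rank the rest by centrality) can be illustrated with a small stdlib-only sketch. The abstract does not state the co-occurrence window, the community-detection algorithm, or the centrality measure, so this sketch assumes sentence-level co-occurrence, uses simple label propagation as a stand-in community detector, and uses within-community weighted degree as the centrality score. The function names (`build_cooccurrence`, `label_propagation`, `satellite_clusters`) are hypothetical and not taken from the paper.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sentences, stopwords=frozenset()):
    """Weighted undirected graph as {term: Counter(neighbour -> weight)}.

    ASSUMPTION: two terms co-occur if they appear in the same sentence;
    the edge weight counts how many sentences they share.
    """
    graph = defaultdict(Counter)
    for sentence in sentences:
        terms = sorted({w for w in re.findall(r"[a-z]+", sentence.lower())
                        if w not in stopwords})
        for a, b in combinations(terms, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Stand-in community detector (ASSUMPTION: not necessarily the paper's).

    Each node repeatedly adopts the label with the largest summed edge weight
    among its neighbours; ties go to the alphabetically smallest label so the
    result is deterministic. Stops when a full sweep changes nothing.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lbl in labels.items():
        communities[lbl].add(node)
    return list(communities.values())


def satellite_clusters(sentences, stopwords=frozenset()):
    """Discard the largest community; rank each remaining one by centrality."""
    graph = build_cooccurrence(sentences, stopwords)
    communities = sorted(label_propagation(graph), key=len, reverse=True)
    clusters = []
    for community in communities[1:]:  # the largest community is dropped
        # ASSUMPTION: centrality = weighted degree counting only in-community edges
        centrality = {n: sum(w for m, w in graph[n].items() if m in community)
                      for n in community}
        clusters.append(sorted(community, key=lambda n: (-centrality[n], n)))
    return clusters


corpus = [
    "The model, data, results and method",
    "The model and data analysis",
    "A stent, catheter and angioplasty",
    "The stent and catheter",
    "A tokamak plasma",
]
print(satellite_clusters(corpus, stopwords={"the", "a", "and"}))
# → [['catheter', 'stent', 'angioplasty'], ['plasma', 'tokamak']]
```

In this toy corpus the general-vocabulary community (`model`, `data`, `results`, `method`, `analysis`) is the largest and is discarded, leaving two small specialist clusters ordered by how strongly each term is connected inside its cluster. On real corpora a scalable community-detection method from a graph library would likely replace the hand-rolled label propagation.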