Paper: Deriving an Ambiguous Word's Part-of-Speech Distribution from Unannotated Text

ACL ID P07-2014
Title Deriving an Ambiguous Word's Part-of-Speech Distribution from Unannotated Text
Venue Annual Meeting of the Association for Computational Linguistics
Session System Demonstration
Year 2007
Authors

A distributional method for part-of-speech induction is presented which, in contrast to most previous work, determines the part-of-speech distribution of syntactically ambiguous words without explicitly tagging the underlying text corpus. This is achieved by assuming that the word pair consisting of the left and right neighbors of a particular token is characteristic of the part of speech at this position, and by clustering the neighbor pairs on the basis of their middle words as observed in a large corpus. The results obtained in this way are evaluated by comparing them to the part-of-speech distributions found in the manually tagged Brown corpus.
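The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a clustering algorithm, a similarity measure, or any thresholds, so the greedy cosine-threshold clustering below and all function names (`neighbor_pair_vectors`, `cluster_pairs`, `word_distribution`) are assumptions chosen only to keep the example short. Each neighbor pair is represented by the counts of the middle words seen between its two members, pairs with similar middle-word counts are grouped, and a word's distribution over the resulting clusters stands in for its part-of-speech distribution.

```python
from collections import Counter, defaultdict
import math


def neighbor_pair_vectors(tokens):
    """Map each (left, right) neighbor pair to counts of the middle words seen between them."""
    vectors = defaultdict(Counter)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        vectors[(left, right)][middle] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_pairs(vectors, threshold=0.5):
    """Greedy one-pass clustering (an assumed stand-in, not the paper's method).

    Each pair joins the first cluster whose summed middle-word counts are
    similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (summed middle-word counts, member pairs)
    for pair, vec in vectors.items():
        for centroid, members in clusters:
            if cosine(centroid, vec) >= threshold:
                centroid.update(vec)
                members.append(pair)
                break
        else:
            clusters.append((Counter(vec), [pair]))
    return clusters


def word_distribution(word, clusters):
    """Share of the word's middle-position occurrences that falls into each cluster."""
    counts = [centroid[word] for centroid, _ in clusters]
    total = sum(counts)
    return [c / total for c in counts] if total else []
```

On a tokenized corpus, `cluster_pairs(neighbor_pair_vectors(tokens))` yields the clusters, and `word_distribution(word, clusters)` gives one proportion per cluster for a given word. The token stream itself is never tagged, which is the property the abstract emphasizes; mapping clusters to named parts of speech for comparison with the Brown corpus is outside the scope of this sketch.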