Paper: Unsupervised Instance-Based Part of Speech Induction Using Probable Substitutes

ACL ID C14-1217
Title Unsupervised Instance-Based Part of Speech Induction Using Probable Substitutes
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

We develop an instance (token) based extension of the state-of-the-art word (type) based part-of-speech induction system introduced by Yatbaz et al. (2012). Each word instance is represented by a feature vector that combines information from the target word and probable substitutes sampled from an n-gram model representing its context. Modeling ambiguity using an instance-based model does not lead to significant gains in overall part-of-speech tagging accuracy because most words in running text are used in their most frequent class (e.g. 93.69% in the Penn Treebank). However, it is important to model ambiguity because most frequent words are ambiguous, and not modeling them correctly may negatively affect upstream tasks. Our main contribution is to show that an instance-based model ca...
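
The instance representation described in the abstract (a target word combined with substitutes sampled for its context) can be illustrated with a minimal sketch. This is not the authors' implementation: the substitute distribution below is hand-written for illustration rather than produced by an n-gram model, and the function names `sample_substitutes` and `instance_features`, the `W=`/`S=` feature prefixes, and the sample size `k` are all hypothetical choices.

```python
import random
from collections import Counter


def sample_substitutes(substitute_probs, k, rng):
    """Draw k substitutes (with replacement) from a word -> probability map."""
    words = list(substitute_probs)
    weights = [substitute_probs[w] for w in words]
    return rng.choices(words, weights=weights, k=k)


def instance_features(target, substitute_probs, k=5, seed=0):
    """Build a sparse count vector for one word instance.

    The vector holds one feature for the target word itself and one
    count per sampled substitute, so two instances of the same word in
    different contexts can receive different vectors.
    """
    rng = random.Random(seed)
    substitutes = sample_substitutes(substitute_probs, k, rng)
    features = Counter({f"W={target}": 1})
    features.update(f"S={s}" for s in substitutes)
    return features


# Assumed toy distribution for the blank in "the ___ was long";
# in the paper's setting this would come from an n-gram language model.
toy_probs = {"walk": 0.5, "race": 0.3, "trip": 0.2}
feats = instance_features("run", toy_probs, k=5, seed=0)
print(dict(feats))
```

Because the substitutes depend on the context rather than on the word type alone, a noun use and a verb use of "run" would draw from different distributions and so end up with different feature vectors, which is the property an instance-based model relies on.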