Paper: Automatic Idiom Identification in Wiktionary

ACL ID: D13-1145
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2013

Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed compositionally. Experiments demonstrate that the learned classifier can provide high quality idiom labels, more than doubling the number of idiomatic entries from 7,764 to 18,155 at precision levels of over 65%. These gains also translate to idiom detection in sentences, by simply using known word sense disambiguation algorithms to match phrases to their definitions. In a set of Wiktionary def...
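
To make the idea of a compositionality feature concrete, the following is a minimal sketch of one possible lexical signal: the fraction of content words in a phrase's definition that also occur in the phrase itself or in the definitions of its constituent words. This is an illustrative assumption, not the feature set used in the paper, which this excerpt does not describe. The function names, the stopword list, and the toy definitions are all hypothetical and are not taken from Wiktionary.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "to", "of", "from", "for", "with", "in"}


def content_tokens(text):
    """Lowercase the text and return its set of alphabetic, non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def definition_overlap(phrase, phrase_definition, word_definitions):
    """Fraction of the phrase definition's content words that also appear in
    the phrase or in the definitions of its constituent words.

    A low value hints that the meaning is not built from the parts, which is
    one (assumed) cue for idiomaticity. No stemming is applied.
    """
    definition = content_tokens(phrase_definition)
    if not definition:
        return 0.0
    phrase_words = content_tokens(phrase)
    vocabulary = set(phrase_words)
    for word in phrase_words:
        vocabulary |= content_tokens(word_definitions.get(word, ""))
    return len(definition & vocabulary) / len(definition)


# Toy definitions written for illustration only.
toy_word_definitions = {
    "kick": "to strike with the foot",
    "bucket": "a container for carrying liquid",
    "olive": "a small oval fruit",
    "oil": "a viscous liquid pressed from plants",
}

idiomatic = definition_overlap("kick the bucket", "to die", toy_word_definitions)
compositional = definition_overlap(
    "olive oil", "oil pressed from olives", toy_word_definitions
)
```

In this toy setup the idiomatic phrase scores lower than the compositional one. A score like this could serve as one input among several to a supervised classifier; on its own it would be a weak signal, since it ignores morphology and word sense.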