Paper: Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features

ACL ID D08-1104
Title Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
Authors

Some phrases can be interpreted either id- iomatically (figuratively) or literally in con- text, and the precise identification of idioms is indispensable for full-fledged natural lan- guage processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and the re- sults of an idiom identification experiment us- ing the corpus. The corpus targets 146 am- biguous idioms, and consists of 102,846 sen- tences, each of which is annotated with a lit- eral/idiom label. For idiom identification, we targeted 90 out of the 146 idioms and adopted a word sense disambiguation (WSD) method using both common WSD features and idiom- specific features. The corpus and the experi- ment are the largest of their kind, as far as we know. As a result, we found t...