Paper: Automatic Extraction of Fixed Multiword Expressions

ACL ID I05-1050
Title Automatic Extraction of Fixed Multiword Expressions
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2005

Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their surrounding contexts. These examples are used as training data for supervised machine learning, resulting in a classifier which can identify target multiword expressions. The final stage is the estimation of the part of speech of each extracted expression based on its context of occurrence. Evaluation demonstrated that collocation measures alone are not effective in identifying target express...
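
The first two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical measure, so pointwise mutual information (PMI) is used here purely as an assumed stand-in. The function names `candidate_bigrams` and `contexts`, the `min_count` threshold, and the context `window` size are all hypothetical choices for illustration, not the authors' implementation. The supervised classifier and the part-of-speech estimation are not shown.

```python
import math
from collections import Counter


def candidate_bigrams(tokens, min_count=2, top_n=10):
    """Stage 1 (sketch): rank adjacent word pairs by PMI.

    PMI is an assumption; the abstract only says "a statistical measure".
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))

    # Highest-scoring pairs first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def contexts(tokens, bigram, window=2):
    """Stage 2 (sketch): collect each occurrence of a candidate bigram
    with up to `window` words of context on either side."""
    hits = []
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) == bigram:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 2:i + 2 + window]
            hits.append((left, right))
    return hits


if __name__ == "__main__":
    corpus = "he did it ad hoc and she did it ad hoc of course".split()
    for pair, score in candidate_bigrams(corpus):
        print(pair, round(score, 3), contexts(corpus, pair))
```

In this toy corpus every repeated pair receives the same score, which shows on a small scale why a collocation measure by itself cannot separate a fixed expression such as "ad hoc" from an ordinary sequence such as "did it". The context pairs returned by `contexts` are the kind of examples that would then be labelled and passed to a supervised learner.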