Paper: Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering

ACL ID D07-1110
Title Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

This paper focuses on the evaluation of methods for the automatic acquisition of Multiword Expressions (MWEs) for robust grammar engineering. First we investigate the hypothesis that MWEs can be detected by the distinct statistical properties of their component words, regardless of their type, comparing 3 statistical measures: mutual information (MI), χ² and permutation entropy (PE). Our overall conclusion is that at least two measures, MI and PE, seem to differentiate MWEs from non-MWEs. We then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. We conclude that, in terms of language usage, web generated corpora are fairly similar to more carefully built corpora, like the BNC, indicating that the la...
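
To make the three measures named in the abstract concrete, here is a minimal Python sketch that scores a two-word candidate from raw counts. It is an illustration under stated assumptions, not the authors' implementation: MI is taken to be the standard pointwise mutual information, χ² is the usual Pearson statistic over a 2x2 contingency table, and permutation entropy is assumed to be the entropy of the relative frequencies of a candidate's word-order permutations. The abstract gives no formulas, so the function names, parameters and formulations are all hypothetical choices.

```python
import math


def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information (log base 2) of a bigram.

    c_xy: count of the bigram, c_x / c_y: counts of each word,
    n: total number of bigrams in the corpus (all hypothetical inputs).
    """
    return math.log2((c_xy * n) / (c_x * c_y))


def chi_square(c_xy, c_x, c_y, n):
    """Pearson chi-square over the 2x2 contingency table of a bigram."""
    o11 = c_xy                      # x followed by y
    o12 = c_x - c_xy                # x followed by something else
    o21 = c_y - c_xy                # something else followed by y
    o22 = n - c_x - c_y + c_xy      # neither x first nor y second
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


def permutation_entropy(perm_counts):
    """Entropy (natural log) of the counts of each word-order permutation.

    Assumed formulation: a low value means one ordering dominates,
    which would suggest a fixed expression.
    """
    total = sum(perm_counts)
    return abs(-sum((c / total) * math.log(c / total)
                    for c in perm_counts if c > 0))
```

With these definitions, a pair whose words co-occur exactly as often as chance predicts scores 0 on both MI and χ², and a candidate seen in only one word order scores 0 on permutation entropy. The counts could come from a corpus such as the BNC or from search-engine hit counts, as the abstract describes, but how the paper normalizes them is not stated here.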