Paper: Multiword Expressions in the wild? The mwetoolkit comes in handy

ACL ID C10-3015
Title Multiword Expressions in the wild? The mwetoolkit comes in handy
Venue International Conference on Computational Linguistics
Session System Demonstration
Year 2010
Authors

The mwetoolkit is a tool for automatic extraction of Multiword Expressions (MWEs) from monolingual corpora. It both generates and validates MWE candidates. The generation is based on surface forms, while for the validation, a series of criteria for removing noise are provided, such as some (language independent) association measures. In this paper, we present the use of the mwetoolkit in a standard configuration, for extracting MWEs from a corpus of general-purpose English. The functionalities of the toolkit are discussed in terms of a set of selected examples, comparing it with related work on MWE extraction.

1 MWEs in a nutshell

One of the factors that makes Natural Language Processing (NLP) a challenging area is the fact that some linguistic phenomena are not entirely com...