Paper: Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French

ACL ID D11-1067
Title Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011
Authors

Multiword expressions (MWE), a known nuisance for both linguistics and NLP, blur the lines between syntax and semantics. Previous work on MWE identification has relied primarily on surface statistics, which perform poorly for longer MWEs and cannot model discontinuous expressions. To address these problems, we show that even the simplest parsing models can effectively identify MWEs of arbitrary length, and that Tree Substitution Grammars achieve the best results. Our experiments show a 36.4% F1 absolute improvement for French over an n-gram surface statistics baseline, currently the predominant method for MWE identification. Our models are useful for several NLP tasks in which MWE pre-grouping has improved accuracy.