Paper: Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources

ACL ID D11-1077
Title Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011
Authors

We propose an architecture for expressing various linguistically-motivated features that help identify multi-word expressions in natural language texts. The architecture combines various linguistically-motivated classification features in a Bayesian network. We introduce novel ways for computing many of these features, and manually define linguistically-motivated interrelationships among them, which the Bayesian network models. Our methodology is almost entirely unsupervised and completely language-independent; it relies on few language resources and is thus suitable for a large number of languages. Furthermore, unlike much recent work, our approach can identify expressions of various types and syntactic constructions. We demonstrate a significant improvement in identi...