Paper: Automatic Domain Adaptation for Parsing

ACL ID N10-1004
Title Automatic Domain Adaptation for Parsing
Venue Human Language Technologies
Session Main Conference
Year 2010

Current statistical parsers tend to perform well only on their training domain and nearby genres. While strong performance on a few related domains is sufficient for many situations, it is advantageous for parsers to be able to generalize to a wide variety of domains. When parsing document collections involving heterogeneous domains (e.g., the web), the optimal parsing model for each document is typically not obvious. We study this problem as a new task: multiple source parser adaptation. Our system trains on corpora from many different domains. It learns not only statistics of those domains but quantitative measures of domain differences and how those differences affect parsing accuracy. Given a specific target text, the resulting system proposes linear combinations of p...