Paper: Supervised And Unsupervised PCFG Adaptation To Novel Domains

ACL ID N03-1027
Title Supervised And Unsupervised PCFG Adaptation To Novel Domains
Venue Human Language Technologies
Session Main Conference
Year 2003
Authors

This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as the corpus mixing in Gildea (2001). Other approaches falling within this framework are more effective. In contrast to the results in Gildea (2001), we show F-measure parsing accuracy gains of as much as 2.5% for high accuracy lexicalized parsing through the use of out-of-domain treebanks, with the largest gains when the amount of in-domain data is small. MAP adaptation can also be based on either supervised or unsupervised adaptation data. Even when no in-domain treebank is available, unsupervised techniques provide a subst...
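
To make the relationship between MAP estimation and corpus mixing concrete, here is a minimal sketch of one standard way to write a MAP estimate for the rule probabilities of a single left-hand side. It assumes a Dirichlet prior centered on the out-of-domain relative frequencies with weight tau. This is a textbook formulation, not necessarily the exact parameterization used in the paper, and the function name, rule strings, and counts are all hypothetical. Setting tau to the total out-of-domain count reduces the estimate to simple pooling of the two sets of counts, which is the sense in which corpus mixing fits inside the framework.

```python
def map_rule_probs(in_counts, out_counts, tau):
    """MAP estimate of rule probabilities for one left-hand side.

    Assumes a Dirichlet prior whose mode is the out-of-domain relative
    frequency and whose strength is `tau` (a hypothetical parameter name).
    Requires at least one out-of-domain observation.
    """
    out_total = sum(out_counts.values())
    in_total = sum(in_counts.values())
    rules = set(in_counts) | set(out_counts)
    return {
        rule: (tau * out_counts.get(rule, 0) / out_total + in_counts.get(rule, 0))
        / (tau + in_total)
        for rule in rules
    }


# Toy counts, invented for illustration only.
out_counts = {"NP -> DT NN": 60, "NP -> NNP": 40}  # large out-of-domain treebank
in_counts = {"NP -> DT NN": 2, "NP -> NNP": 8}     # small in-domain sample

# tau equal to the out-of-domain total: equivalent to pooling the counts.
mixed = map_rule_probs(in_counts, out_counts, tau=sum(out_counts.values()))

# A smaller tau lets the in-domain counts dominate the estimate.
light = map_rule_probs(in_counts, out_counts, tau=10)
```

With a small tau the estimate moves toward the in-domain relative frequencies, and with a large tau it stays close to the out-of-domain model, so the prior weight controls how much the out-of-domain treebank is trusted.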