Paper: Parser-Based Retraining for Domain Adaptation of Probabilistic Generators

ACL ID W08-1122
Title Parser-Based Retraining for Domain Adaptation of Probabilistic Generators
Venue International Conference on Natural Language Generation
Session Main Conference
Year 2008
Authors

While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method where the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data.
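The phrase "a substantial portion" can be made concrete with simple arithmetic on the three BLEU scores reported above. The sketch below uses only those scores. The variable names are illustrative, and the recovered fraction is a plain ratio of reported numbers, not a metric the abstract itself defines.

```python
# BLEU scores as reported in the abstract.
wsj_baseline = 0.66   # Penn-Treebank-trained generator, standard WSJ test set
bnc_baseline = 0.54   # same generator, BNC test set
bnc_retrained = 0.61  # after parser-based retraining, BNC test set

# BLEU lost when moving from WSJ to BNC data.
drop = wsj_baseline - bnc_baseline
# BLEU regained on BNC by retraining on parser-produced data.
recovered = bnc_retrained - bnc_baseline
# Share of the domain-shift drop that retraining wins back.
fraction = recovered / drop

print(f"drop={drop:.2f}, recovered={recovered:.2f}, fraction={fraction:.1%}")
# prints "drop=0.12, recovered=0.07, fraction=58.3%"
```

Under this reading, retraining closes roughly 58% of the gap, while about 0.05 BLEU of the drop remains.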