Paper: Catching The Drift: Probabilistic Content Models With Applications To Generation And Summarization

ACL ID N04-1015
Title Catching The Drift: Probabilistic Content Models With Applications To Generation And Summarization
Venue Human Language Technologies
Session Main Conference
Year 2004
Authors

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for learning content models from unannotated documents, utilizing a novel adaptation of algorithms for Hidden Markov Models. We then apply our method to two complementary tasks: information ordering and extractive summarization. Our experiments show that incorporating content models in these applications yields substantial improvement over previously-proposed methods.
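
To make the idea of a content model concrete, the following is a minimal sketch of how a Hidden Markov Model over topics could score alternative sentence orderings. It is an illustration under stated assumptions, not the method described in the paper: the two topic names, all probabilities, the unigram emission model, and the function names (`sentence_logprob`, `document_logprob`, `best_ordering`) are hypothetical and hand-set here, whereas the abstract describes learning the model from unannotated documents.

```python
import itertools
import math

# Hypothetical toy content model: hidden states are topics, and each
# topic emits the words of a sentence. All numbers are hand-set for
# illustration; nothing here is learned from data.
START = {"event": 0.9, "aftermath": 0.1}
TRANS = {
    "event": {"event": 0.3, "aftermath": 0.7},
    "aftermath": {"event": 0.1, "aftermath": 0.9},
}
EMIT = {
    "event": {"quake": 0.5, "struck": 0.4, "rescue": 0.05, "teams": 0.05},
    "aftermath": {"quake": 0.05, "struck": 0.05, "rescue": 0.5, "teams": 0.4},
}
UNSEEN = 1e-6  # assumed floor probability for words a topic has not seen


def sentence_logprob(topic, sentence):
    """Log-probability of a sentence under one topic (unigram assumption)."""
    return sum(math.log(EMIT[topic].get(w, UNSEEN)) for w in sentence.split())


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def document_logprob(sentences):
    """Forward algorithm: log-likelihood of a non-empty sentence sequence."""
    alpha = {
        t: math.log(START[t]) + sentence_logprob(t, sentences[0]) for t in START
    }
    for sentence in sentences[1:]:
        alpha = {
            t: logsumexp([alpha[p] + math.log(TRANS[p][t]) for p in alpha])
            + sentence_logprob(t, sentence)
            for t in START
        }
    return logsumexp(list(alpha.values()))


def best_ordering(sentences):
    """Exhaustively pick the ordering the model finds most likely."""
    return max(itertools.permutations(sentences), key=document_logprob)


if __name__ == "__main__":
    print(best_ordering(["rescue teams", "quake struck"]))
```

In this toy setup the model prefers placing "quake struck" before "rescue teams", because the assumed transition probabilities favor moving from the event topic to the aftermath topic. Exhaustive search over permutations is only feasible for a handful of sentences; it is used here purely to keep the sketch short.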