Paper: Global Models of Document Structure using Latent Permutations

ACL ID N09-1042
Title Global Models of Document Structure using Latent Permutations
Venue Human Language Technologies
Session Main Conference
Year 2009
Authors

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structure-aware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation.
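
As an illustration of the distribution over permutations named in the abstract, the following is a minimal sketch of one common way to sample from a generalized Mallows model: draw an inversion count for each item, then rebuild the permutation from those counts. It is not taken from the paper. The function names, the 0-based item labels, and the use of a single dispersion value `rho` shared across all positions are assumptions made for brevity; the general form of the model allows a separate dispersion parameter per position.

```python
import math
import random


def sample_inversion_counts(n, rho, rng):
    """Draw inversion counts for items 0..n-2 (hypothetical helper).

    counts[j] is the number of items larger than j that precede j.
    It ranges over 0..n-1-j and has probability proportional to
    exp(-rho * v), so larger rho concentrates mass on the identity order.
    """
    counts = []
    for j in range(n - 1):
        max_v = n - 1 - j
        weights = [math.exp(-rho * v) for v in range(max_v + 1)]
        counts.append(rng.choices(range(max_v + 1), weights=weights)[0])
    return counts


def permutation_from_inversions(counts):
    """Rebuild the ordering of items 0..n-1 from their inversion counts."""
    n = len(counts) + 1
    order = [n - 1]  # the largest item has no larger items before it
    # Insert items from large to small; position counts[j] places exactly
    # counts[j] larger items ahead of item j.
    for j in range(n - 2, -1, -1):
        order.insert(counts[j], j)
    return order


def sample_permutation(n, rho, seed=None):
    """Sample one ordering of n topics under the assumptions above."""
    rng = random.Random(seed)
    return permutation_from_inversions(sample_inversion_counts(n, rho, rng))
```

With a large `rho`, samples stay close to the canonical ordering `[0, 1, ..., n-1]`; with `rho = 0`, every inversion count is uniform over its range and all permutations are equally likely. This is the property that lets a single canonical topic order bias, without fixing, the order in each document.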