Paper: An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

ACL ID: D14-1051
Title: An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2014
Authors:

Various studies highlighted that topic-based approaches give a powerful spoken content representation of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework based on a compact representation of a textual document, to solve issues related to topic space granularity. Firstly, various topic spaces are estimated with different numbers of classes from a Latent Dirichlet Allocation. Then, this multiple topic space representation is compacted into an elementary segment, called c-vector, originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiv...
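The compaction step borrows the i-vector idea from speaker recognition. The sketch below is not the authors' implementation. It illustrates the standard i-vector point estimate, x = (I + Σ_c N_c T_cᵀ Σ_c⁻¹ T_c)⁻¹ Σ_c T_cᵀ Σ_c⁻¹ F̃_c, under these assumptions: a diagonal-covariance GMM (the UBM) and a total variability matrix T are already trained, and each topic-space view of a document has already been mapped to a fixed-dimension feature vector that plays the role of one "frame". The function names and toy numbers are hypothetical, and it uses only the standard library.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )


def posteriors(frame, weights, means, variances) -> Vector:
    """Responsibility of each UBM component for one frame (log-sum-exp softmax)."""
    logs = [math.log(w) + log_gaussian_diag(frame, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def extract_ivector(frames, weights, means, variances, T) -> Vector:
    """Hypothetical helper: i-vector point estimate for one document.

    frames: one fixed-dimension feature vector per topic-space view (assumption).
    T: total variability matrix of shape (C*D, R), stored row by row.
    """
    C, D, R = len(weights), len(means[0]), len(T[0])
    N = [0.0] * C                       # zeroth-order Baum-Welch statistics
    F = [[0.0] * D for _ in range(C)]   # first-order statistics, centred on UBM means
    for frame in frames:
        for c, g in enumerate(posteriors(frame, weights, means, variances)):
            N[c] += g
            for d in range(D):
                F[c][d] += g * (frame[d] - means[c][d])
    # Posterior precision L = I + sum_c N_c T_c' S_c^-1 T_c and linear term b.
    L = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
    b = [0.0] * R
    for c in range(C):
        for d in range(D):
            row = T[c * D + d]          # row d of the c-th D x R block of T
            inv_var = 1.0 / variances[c][d]
            for i in range(R):
                b[i] += row[i] * inv_var * F[c][d]
                for j in range(R):
                    L[i][j] += N[c] * row[i] * inv_var * row[j]
    return solve(L, b)


if __name__ == "__main__":
    # Toy setting (hypothetical numbers): 2-component UBM over 2-D view features,
    # rank-2 total variability matrix T of shape (C*D, R) = (4, 2).
    weights = [0.5, 0.5]
    means = [[0.2, 0.8], [0.7, 0.3]]
    variances = [[0.05, 0.05], [0.05, 0.05]]
    T = [[0.1, 0.0], [0.0, 0.1], [0.05, 0.02], [0.02, 0.05]]
    # One document seen through three topic spaces of different granularity.
    views = [[0.25, 0.75], [0.30, 0.70], [0.60, 0.40]]
    print(extract_ivector(views, weights, means, variances, T))
```

Whatever the number of topic spaces, the output has the fixed dimension R of T. This is what makes such a vector a compact, granularity-independent representation that can feed a downstream classifier.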