Paper: Using LDA to detect semantically incoherent documents

ACL ID W08-2106
Title Using LDA to detect semantically incoherent documents
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2008

Detecting the semantic coherence of a document is a challenging task and has several applications, such as in text segmentation and categorization. This paper is an attempt to distinguish between a ‘semantically coherent’ true document and a ‘randomly generated’ false document through topic detection in the framework of latent Dirichlet allocation (LDA). Based on the premise that a true document contains only a few topics and a false document is made up of many topics, it is asserted that the entropy of the topic distribution will be lower for a true document than for a false document. This hypothesis is tested on several false document sets generated by various methods and is found to be useful for fake content detection applications.
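The entropy hypothesis can be illustrated with a minimal sketch. It assumes the per-document topic proportions have already been inferred by some LDA implementation and are available as a plain list of non-negative weights. The function names, the example distributions, and the threshold parameter are hypothetical and chosen only for illustration; they are not taken from the paper. The entropy used here is the standard Shannon entropy, H(θ) = −Σ_k θ_k log θ_k, in nats.

```python
import math


def topic_entropy(theta):
    """Shannon entropy (in nats) of a topic distribution.

    `theta` is a sequence of non-negative topic weights for one document.
    It is normalized here so that the weights sum to 1.
    """
    total = sum(theta)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    probs = [w / total for w in theta]
    # Zero-probability topics contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def looks_incoherent(theta, threshold):
    """Flag a document whose topic entropy exceeds `threshold`.

    The threshold is a hypothetical tuning parameter; in practice it
    would have to be chosen on held-out data.
    """
    return topic_entropy(theta) > threshold


# Illustrative (made-up) topic distributions over four topics.
coherent_doc = [0.90, 0.05, 0.03, 0.02]    # mass concentrated on one topic
incoherent_doc = [0.25, 0.25, 0.25, 0.25]  # mass spread evenly over topics

print(topic_entropy(coherent_doc) < topic_entropy(incoherent_doc))
```

A uniform distribution over K topics attains the maximum entropy, log K, so a document whose topic mass is spread evenly scores highest, while a document dominated by a single topic scores close to zero.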