Paper: Joint Learning of Chinese Words, Terms and Keywords

ACL ID D14-1186
Title Joint Learning of Chinese Words, Terms and Keywords
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014

Previous work often used a pipelined framework in which Chinese word segmentation is followed by term extraction and keyword extraction. Such a framework suffers from error propagation and is unable to leverage information from later modules in earlier components. In this paper, we propose a four-level Dirichlet Process based model (DP-4) to jointly learn the word distributions from the corpus, domain and document levels simultaneously. Based on the DP-4 model, a sentence-wise Gibbs sampler is adopted to obtain proper segmentation results. Meanwhile, terms and keywords are acquired in the sampling process. Experimental results show the effectiveness of our method.
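
To make the idea of nested word distributions concrete, the following is a minimal sketch of how a hierarchical Dirichlet Process can back off from the document level to the domain level to the corpus level. It is not the authors' implementation. The abstract does not name the fourth level, so the character-level base distribution here is an assumption, as are the names `base_prob`, `dp_predictive` and `word_prob`, the shared concentration parameter `alpha`, and the values of `charset_size` and `p_stop`. The sketch also uses raw word counts at every level, a common simplification that ignores the table counts a full Chinese restaurant process would track.

```python
from collections import Counter


def base_prob(word, charset_size=5000, p_stop=0.5):
    """Assumed base distribution: uniform characters, geometric word length."""
    n = len(word)
    return (p_stop * (1 - p_stop) ** (n - 1)) * (1.0 / charset_size) ** n


def dp_predictive(word, counts, alpha, parent_prob):
    """Predictive probability of `word` under a DP whose base measure is the
    parent level: observed counts are smoothed toward `parent_prob`."""
    total = sum(counts.values())
    return (counts[word] + alpha * parent_prob) / (total + alpha)


def word_prob(word, corpus, domain, document, alpha=1.0):
    """Chain the levels: base -> corpus -> domain -> document (hypothetical)."""
    p_base = base_prob(word)
    p_corpus = dp_predictive(word, corpus, alpha, p_base)
    p_domain = dp_predictive(word, domain, alpha, p_corpus)
    return dp_predictive(word, document, alpha, p_domain)
```

In a sentence-wise Gibbs sampler, a probability of this kind would score each candidate word when a sentence is resegmented, with the counts updated after every sampled segmentation. Under this reading, a word whose document-level or domain-level probability is much higher than its corpus-level probability is a natural keyword or term candidate, though the abstract does not spell out the exact criterion.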