ACL ID | D08-1111 |
---|---|
Title | Information Retrieval Oriented Word Segmentation based on Character Association Strength Ranking |
Venue | Conference on Empirical Methods in Natural Language Processing |
Session | Main Conference |
Year | 2008 |
Authors |
|
This paper presents a novel, ranking-style word segmentation approach, called RSVM-Seg, which is well tailored to Chinese information retrieval (CIR). This strategy makes segmentation decisions based on the ranking of the internal associative strength between each pair of adjacent characters in the sentence. On a training corpus composed of query items, a ranking model is learned with the widely used tool Ranking SVM, using statistical features such as mutual information, difference of t-test, frequency, and dictionary information. Experimental results show that this method eliminates overlapping ambiguity much more effectively than current word segmentation methods. Furthermore, as this strategy naturally generates segmentation results with diffe...
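The idea of ranking adjacent character pairs by associative strength can be illustrated with a minimal Python sketch. This is not RSVM-Seg itself: it swaps the learned Ranking SVM model for a single feature, pointwise mutual information estimated from raw counts, and the function names `adjacent_pmi` and `segment_by_rank`, the crude smoothing, and the `n_cuts` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def adjacent_pmi(sentence, corpus):
    """Score each adjacent character pair in `sentence` by pointwise
    mutual information estimated from `corpus` (a list of strings).

    Illustrative stand-in for a learned ranking model, not the paper's method.
    """
    unigrams = Counter(ch for text in corpus for ch in text)
    bigrams = Counter(
        text[i:i + 2] for text in corpus for i in range(len(text) - 1)
    )
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = []
    for i in range(len(sentence) - 1):
        a, b = sentence[i], sentence[i + 1]
        # Crude +1 smoothing (an assumption) so unseen pairs avoid log(0).
        p_ab = (bigrams[a + b] + 1) / (n_bi + 1)
        p_a = (unigrams[a] + 1) / (n_uni + 1)
        p_b = (unigrams[b] + 1) / (n_uni + 1)
        scores.append(math.log(p_ab / (p_a * p_b)))
    return scores


def segment_by_rank(sentence, scores, n_cuts):
    """Split `sentence` at the `n_cuts` positions whose adjacent-pair
    scores rank lowest (weakest association)."""
    # Rank the gaps between characters from weakest to strongest association.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Gap i lies between sentence[i] and sentence[i + 1], so cut at i + 1.
    cuts = sorted(i + 1 for i in order[:n_cuts])

    words, start = [], 0
    for cut in cuts:
        words.append(sentence[start:cut])
        start = cut
    words.append(sentence[start:])
    return words
```

In this sketch, calling `segment_by_rank(sentence, adjacent_pmi(sentence, corpus), n_cuts)` with a larger `n_cuts` yields a finer split and a smaller one a coarser split, since the cuts are always taken from the same ranking.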