Paper: A Corpus-Based Statistical Approach To Automatic Book Indexing

ACL ID A92-1020
Title A Corpus-Based Statistical Approach To Automatic Book Indexing
Venue Applied Natural Language Processing Conference
Session Main Conference
Year 1992
Authors

The paper reports on a new approach to the automatic generation of back-of-book indexes for Chinese books. Full sentential parsing is avoided because it is inefficient and because no Chinese grammar with sufficient coverage is available. Instead, word segmentation, a fundamental analysis step particular to Chinese text, is performed to break a string of characters into a sequence of lexical units equivalent to words in English. The sequence of words then goes through part-of-speech tagging and noun phrase analysis. All these analyses are done using a corpus-based statistical algorithm. Experiments have shown satisfactory results.
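
The abstract does not say which statistical model is used for word segmentation, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes a unigram model: each candidate word is scored by its relative frequency in a corpus, and dynamic programming picks the split with the highest total log-probability. The word counts, the unknown-character penalty, and the names WORD_COUNTS and segment are all hypothetical.

```python
import math

# Toy word counts standing in for corpus statistics (hypothetical values).
WORD_COUNTS = {
    "自动": 8, "索引": 6, "书籍": 4, "书": 5,
    "自": 2, "动": 2, "索": 1, "引": 1, "籍": 1,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in WORD_COUNTS)
# Assumed penalty for a single character never seen in the corpus.
UNKNOWN_LOGP = math.log(0.5 / TOTAL)


def segment(text):
    """Return the split of text with the highest unigram log-probability."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word in that split
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            piece = text[start:end]
            if piece in WORD_COUNTS:
                logp = math.log(WORD_COUNTS[piece] / TOTAL)
            elif end - start == 1:
                logp = UNKNOWN_LOGP  # fall back to a one-character unit
            else:
                continue             # unseen multi-character strings are not words
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


print(segment("自动索引书籍"))  # ['自动', '索引', '书籍']
```

In a pipeline like the one described, the resulting word sequence would then be passed on to part-of-speech tagging and noun phrase analysis; those stages are not sketched here.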