Paper: Effective Document-Level Features for Chinese Patent Word Segmentation

ACL ID P14-2033
Title Effective Document-Level Features for Chinese Patent Word Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

A patent is a property right for an invention granted by the government to the inventor. Patents often contain a high concentration of scientific and technical terms that are rare in everyday language. However, some scientific and technical terms usually appear with high frequency in only one specific patent. In this paper, we propose a pragmatic approach to Chinese word segmentation on patents in which we train a sequence labeling model on a group of novel document-level features. Experiments show that our model reached an F1 score of 96.3% on the development set and 95.0% on a held-out test set.
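The abstract does not spell out the document-level features, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a character-level BMES tagging scheme (B = begin, M = middle, E = end, S = single-character word) and uses a hypothetical feature: how often each character n-gram starting or ending at a position occurs anywhere in the same patent. A term that recurs within one document, such as 石墨烯 (graphene), then gets high counts at its boundaries. All function names (`to_bmes`, `doc_ngram_counts`, `doc_features`, `seg_f1`) are illustrative and are not taken from the paper. The sketch also includes the standard word-level F1 used to score segmentation, where a predicted word is correct only if both of its boundaries match the gold standard.

```python
from collections import Counter


def to_bmes(words):
    """Convert a segmented sentence (list of words) to per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def doc_ngram_counts(doc_sentences, max_n=4):
    """Count every character n-gram (n = 2..max_n) across all sentences of ONE document."""
    counts = Counter()
    for sent in doc_sentences:
        for n in range(2, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[sent[i:i + n]] += 1
    return counts


def doc_features(sent, counts, max_n=4):
    """Hypothetical document-level features (not the paper's exact set): for each
    character, the in-document frequency of every n-gram starting or ending there."""
    feats = []
    for i in range(len(sent)):
        f = {}
        for n in range(2, max_n + 1):
            if i + n <= len(sent):
                f[f"start{n}"] = counts[sent[i:i + n]]
            if i - n + 1 >= 0:
                f[f"end{n}"] = counts[sent[i - n + 1:i + 1]]
        feats.append(f)
    return feats


def spans(words):
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def seg_f1(gold, pred):
    """Word-level F1: a predicted word is correct only if both boundaries match gold."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = ["石墨烯薄膜的制备方法", "所述石墨烯薄膜具有高导电性"]
    counts = doc_ngram_counts(doc)
    print(counts["石墨烯"])                     # 2: the term recurs within this document
    print(doc_features(doc[0], counts)[0])
    gold = ["石墨烯", "薄膜", "的", "制备", "方法"]
    pred = ["石墨", "烯", "薄膜", "的", "制备", "方法"]
    print(to_bmes(gold))
    print(seg_f1(gold, pred))
```

In this toy example, splitting 石墨烯 into 石墨 + 烯 leaves 4 of 6 predicted words correct (precision 4/6, recall 4/5), so F1 = 8/11 ≈ 0.727. The n-gram counts would be fed to a sequence labeler (for example a CRF) alongside ordinary character features; computing them per document rather than per corpus is what lets a patent-specific term stand out.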