Paper: Unsupervised Training Set Generation for Automatic Acquisition of Technical Terminology in Patents

ACL ID C14-1029
Title Unsupervised Training Set Generation for Automatic Acquisition of Technical Terminology in Patents
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

NLP methods for automatic information access to rich technological knowledge sources like patents are of great value. One important resource for accessing this knowledge is the technical terminology of the patent domain. In this paper, we address the problem of automatic terminology acquisition (ATA), i.e., the problem of automatically identifying all technical terms in a document. We analyze technical terminology in patents and define the concept of technical term based on the analysis. We present a novel method for labeling large amounts of high-quality training data for ATA in an unsupervised fashion. We train two ATA methods on this training data, a term candidate classifier and a conditional random field (CRF), and investigate the utility of different types of features. Finally, we ...