Paper: Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

ACL ID D13-1073
Title Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

This paper presents DefMiner, a supervised sequence labeling system that identifies scien- tific terms and their accompanying definitions. DefMiner achieves 85% F1 on a Wikipedia benchmark corpus, significantly improving the previous state-of-the-art by 8%. We exploit DefMiner to process the ACL An- thology Reference Corpus (ARC) ? a large, real-world digital library of scientific arti- cles in computational linguistics. The re- sulting automatically-acquired glossary rep- resents the terminology defined over several thousand individual research articles. We highlight several interesting observations: more definitions are introduced for conference and workshop papers over the years and that multiword terms account for slightly less than half of all terms. Obtaining a list of popular define...