Paper: Focused training sets to reduce noise in NER feature models

ACL ID N13-1042
Title Focused training sets to reduce noise in NER feature models
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

Feature and context aggregation play a large role in current NER systems, allowing significant opportunities for research into optimizing these features to cater to different domains. This work strives to reduce the noise introduced into aggregated features from disparate and generic training data in order to allow for contextual features that more closely model the entities in the target data. The proposed approach trains models based on only a part of the training set that is more similar to the target domain. To this end, models are trained for an existing NER system using the top documents from the training set that are similar to the target document in order to demonstrate that this technique can be applied to improve any pre-built NER system. Initial results show...
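
The selection step described in the abstract (keeping only the top training documents most similar to the target document) can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity over raw term counts is an assumption here, and the names term_counts, cosine, and select_focused_training_set are hypothetical. The toy documents are invented for illustration only.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercased, whitespace-tokenized bag of words (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_focused_training_set(training_docs, target_doc, k):
    """Return the k training documents most similar to target_doc.

    Assumption: similarity is cosine over term counts; the paper's
    actual measure is not given in the abstract.
    """
    target_vec = term_counts(target_doc)
    ranked = sorted(
        training_docs,
        key=lambda doc: cosine(term_counts(doc), target_vec),
        reverse=True,
    )
    return ranked[:k]


# Toy data, invented for illustration.
training_docs = [
    "the striker scored twice in the cup final",
    "shares of the bank fell after the earnings report",
    "the central bank raised interest rates again",
]
target_doc = "the bank reported strong quarterly earnings"

# Keep the two training documents closest to the target document.
focused = select_focused_training_set(training_docs, target_doc, k=2)
```

In this sketch the focused subset would then be handed to the existing NER system's own training routine in place of the full training set; that step depends on the system in use and is not shown.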