Paper: SimSem: Fast Approximate String Matching in Relation to Semantic Category Disambiguation

ACL ID W11-0218
Title SimSem: Fast Approximate String Matching in Relation to Semantic Category Disambiguation
Venue Workshop on Biomedical Natural Language Processing
Session
Year 2011
Authors

In this study we investigate the merits of fast approximate string matching to address challenges relating to spelling variants and to utilise large-scale lexical resources for seman- tic class disambiguation. We integrate string matching results into machine learning-based disambiguation through the use of a novel set of features that represent the distance of a given textual span to the closest match in each of a collection of lexical resources. We col- lect lexical resources for a multitude of se- mantic categories from a variety of biomedi- cal domain sources. The combined resources, containing more than twenty million lexical items, are queried using a recently proposed fast and efficient approximate string match- ing algorithm that allows us to query large resources without severely ...