Paper: Chunking Clinical Text Containing Non-Canonical Language

ACL ID W14-3411
Title Chunking Clinical Text Containing Non-Canonical Language
Venue Proceedings of the BioNLP Shared Task 2013 Workshop
Session
Year 2014
Authors

Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy.
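To make two of the variables named in the abstract concrete, the sketch below illustrates what "chunk representation" and "breadth of context features" can mean in a tagging-based chunker. It is an illustration only, not the tools or models used in the paper: the function names (`spans_to_iob2`, `iob2_to_iob1`, `window_features`), the example note, its part-of-speech tags, and its chunk spans are all invented here, and IOB1/IOB2 are assumed as two common representations rather than taken from the abstract.

```python
# Illustrative sketch; all names and the example data are assumptions,
# not taken from the paper.

def spans_to_iob2(n_tokens, spans):
    """Encode chunk spans (start, end_exclusive, label) as IOB2 tags.

    In IOB2, every chunk starts with a B- tag.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob2_to_iob1(tags):
    """Convert IOB2 tags to IOB1.

    In IOB1, B- is kept only when a chunk directly follows another
    chunk of the same type; otherwise the chunk starts with I-.
    """
    out = []
    prev = "O"
    for tag in tags:
        if tag.startswith("B-"):
            label = tag[2:]
            follows_same_type = prev != "O" and prev[2:] == label
            out.append(tag if follows_same_type else "I-" + label)
        else:
            out.append(tag)
        prev = tag
    return out


def window_features(tokens, pos_tags, i, width):
    """Collect word and POS features in a symmetric window around token i.

    `width` is the number of tokens taken on each side; positions outside
    the sentence are filled with a padding symbol.
    """
    feats = {}
    for offset in range(-width, width + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"w[{offset}]"] = tokens[j].lower() if inside else "<PAD>"
        feats[f"p[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


# Invented example of a terse clinical note with hypothetical POS tags.
tokens = ["pt", "c/o", "chest", "pain", "2/7"]
pos_tags = ["NN", "VBZ", "NN", "NN", "CD"]
spans = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP"), (4, 5, "NP")]

iob2 = spans_to_iob2(len(tokens), spans)
iob1 = iob2_to_iob1(iob2)
narrow = window_features(tokens, pos_tags, 2, 1)
wide = window_features(tokens, pos_tags, 2, 2)
```

In this sketch the two representations encode the same chunks and differ only in how chunk starts are marked, while widening the window from one to two tokens per side adds features drawn from more distant tokens.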