Paper: Revealing the Structure of Medical Dictations with Conditional Random Fields

ACL ID D08-1001
Title Revealing the Structure of Medical Dictations with Conditional Random Fields
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
Authors

Automatic processing of medical dictations poses a significant challenge. We approach the problem by introducing a statistical frame- work capable of identifying types and bound- aries of sections, lists and other structures occurring in a dictation, thereby gaining ex- plicit knowledge about the function of such elements. Training data is created semi- automatically by aligning a parallel corpus of corrected medical reports and correspond- ing transcripts generated via automatic speech recognition. We highlight the properties of our statistical framework, which is based on conditional random fields (CRFs) and im- plemented as an efficient, publicly available toolkit. Finally, we show that our approach is effective both under ideal conditions and for real-life dictation involving speech re...