Paper: Role Of Local Context In Automatic Deidentification Of Ungrammatical Fragmented Text

ACL ID N06-1009
Title Role Of Local Context In Automatic Deidentification Of Ungrammatical Fragmented Text
Venue Human Language Technologies
Session Main Conference
Year 2006
Authors

Deidentification of clinical records is a crucial step before these records can be distributed to non-hospital researchers. Most approaches to deidentification rely heavily on dictionaries and heuristic rules; these approaches fail to remove most personal health information (PHI) that cannot be found in dictionaries. They also can fail to remove PHI that is ambiguous between PHI and non-PHI. Named entity recognition (NER) technologies can be used for deidentification. Some of these technologies exploit both local and global context of a word to identify its entity type. When documents are grammatically written, global context can improve NER. In this paper, we show that we can deidentify medical discharge summaries using support vector machines that rely on a statistical represen...