Paper: Topic Modeling Based Classification of Clinical Reports

ACL ID P13-3010
Title Topic Modeling Based Classification of Clinical Reports
Venue Annual Meeting of the Association for Computational Linguistics
Session Student Session
Year 2013

Electronic health records (EHRs) contain important clinical information about patients. Some of these data are in the form of free text and require preprocessing before they can be used in automated systems. Efficient and effective use of these data could be vital to the speed and quality of health care. As a case study, we analyzed classification of CT imaging reports into binary categories. In addition to regular text classification, we utilized topic modeling of the entire dataset in various ways. Topic modeling of the corpora provides interpretable themes that exist in these reports. Representing reports according to their topic distributions is more compact than a bag-of-words representation and can be processed faster than raw text in subsequent automated processes. A binary ...