Paper: OCR Post-Processing For Low-Density Languages

ACL ID H05-1109
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2005

We present a lexicon-free post-processing method for optical character recognition (OCR), implemented using weighted finite-state machines. We evaluate the technique in a number of scenarios relevant for natural language processing, including creation of new OCR capabilities for low-density languages, improvement of OCR performance for a native commercial system, acquisition of knowledge from a foreign-language dictionary, creation of a parallel text, and machine translation from OCR output.