Paper: Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition

ACL ID P13-2098
Title Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013
Authors

Optical Character Recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters and on language models to emphasize fluency. In this paper we incorporate linguistically and semantically motivated features into an existing OCR system. To do so we follow an n-best list reranking approach that exploits recent advances in learning-to-rank techniques. We achieve 10.1% and 11.4% reduction in recognition word error rate (WER) relative to a standard baseline system on typewritten and handwritten Arabic respectively.
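
The general shape of n-best list reranking and of a relative WER comparison can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the system described in the paper: the feature names (`ocr_score`, `lm_score`), the hand-set weights, and the toy hypotheses are all hypothetical, and a fixed linear scorer stands in for the learning-to-rank model, which would learn its weights from data.

```python
from typing import Dict, List, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(reference), 1)


def rerank(nbest: List[Dict], weights: Dict[str, float]) -> List[Dict]:
    """Sort hypotheses by a weighted sum of their feature values, best first."""
    def score(hyp: Dict) -> float:
        return sum(weights.get(name, 0.0) * value
                   for name, value in hyp["features"].items())
    return sorted(nbest, key=score, reverse=True)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical 2-best list; the scores are invented log-probability-like values.
nbest = [
    {"text": "the cat sit", "features": {"ocr_score": -0.8, "lm_score": -3.5}},
    {"text": "the cat sat", "features": {"ocr_score": -1.0, "lm_score": -2.0}},
]
# Assumed hand-set weights; a learning-to-rank method would estimate these.
weights = {"ocr_score": 1.0, "lm_score": 0.5}

reference = "the cat sat".split()
baseline_best = nbest[0]                 # top hypothesis by OCR score alone
reranked_best = rerank(nbest, weights)[0]

print(word_error_rate(reference, baseline_best["text"].split()))
print(word_error_rate(reference, reranked_best["text"].split()))
```

In this toy case the reranker promotes the second hypothesis because the added feature outweighs the small OCR-score gap. `relative_reduction` shows how a figure such as a 10.1% reduction is computed relative to a baseline WER, rather than as an absolute difference in percentage points.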