Paper: Arabic OCR Error Correction Using Character Segment Correction Language Modeling And Shallow Morphology

ACL ID W06-1648
Title Arabic OCR Error Correction Using Character Segment Correction Language Modeling And Shallow Morphology
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2006
Authors

This paper explores the use of a character-segment-based character correction model, language modeling, and shallow morphology for Arabic OCR error correction. Experimentation shows that character-segment-based correction is superior to single-character correction and that language modeling boosts correction by improving the ranking of candidate corrections, while shallow morphology had a small adverse effect. Further, given a sufficiently large corpus from which to extract a dictionary and train a language model, word-based correction works well for a morphologically rich language such as Arabic.
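
The idea of ranking candidate corrections with a segment-level error model and a language model can be illustrated with a minimal sketch. This is not the paper's implementation: the confusion table `SEGMENT_MODEL`, the probability `KEEP_PROB`, the counts in `WORD_COUNTS`, and the function names are all hypothetical, the example uses Latin letters rather than Arabic for readability, and a unigram word prior stands in for a full language model. Candidates are restricted to dictionary words, in the spirit of word-based correction.

```python
import math

# Hypothetical segment confusion table: P(observed segment | intended segment).
# Keys are (observed, intended). Values are illustrative, not from the paper.
SEGMENT_MODEL = {
    ("rn", "m"): 0.2,  # OCR produced "rn" where "m" was intended
    ("l", "i"): 0.1,   # OCR produced "l" where "i" was intended
}

# Hypothetical probability that a word was recognized without any error.
KEEP_PROB = 0.7

# Hypothetical word counts standing in for a dictionary and unigram language model.
WORD_COUNTS = {"modern": 50, "modem": 5, "main": 30}


def candidates(observed):
    """Map each candidate word to its error-model probability.

    Only single-segment substitutions are considered in this sketch.
    """
    results = {observed: KEEP_PROB}
    for (obs_seg, true_seg), prob in SEGMENT_MODEL.items():
        start = observed.find(obs_seg)
        while start != -1:
            cand = observed[:start] + true_seg + observed[start + len(obs_seg):]
            results[cand] = max(results.get(cand, 0.0), prob)
            start = observed.find(obs_seg, start + 1)
    return results


def rank(observed):
    """Rank in-dictionary candidates by log P(observed | cand) + log P(cand)."""
    total = sum(WORD_COUNTS.values())
    scored = []
    for cand, p_error in candidates(observed).items():
        count = WORD_COUNTS.get(cand, 0)
        if count == 0:
            continue  # skip candidates that are not dictionary words
        scored.append((math.log(p_error) + math.log(count / total), cand))
    return [cand for _, cand in sorted(scored, reverse=True)]


print(rank("rnodern"))
print(rank("modern"))
```

In this toy setup the two-character segment "rn" maps to the single character "m", a correction that a strictly one-to-one character model could not propose, and the word prior decides the order when several dictionary words are reachable.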