Paper: Automatic Arabic diacritics restoration based on deep nets

ACL ID W14-3608
Title Automatic Arabic diacritics restoration based on deep nets
Venue Workshop on Arabic Natural Language Processing
Year 2014

This paper tackles the Arabic diacritics restoration problem within the deep learning framework. It presents the Confused Subset Resolution (CSR) method to improve classification accuracy, together with an Arabic Part-of-Speech (PoS) tagging framework based on deep neural nets. Special focus is given to syntactic diacritization, which still suffers from low accuracy, as indicated by related work. The proposed techniques are evaluated against state-of-the-art systems reported in the literature on challenging datasets collected from different domains. Standard datasets such as the LDC Arabic Tree Bank are used in addition to custom ones available online for replication of results. Results show significant improvement of the proposed techniques over other approaches, reducing the syntactic classif...