Paper: A Morphologically Sensitive Clustering Algorithm For Identifying Arabic Roots

ACL ID P00-1026
Title A Morphologically Sensitive Clustering Algorithm For Identifying Arabic Roots
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2000
Authors

We present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for IR. Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show successful treatment of infixes and clustering accuracy of up to 94.06% on unedited Arabic text samples, without the use of dictionaries.
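
As a rough illustration of the two stages named in the abstract, the sketch below strips a few affixes and then scores a word pair with a Dice coefficient over shared character bigrams, the kind of measure associated with Adamson and Boreham (1974). It is not the paper's algorithm: the affix lists, the minimum stem length of three letters, and the function names `light_stem`, `bigrams`, and `dice` are all assumptions made for this example, and the morphology-sensitive modifications the abstract mentions are not reproduced here.

```python
# Illustrative sketch only: affix lists and the 3-letter minimum are assumptions,
# not values taken from the paper.
HYPOTHETICAL_PREFIXES = ("ال", "و")
HYPOTHETICAL_SUFFIXES = ("ات", "ون", "ة")


def light_stem(word, prefixes=HYPOTHETICAL_PREFIXES, suffixes=HYPOTHETICAL_SUFFIXES):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word


def bigrams(word):
    """Return the set of unique adjacent character pairs in the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient over unique bigrams: 2 * shared / (total in a + total in b)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    stem = light_stem("الكتاب")   # definite article stripped by the assumed prefix list
    print(stem, dice(stem, "كتب"))
```

In this sketch the infixed long vowel in the stemmed word breaks one of the bigrams it would otherwise share with the three-letter form, so the plain bigram score stays low for words from the same root. That is the kind of case the abstract's morphology-sensitive similarity techniques are meant to handle.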