Paper: Identifying Cognates By Phonetic And Semantic Similarity

ACL ID N01-1014
Title Identifying Cognates By Phonetic And Semantic Similarity
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2001
Authors

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than “orthographic” measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice’s coefficient. I introduce a procedure for estimating the semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering, on average, nearly 75% of cognates at 50% precision.
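The two orthographic baselines named above can be sketched as follows. This is a minimal illustration of those baselines only, not of the phonetic or semantic measures introduced in the paper. It assumes the common definitions: LCSR as the length of the longest common subsequence divided by the length of the longer word, and Dice's coefficient computed over character bigrams with shared bigrams counted as a multiset intersection. The function names (`lcs_length`, `lcsr`, `bigrams`, `dice`) are hypothetical.

```python
from collections import Counter


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y (dynamic programming)."""
    # prev[j] holds the LCS length of the prefix of x processed so far and y[:j]
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a character from x or y
        prev = curr
    return prev[len(y)]


def lcsr(x: str, y: str) -> float:
    """Longest Common Subsequence Ratio: |LCS(x, y)| / max(|x|, |y|)."""
    longest = max(len(x), len(y))
    return lcs_length(x, y) / longest if longest else 0.0


def bigrams(word: str) -> Counter:
    """Multiset of adjacent character pairs in word (assumed bigram unit for Dice)."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice(x: str, y: str) -> float:
    """Dice's coefficient on character bigrams: 2 * |shared| / (|B(x)| + |B(y)|)."""
    bx, by = bigrams(x), bigrams(y)
    total = sum(bx.values()) + sum(by.values())
    if total == 0:
        return 0.0  # neither word has a bigram
    shared = sum((bx & by).values())  # multiset intersection of bigrams
    return 2 * shared / total
```

For example, `lcsr("colour", "color")` gives 5/6 (the common subsequence "color" over the six letters of "colour"), and `dice("colour", "color")` gives 2/3 (three shared bigrams out of nine in total). Both scores depend only on spelling, which is the limitation that a feature-based phonetic measure is meant to address.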