Paper: Recognizing Noisy Romanized Japanese Words in Learner English

ACL ID W08-0904
Title Recognizing Noisy Romanized Japanese Words in Learner English
Venue Innovative Use of NLP for Building Educational Applications
Session
Year 2008
Authors

This paper describes a method for recognizing romanized Japanese words in learner English. Because they are mostly unknown words, they act as noise and cause problems in a variety of tasks, including part-of-speech tagging, spell checking, and error detection. One difficulty in recognizing them is that spelling errors often violate the spelling rules of romanized Japanese. To address this problem, the described method uses a clustering algorithm reinforced by a small set of rules. Experiments show that it achieves an F-measure of 0.879 and outperforms other methods. They also show that it requires only the target text and a fair-sized English word list.
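The difficulty described in the abstract can be illustrated with a small sketch. It is not the paper's method (the abstract does not specify the clustering algorithm or the rules); it only shows what a purely rule-based check might look like, and why such a check alone is fragile. The syllable inventory, the function names `looks_romanized_japanese` and `candidates`, and the toy word list are all assumptions made for illustration.

```python
import re

# Hypothetical, simplified Hepburn-like syllable inventory (an assumption,
# not taken from the paper): an optional consonant onset plus a vowel,
# or a syllabic "n".
_ONSET = r"(?:[kgnhbpmr]y|sh|ch|ts|[kgsztdnhbpmyrwjf])"
_MORA = rf"(?:{_ONSET}?[aiueo]|n)"
_WORD = re.compile(rf"(?:{_MORA})+")

# Doubled consonants (as in "gakkou") are collapsed before matching.
_GEMINATE = re.compile(r"([bdgkpstz])\1")


def looks_romanized_japanese(word: str) -> bool:
    """Return True if the word fits the simplified syllable pattern."""
    w = _GEMINATE.sub(r"\1", word.lower())
    return _WORD.fullmatch(w) is not None


def candidates(tokens, english_words):
    """Keep tokens that are unknown to the English list and fit the pattern."""
    return [
        t for t in tokens
        if t.lower() not in english_words and looks_romanized_japanese(t)
    ]


if __name__ == "__main__":
    english = {"i", "ate", "and", "banana", "yesterday"}  # toy word list
    text = ["I", "ate", "okonomiyaki", "and", "banana", "yesterday"]
    print(candidates(text, english))
```

Under these assumptions, a correctly spelled word such as "okonomiyaki" fits the pattern, while a misspelling such as "okonomiyak" does not, which is the kind of rule violation the abstract points to. An English word such as "banana" also fits the pattern, which suggests why an English word list is needed alongside any spelling rules.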