Paper: TRuEcasIng

ACL ID P03-1020
Title TRuEcasIng
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2003
Authors

Truecasing is the process of restoring case information to badly-cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser that achieves an accuracy of 98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances the legibility of machine translation output and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.
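The abstract describes the truecaser only as statistical and language-model based. As a rough illustration of that general idea, here is a minimal Python sketch, assuming a bigram model over cased tokens with add-one smoothing and Viterbi decoding over the case variants observed in training. The model order, the smoothing, the toy corpus, and the names `train` and `truecase` are hypothetical choices made for this sketch; they are not the paper's exact method.

```python
import math
from collections import defaultdict


def train(sentences):
    """Collect cased bigram counts and the case variants seen for each word.

    Sketch assumption: a bigram model with add-one smoothing (the paper's
    actual model order and smoothing may differ).
    """
    variants = defaultdict(set)   # lowercased word -> cased forms seen in training
    history = defaultdict(int)    # how often each cased token precedes another token
    bigram = defaultdict(int)     # (previous cased token, cased token) -> count
    for sent in sentences:
        tokens = ["<s>"] + sent.split()
        for tok in tokens[1:]:
            variants[tok.lower()].add(tok)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigram[(prev, cur)] += 1
    vocab_size = len({form for forms in variants.values() for form in forms})
    return variants, history, bigram, vocab_size


def truecase(text, variants, history, bigram, vocab_size):
    """Viterbi search for the most probable casing under the add-one bigram model."""
    beams = {"<s>": (0.0, [])}    # cased form -> (best log-prob, best path so far)
    for word in text.lower().split():
        # Unseen words have no recorded variants, so they keep their lowercase form.
        candidates = variants.get(word, {word})
        new_beams = {}
        for cand in candidates:
            best = None
            for prev, (score, path) in beams.items():
                prob = (bigram[(prev, cand)] + 1) / (history[prev] + vocab_size)
                total = score + math.log(prob)
                if best is None or total > best[0]:
                    best = (total, path + [cand])
            new_beams[cand] = best
        beams = new_beams
    return " ".join(max(beams.values())[1])


# Hypothetical toy training corpus, for illustration only.
corpus = [
    "The President spoke in New York today",
    "Reporters in New York asked the President about trade",
    "new rules apply to the new office",
]
model = train(corpus)
print(truecase("the president spoke in new york", *model))  # The President spoke in New York
print(truecase("the new office", *model))                   # The new office
```

The context model is what lets the same lowercase word receive different casings: "new" becomes "New" before "york" but stays "new" before "office", because those are the bigrams seen in training. A realistic truecaser would use a much larger corpus, a higher-order model, and better smoothing.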