Paper: Cross Language Text Categorization Using a Bilingual Lexicon

ACL ID I08-1022
Title Cross Language Text Categorization Using a Bilingual Lexicon
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2008
Authors

With the Internet growing in popularity at a phenomenal rate, an ever-increasing number of documents in languages other than English are available on the Internet. Cross language text categorization has attracted more and more attention for the organization of these heterogeneous document collections. In this paper, we focus on how to conduct effective cross language text categorization. To this end, we propose a cross language naive Bayes algorithm. Preliminary experiments on the document collections we gathered show the effectiveness of the proposed method and verify the feasibility of achieving performance close to monolingual text categorization, using a bilingual lexicon alone. Also, our algorithm is more efficient than our baselines.