Paper: Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora

ACL ID P09-2050
Title Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2009
Authors

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by up to 58%.