Paper: Extracting Word Correspondences From Bilingual Corpora Based On Word Co-Occurrence Information

ACL ID C96-1006
Title Extracting Word Correspondences From Bilingual Corpora Based On Word Co-Occurrence Information
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1996
Authors

A new method has been developed for extracting word correspondences from a bilingual corpus. First, the co-occurrence information for each word in both languages is extracted from the corpus. Then, the correlations between the co-occurrence features of the words are calculated pairwise with the assistance of a basic word bilingual dictionary. Finally, the pairs of words with the highest correlations are output selectively. This method is applicable to rather small, unaligned corpora; it can extract correspondences between compound words as well as simple words. An experiment using bilingual patent-specification corpora achieved 28% recall and 76% precision; this demonstrates that the method effectively reduces the cost of bilingual dictionary augmentation.
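The three steps in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the co-occurrence window, the correlation measure, or the selection rule, so the sketch assumes sentence-level co-occurrence, cosine similarity, and a fixed threshold. All function names, the `threshold` parameter, and the toy data are hypothetical. Compound words are assumed to arrive already joined into single tokens.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence(sentences):
    """Map each word to counts of the words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        unique = set(sent)
        for word in unique:
            for other in unique:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def project(vec, seed_dict):
    """Rewrite source-language features as target-language features
    using the basic (seed) bilingual dictionary; unknown features are dropped."""
    out = Counter()
    for feat, count in vec.items():
        for translation in seed_dict.get(feat, ()):
            out[translation] += count
    return out


def cosine(a, b):
    """Cosine similarity (an assumed stand-in for the paper's correlation)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def extract_pairs(src_sents, tgt_sents, seed_dict, threshold=0.5):
    """Return (source, target, score) for the best-scoring target of each
    source word not already in the seed dictionary, if it clears the threshold."""
    src_vecs = cooccurrence(src_sents)
    tgt_vecs = cooccurrence(tgt_sents)
    pairs = []
    for src, svec in src_vecs.items():
        if src in seed_dict:
            continue  # correspondence already known
        projected = project(svec, seed_dict)
        scored = [(cosine(projected, tvec), tgt) for tgt, tvec in tgt_vecs.items()]
        if not scored:
            continue
        score, tgt = max(scored)
        if score >= threshold:
            pairs.append((src, tgt, score))
    return sorted(pairs, key=lambda p: -p[2])
```

Because the two corpora are compared only through co-occurrence vectors, the sketch needs no sentence alignment, which matches the abstract's claim about unaligned corpora. Raising `threshold` trades recall for precision, in the spirit of the selective output step.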