Paper: Collocation Extraction Using Monolingual Word Alignment Method

ACL ID: D09-1051
Title: Collocation Extraction Using Monolingual Word Alignment Method
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2009
Authors:
  • Zhanyi Liu (Harbin Institute of Technology, Harbin China; Toshiba (China) Research and Development Center, Beijing China)
  • Haifeng Wang (Toshiba (China) Research and Development Center, Beijing China)
  • Hua Wu (Harbin Institute of Technology, Harbin China)
  • Sheng Li

Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to the monolingual scenario to extract collocations from a monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the monolingual word alignment algorithm is employed to align the potentially collocated words in the monolingual sentences. Finally, the aligned word pairs are ranked according to refined alignment probabilities, and those with higher scores are extracted as collocations. We conducted experiments using Chinese and English corpora separately. Compared with previous approaches, which ...
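The three steps in the abstract (replicate, align, rank) can be illustrated with a minimal Python sketch. It rests on stated assumptions and is not the authors' implementation. An IBM Model 1-style EM loop stands in for the paper's monolingual word alignment model. A word position is barred from aligning to its own copy, since it would otherwise trivially win. The "refined" score is assumed to be the average of the two directional probabilities. The function names `replicate_corpus`, `train_monolingual_alignment`, and `rank_collocations` are hypothetical.

```python
from collections import defaultdict


def replicate_corpus(sentences):
    """Step 1: pair each tokenized sentence with an identical copy of itself."""
    return [(list(sent), list(sent)) for sent in sentences]


def train_monolingual_alignment(pairs, iterations=10):
    """Step 2 (simplified stand-in): IBM Model 1-style EM on the replicated corpus.

    Returns t[(target_word, source_word)], an estimate of p(target | source).
    Assumption: position j may not align to itself, since every word would
    otherwise trivially align to its own copy.
    """
    vocab = {w for src, _ in pairs for w in src}
    t = defaultdict(lambda: 1.0 / len(vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(target, source)
        total = defaultdict(float)  # expected counts c(source)
        for src, tgt in pairs:
            for j, wt in enumerate(tgt):
                cands = [ws for i, ws in enumerate(src) if i != j]
                norm = sum(t[(wt, ws)] for ws in cands)
                if norm == 0.0:
                    continue  # one-word sentence: nothing to align to
                for ws in cands:
                    c = t[(wt, ws)] / norm  # posterior of aligning j to this source word
                    count[(wt, ws)] += c
                    total[ws] += c
        # M-step: renormalize so that, for each source word, t(. | source) sums to 1
        t = defaultdict(float, {k: v / total[k[1]] for k, v in count.items()})
    return t


def rank_collocations(t):
    """Step 3 (assumed scoring): average p(w2|w1) and p(w1|w2), highest first."""
    pairs = {tuple(sorted(k)) for k in t if k[0] != k[1]}
    scored = [
        (p, (t.get((p[1], p[0]), 0.0) + t.get((p[0], p[1]), 0.0)) / 2)
        for p in pairs
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


corpus = [["strong", "tea"]] * 3 + [["hot", "tea"]]
ranked = rank_collocations(train_monolingual_alignment(replicate_corpus(corpus)))
print(ranked)  # [(('strong', 'tea'), 0.875), (('hot', 'tea'), 0.625)]
```

In this toy corpus every sentence has two words, so each word has exactly one candidate partner and the estimates are fixed after the first iteration. `strong tea` outranks `hot tea` because `tea` co-occurs with `strong` three times as often. On real corpora the ranking depends on the alignment model and the refinement step, neither of which this sketch reproduces.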