Paper: Splitting Long or Ill-formed Input for Robust Spoken-language Translation

ACL ID C98-1067
Title Splitting Long or Ill-formed Input for Robust Spoken-language Translation
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1998
Authors

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation result is formed by concatenating the partial translation results of each split unit. The proposed method can be incorporated into frameworks like TDMT, which utilize left-to-right parsing and a score for a substructure. Experimental results show that the proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust...
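
The splitting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or TDMT's implementation: the scorer interface, the penalty values, and the names `split_into_units`, `translate`, and `translate_unit` are all assumptions introduced here. It assumes some function assigns each candidate span a distance-like cost (lower is better) or reports that the span cannot be analyzed as one unit, then picks the cheapest segmentation in a single left-to-right pass over prefixes and concatenates the partial translations.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scorer: returns a semantic-distance-like cost for a span
# (lower is better), or None when the span cannot be analyzed as one unit.
Scorer = Callable[[Sequence[str]], Optional[float]]


def split_into_units(words: Sequence[str], score: Scorer,
                     split_penalty: float = 1.0,
                     skip_penalty: float = 5.0) -> List[Tuple[str, ...]]:
    """Choose the lowest-cost segmentation of `words`, prefix by prefix."""
    n = len(words)
    best = [0.0] + [float("inf")] * n  # best[i]: cheapest cost covering words[:i]
    back = [0] * (n + 1)               # back[i]: start of the last unit ending at i
    for end in range(1, n + 1):
        for start in range(end):
            cost = score(tuple(words[start:end]))
            if cost is None:
                # An unanalyzable span is only kept as a single passed-through word.
                if end - start != 1:
                    continue
                cost = skip_penalty  # assumed penalty, not from the paper
            # Every unit after the first pays an assumed penalty for the split.
            total = best[start] + cost + (split_penalty if start > 0 else 0.0)
            if total < best[end]:
                best[end], back[end] = total, start
    # Walk the back-pointers to recover the chosen units in order.
    units: List[Tuple[str, ...]] = []
    end = n
    while end > 0:
        units.append(tuple(words[back[end]:end]))
        end = back[end]
    return units[::-1]


def translate(words: Sequence[str], score: Scorer,
              translate_unit: Callable[[Tuple[str, ...]], str]) -> str:
    """Concatenate the partial translations of each split unit."""
    return " ".join(translate_unit(u) for u in split_into_units(words, score))
```

Because a single word is always allowed as its own unit, this sketch returns some segmentation for any input, which loosely mirrors the abstract's claim of eliminating null outputs. The split penalty discourages needless fragmentation; how the paper balances unit sizes is not stated in the abstract.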