Paper: Bootstrapping Lexical Choice Via Multiple-Sequence Alignment

ACL ID W02-1022
Title Bootstrapping Lexical Choice Via Multiple-Sequence Alignment
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2002

An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora: datasets that supply several verbalizations of the corresponding semantics rather than just one. We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and overall-output basis. For examp...