Paper: But Dictionaries Are Data Too

ACL ID H93-1039
Title But Dictionaries Are Data Too
Venue Human Language Technologies
Session Main Conference
Year 1993

Although empiricist approaches to machine translation depend vitally on data in the form of large bilingual corpora, bilingual dictionaries are also a source of information. We show how to model at least a part of the information contained in a bilingual dictionary so that we can treat a bilingual dictionary and a bilingual corpus as two facets of a unified collection of data from which to extract values for the parameters of a probabilistic machine translation system. We give an algorithm for obtaining maximum likelihood estimates of the parameters of a probabilistic model from this combined data and we show how these parameters are affected by inclusion of the dictionary for some sample words.

There is a sharp dichotomy today between rationalist and empiricist approaches to machi...