ACL ID | W11-1214 |
---|---|
Title | Unsupervised Alignment of Comparable Data and Text Resources |
Venue | Building and Using Comparable Corpora |
Session | |
Year | 2011 |
Authors |
In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions, such that data tokens are aligned with the word strings that describe them. Our methods make use of log likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log likelihood ratios provide a strong basis for predicting data-text alignment.
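
As an illustration of how a log likelihood ratio can score the association between a data token and a text token, the sketch below computes the standard G² statistic over a 2x2 co-occurrence table. It is a minimal example under stated assumptions, not the paper's implementation: the abstract does not give the exact formulation, so the G² form, the function names `log_likelihood_ratio` and `association_score`, and the toy `pairs` data are all hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (assumed formulation).

    k11: pairs containing both the data token and the word
    k12: pairs containing the data token but not the word
    k21: pairs containing the word but not the data token
    k22: pairs containing neither
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = [
        (k11, row1, col1),
        (k12, row1, col2),
        (k21, row2, col1),
        (k22, row2, col2),
    ]
    g2 = 0.0
    for observed, row, col in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = row * col / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def association_score(pairs, data_token, word):
    """Score one (data token, word) pair over aligned data/text pairs.

    `pairs` is a list of (set_of_data_tokens, set_of_words) tuples, one per
    data record and its accompanying text (hypothetical input format).
    """
    k11 = k12 = k21 = k22 = 0
    for data_tokens, words in pairs:
        has_data = data_token in data_tokens
        has_word = word in words
        if has_data and has_word:
            k11 += 1
        elif has_data:
            k12 += 1
        elif has_word:
            k21 += 1
        else:
            k22 += 1
    return log_likelihood_ratio(k11, k12, k21, k22)


# Toy data, invented for illustration only.
pairs = [
    ({"rain"}, {"wet", "day"}),
    ({"rain"}, {"wet", "morning"}),
    ({"sun"}, {"dry", "day"}),
    ({"sun"}, {"dry", "morning"}),
]
wet_score = association_score(pairs, "rain", "wet")
day_score = association_score(pairs, "rain", "day")
```

In this toy data, "wet" always co-occurs with the data token "rain" and so receives a higher score than "day", which appears equally often with and without it. Note that G² measures departure from independence in either direction, so a practical system would also need to check that the observed co-occurrence count exceeds the expected one before treating a high score as evidence of alignment.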