Paper: Measuring semantic content in distributional vectors

ACL ID P13-2078
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013

Some words are more contentful than others: for instance, make is intuitively more general than produce and fifteen is more "precise" than a group. In this paper, we propose to measure the "semantic content" of lexical items, as modelled by distributional representations. We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. We suggest that this result illustrates the rather "intensional" aspect of distributions.
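
To make the KL-based measure concrete, here is a minimal sketch of the divergence for discrete distributions, KL(p || q) = sum_i p_i log(p_i / q_i). The abstract does not spell out the exact setup, so the following is assumed purely for illustration: each word is a probability distribution over context features, its content is scored as its divergence from a background distribution, and the higher-scoring word in a pair is guessed to be the hyponym. The function names and all numbers are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions over the same contexts.

    Requires q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def guess_hyponym(word_a, word_b, vectors, background):
    """Return the word whose distribution diverges more from the background.

    Assumption for illustration: a larger divergence means more semantic content.
    """
    score_a = kl_divergence(vectors[word_a], background)
    score_b = kl_divergence(vectors[word_b], background)
    return word_a if score_a >= score_b else word_b


# Made-up toy distributions over four context features.
background = [0.25, 0.25, 0.25, 0.25]
vectors = {
    "make": [0.30, 0.30, 0.20, 0.20],     # close to the background: general
    "produce": [0.70, 0.10, 0.10, 0.10],  # peaked: more specific
}

print(guess_hyponym("make", "produce", vectors, background))
```

With these toy numbers the peaked distribution scores higher, so produce is picked as the more contentful word. Real distributional vectors would need smoothing so that the background never assigns zero probability to a context the word occurs in.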