Paper: Empirical Estimates Of Adaptation: The Chance Of Two Noriegas Is Closer To p/2 Than p^2

ACL ID C00-1027
Title Empirical Estimates Of Adaptation: The Chance Of Two Noriegas Is Closer To p/2 Than p^2
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
Authors

Repetition is very common. Adaptive language models, which allow probabilities to change or adapt after seeing just a few words of a text, were introduced in speech recognition to account for text cohesion. Suppose a document mentions Noriega once. What is the chance that he will be mentioned again? If the first instance has probability p, then under standard (bag-of-words) independence assumptions, two instances ought to have probability p^2, but we find the probability is actually closer to p/2. The first mention of a word obviously depends on frequency, but surprisingly, the second does not. Adaptation depends more on lexical content than frequency; there is more adaptation for content words (proper nouns, technical terminology and good keywords for information retrieval), and les...
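
The contrast between p^2 and p/2 can be illustrated with a small sketch. It is not the paper's estimation procedure: the function name `repeat_estimates`, the document-frequency estimator (documents with at least two mentions divided by documents with at least one), and the ten-document toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def repeat_estimates(docs, word):
    """Estimate first-mention and repeat-mention rates for `word`.

    `docs` is a list of tokenized documents (lists of strings).
    Returns (p, joint, conditional):
      p           = fraction of documents mentioning the word at least once
      joint       = fraction of documents mentioning it at least twice
      conditional = joint / p, the chance of a second mention given a first
    Hypothetical helper; the estimator is an assumption, not the paper's.
    """
    n = len(docs)
    counts = [Counter(doc)[word] for doc in docs]
    df1 = sum(1 for c in counts if c >= 1)  # docs with one or more mentions
    df2 = sum(1 for c in counts if c >= 2)  # docs with two or more mentions
    p = df1 / n
    joint = df2 / n
    conditional = df2 / df1 if df1 else 0.0
    return p, joint, conditional


# Made-up toy corpus: 10 documents, 2 mention the word, 1 mentions it twice.
docs = [
    ["noriega", "fled", "noriega"],
    ["noriega", "spoke"],
] + [["markets", "fell"]] * 8

p, joint, conditional = repeat_estimates(docs, "noriega")
independence_prediction = p ** 2  # what a bag-of-words model would expect

print(f"p = {p:.2f}")
print(f"independence prediction p^2 = {independence_prediction:.2f}")
print(f"observed two-mention rate = {joint:.2f}")
print(f"second mention given first = {conditional:.2f}")
```

In this toy corpus the observed two-mention rate equals p/2 and is well above p^2, because the conditional chance of a second mention is one half regardless of how rare the first mention is. Real corpora would need the actual counts; the numbers here are chosen only to make the arithmetic visible.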