Paper: Exponential Reservoir Sampling for Streaming Language Models

ACL ID P14-2112
Title Exponential Reservoir Sampling for Streaming Language Models
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

We show how rapidly changing textual streams such as Twitter can be modelled in fixed space. Our approach is based upon a randomised algorithm called Exponential Reservoir Sampling, unexplored by this community until now. Using language models over Twitter and Newswire as a testbed, our experimental results based on perplexity support the intuition that recently observed data generally outweighs that seen in the past, but that at times, the past can have valuable signals enabling better modelling of the present.
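The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common way to realise an exponentially biased, fixed-size reservoir, not the authors' implementation. It assumes the simplest variant: the reservoir fills up to a fixed capacity, after which every new item overwrites a uniformly chosen slot. Under that assumption an item survives each later arrival with probability 1 - 1/capacity, so its chance of still being present decays exponentially with its age. The names `ExponentialReservoir`, `capacity`, and `add` are hypothetical and chosen for illustration.

```python
import random


class ExponentialReservoir:
    """Fixed-size sample of a stream that favours recent items.

    Assumed variant (not taken from the paper): once the reservoir is
    full, each new item replaces a uniformly random slot. An item that
    arrived t steps ago is then still present with probability
    (1 - 1/capacity) ** t.
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []
        self._rng = random.Random(seed)

    def add(self, item):
        """Insert one stream item using at most `capacity` slots."""
        if len(self.items) < self.capacity:
            # Still filling up: keep everything seen so far.
            self.items.append(item)
        else:
            # Full: the new item always enters and evicts a random old one.
            slot = self._rng.randrange(self.capacity)
            self.items[slot] = item


def sample_stream(stream, capacity, seed=None):
    """Run a whole stream through a reservoir and return its contents."""
    reservoir = ExponentialReservoir(capacity, seed=seed)
    for item in stream:
        reservoir.add(item)
    return reservoir.items
```

In a streaming language-model setting, the items would be sentences or tweets, and a model would be re-estimated from `reservoir.items` whenever needed. The choice of `capacity` controls how quickly old data is forgotten: a smaller reservoir forgets faster, while a larger one retains more of the past.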