Paper: WikiWars: A New Corpus for Research on Temporal Expressions

ACL ID: D10-1089
Title: WikiWars: A New Corpus for Research on Temporal Expressions
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2010
Authors:

The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of temporally-rich documents sourced from English Wikipedia, which we have annotated with TIMEX2 tags. The corpus contains around 120,000 tokens and 2,600 TIMEX2 expressions, thus comparing favourably in size to other existing corpora used in these areas. We describe the preparation of the corpus, and compare the profile of the data with other existing temporally annotated corpora. We also report the results obtained when we use DANTE, our...