Paper: Sentence Boundary Detection and the Problem with the U.S.

ACL ID N09-2061
Title Sentence Boundary Detection and the Problem with the U.S.
Venue Human Language Technologies
Session Short Paper
Year 2009
Authors
  • Dan Gillick (University of California at Berkeley, Berkeley CA)

Sentence Boundary Detection is widely used but often with outdated tools. We discuss what makes it difficult, which features are relevant, and present a fully statistical system, now publicly available, that gives the best known error rate on a standard news corpus: Of some 27,000 examples, our system makes 67 errors, 23 involving the word “U.S.”