Paper: A Maximum Entropy Approach To Identifying Sentence Boundaries

ACL ID A97-1004
Title A Maximum Entropy Approach To Identifying Sentence Boundaries
Venue Applied Natural Language Processing Conference
Session Main Conference
Year 1997
Authors

We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
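
The classification setup described in the abstract can be illustrated with a minimal sketch. It treats every ., ?, or ! as a candidate, describes each candidate by a few binary features of its immediate context, and fits a two-class logistic regression (the binary form of a maximum entropy classifier) by plain stochastic gradient ascent. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature templates, the function names, the toy corpus, and the training procedure itself.

```python
import math
import re
from collections import defaultdict

CANDIDATE = re.compile(r"[.?!]")

# Hypothetical toy corpus: one gold sentence per string (not from the paper).
TOY_CORPUS = [
    "Dr. Smith arrived today.",
    "Did Mr. Jones call?",
    "Yes, he did!",
    "Mr. Smith paid 4.25 dollars.",
    "Dr. Jones left early.",
]

def candidates(text):
    """Return the offset of every '.', '?', or '!' in text."""
    return [m.start() for m in CANDIDATE.finditer(text)]

def features(text, i):
    """Binary context features for the candidate at offset i (illustrative)."""
    prefix = re.search(r"\S*$", text[:i]).group().lower()
    rest = text[i + 1:]
    tokens = rest.split()
    nxt = tokens[0] if tokens else ""
    if nxt[:1].isupper():
        shape = "next_cap"
    elif nxt[:1].islower():
        shape = "next_lower"
    else:
        shape = "next_other"  # digit, punctuation, or end of text
    feats = ["punct=" + text[i], "prefix=" + prefix, shape]
    if rest and not rest[0].isspace():
        feats.append("no_space_after")
    return feats

def labeled_examples(sentences):
    """Join gold sentences with spaces and label each candidate 1 or 0."""
    text = " ".join(sentences)
    gold, pos = set(), 0
    for s in sentences:
        pos += len(s)
        gold.add(pos - 1)  # offset of the sentence-final character
        pos += 1           # skip the joining space
    return [(features(text, i), 1 if i in gold else 0)
            for i in candidates(text)]

def train(examples, epochs=200, lr=0.5):
    """Fit feature weights by stochastic gradient ascent on log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            z = sum(weights[f] for f in feats)
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            p = 1.0 / (1.0 + math.exp(-z))
            for f in feats:
                weights[f] += lr * (label - p)
    return weights

def split_sentences(text, weights):
    """Split text at every candidate the model scores as a boundary."""
    out, start = [], 0
    for i in candidates(text):
        score = sum(weights.get(f, 0.0) for f in features(text, i))
        if score > 0:  # equivalent to P(boundary) > 0.5
            out.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        out.append(tail)
    return out

weights = train(labeled_examples(TOY_CORPUS))
print(split_sentences("Dr. Jones paid 4.25 dollars. Did Mr. Smith call?", weights))
```

Retraining for a new domain, in this sketch, amounts to calling `train` again on examples built from a differently annotated corpus; no rule or word list needs to be edited. A realistic system would need far more training data and richer context features than this toy setup uses.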