Paper: Adaptive Sentence Boundary Disambiguation

ACL ID A94-1013
Title Adaptive Sentence Boundary Disambiguation
Venue Applied Natural Language Processing Conference
Session Main Conference
Year 1994

Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of ov...
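
The idea of using prior part-of-speech probabilities as context for a feed-forward network can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four categories, the toy lexicon and its probabilities, the one-token context window, the uniform prior for unknown words, and the hand-set, untrained weights.

```python
import math

# Hypothetical category set and toy lexicon (illustrative values only).
CATEGORIES = ["noun", "verb", "abbrev", "other"]
LEXICON = {
    "dr": {"abbrev": 0.9, "noun": 0.1},
    "smith": {"noun": 1.0},
    "he": {"other": 1.0},
    "returned": {"verb": 1.0},
}


def pos_priors(word):
    """Return the word's prior POS distribution as a fixed-length vector."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        # Unknown word: fall back to a uniform prior (assumption of this sketch).
        return [1.0 / len(CATEGORIES)] * len(CATEGORIES)
    return [entry.get(c, 0.0) for c in CATEGORIES]


def context_vector(tokens, i, k=1):
    """Concatenate POS priors of the k tokens on each side of tokens[i]."""
    vec = []
    for j in list(range(i - k, i)) + list(range(i + 1, i + k + 1)):
        if 0 <= j < len(tokens):
            vec.extend(pos_priors(tokens[j]))
        else:
            vec.extend([0.0] * len(CATEGORIES))  # pad past the text edges
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """One hidden layer with sigmoid units and a single sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


# Hand-set weights standing in for trained ones: the first hidden unit is
# inhibited when the preceding token is likely an abbreviation.
W_HIDDEN = [
    [0.0, 0.0, -4.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0] * 8,
]
W_OUT = [4.0, 0.0]

p_abbrev = forward(context_vector(["Dr", ".", "Smith"], 1), W_HIDDEN, W_OUT)
p_end = forward(context_vector(["returned", ".", "He"], 1), W_HIDDEN, W_OUT)
print(p_abbrev < p_end)  # the period after "Dr" scores lower as a boundary
```

In a trainable setup the weights would be learned from labeled punctuation marks rather than set by hand; the sketch only shows how probability vectors, instead of word identities or single tags, form the network input.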