Paper: Improving Automatic Sentence Boundary Detection With Confusion Networks

ACL ID N04-4018
Title Improving Automatic Sentence Boundary Detection With Confusion Networks
Venue Human Language Technologies
Session Short Paper
Year 2004
Authors

We extend existing methods for automatic sen- tence boundary detection by leveraging multi- ple recognizer hypotheses in order to provide robustness to speech recognition errors. For each hypothesized word sequence, an HMM is used to estimate the posterior probability of a sentence boundary at each word boundary. The hypotheses are combined using confusion networks to determine the overall most likely events. Experiments show improved detec- tion of sentences for conversational telephone speech, though results are mixed for broadcast news.