Paper: Summarization Of Noisy Documents: A Pilot Study

ACL ID W03-0504
Title Summarization Of Noisy Documents: A Pilot Study
Venue Workshop On Text Summarization
Session
Year 2003
Authors

We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to deal with clean text, suffer significant degradation even with slight increases in the noise level of a document. We conclude by proposing possible ways of improving the performance of noisy document summarization.