Paper: Handling Noisy Training And Testing Data

ACL ID W02-1015
Title Handling Noisy Training And Testing Data
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2002

In the field of empirical natural language processing, researchers constantly deal with large amounts of marked-up data; whether the markup is done by the researcher or someone else, human nature dictates that it will have errors in it. This paper will more fully characterise the problem and discuss whether and when (and how) to correct the errors. The discussion is illustrated with specific examples involving function tagging in the Penn treebank.