Paper: Using The Web As An Implicit Training Set: Application To Structural Ambiguity Resolution

ACL ID H05-1105
Title Using The Web As An Implicit Training Set: Application To Structural Ambiguity Resolution
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2005
Authors

Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.