Paper: Statistical Language Modeling With Performance Benchmarks Using Various Levels Of Syntactic-Semantic Information

ACL ID C04-1167
Title Statistical Language Modeling With Performance Benchmarks Using Various Levels Of Syntactic-Semantic Information
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004
Authors

Statistical language models using the n-gram approach have been criticized for neglecting large-span syntactic-semantic information that influences the choice of the next word in a language. One approach that has helped recently is the use of latent semantic analysis to capture the semantic fabric of the document and enhance the n-gram model. Similarly, some approaches have used syntactic analysis to enhance n-gram models. In this paper, we explain a framework called syntactically enhanced latent semantic analysis and its application in statistical language modeling. This approach augments each word with its syntactic descriptor in terms of the part-of-speech tag, phrase type or the supertag. We observe that given this syntactic knowledge, the model ou...
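
The augmentation step named in the abstract, pairing each word with a syntactic descriptor before latent semantic analysis, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpus, the Penn Treebank-style tags, the `word/tag` joining convention, the raw-count matrix (no weighting scheme), the rank of 2, and the function names `augment`, `build_matrix` and `latent_vectors`.

```python
import numpy as np

# Hypothetical toy corpus: each document is a list of (word, POS tag) pairs.
docs = [
    [("the", "DT"), ("bank", "NN"), ("raised", "VBD"), ("rates", "NNS")],
    [("they", "PRP"), ("bank", "VBP"), ("on", "IN"), ("rates", "NNS")],
    [("the", "DT"), ("river", "NN"), ("bank", "NN"), ("flooded", "VBD")],
]


def augment(word, tag):
    """Join a word with its syntactic descriptor into one composite unit."""
    return f"{word}/{tag}"


def build_matrix(docs):
    """Raw count matrix: one row per word/tag unit, one column per document."""
    units = sorted({augment(w, t) for doc in docs for w, t in doc})
    index = {u: i for i, u in enumerate(units)}
    counts = np.zeros((len(units), len(docs)))
    for j, doc in enumerate(docs):
        for w, t in doc:
            counts[index[augment(w, t)], j] += 1.0
    return units, counts


def latent_vectors(counts, rank):
    """Truncated SVD: keep the top `rank` singular directions for each row."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :rank] * s[:rank]


units, counts = build_matrix(docs)
vectors = latent_vectors(counts, rank=2)
```

In this sketch "bank" as a noun and "bank" as a verb occupy separate rows, so they receive separate latent vectors; a word-only matrix would merge them into one row.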