Paper: The Linguistic Structure of English Web-Search Queries

ACL ID D08-1107
Title The Linguistic Structure of English Web-Search Queries
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
Authors

Web-search queries are known to be short, but little else is known about their structure. In this paper we investigate the applicability of part-of-speech tagging to typical English-language web search-engine queries and the potential value of these tags for improving search results. We begin by identifying a set of part-of-speech tags suitable for search queries and quantifying their occurrence. We find that proper nouns constitute 40% of query terms, and that proper nouns and nouns together constitute over 70% of query terms. We also show that the majority of queries are noun phrases, not unstructured collections of terms. We then use a set of queries manually labeled with these tags to train a Brill tagger and evaluate its performance. In addition, we investigate classification of se...
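A Brill tagger is a transformation-based learner. It first assigns each word its most frequent tag, then greedily learns rewrite rules that fix the remaining errors. The sketch below illustrates that idea on queries and is not the paper's setup. Several things in it are assumptions chosen for illustration: the toy `TRAIN` queries, the Penn-Treebank-style tag names, the single rule template (change tag A to B when the previous tag is C), unknown words defaulting to `NNP`, and rules firing on all matches at once. The `tag_proportions` helper shows what "quantifying their occurrence" could look like over a labeled set. Its output on toy data says nothing about the paper's reported figures.

```python
from collections import Counter, defaultdict

START = "<S>"  # pseudo-tag for the position before the first query term

def train_baseline(corpus):
    """Most-frequent-tag lexicon: word -> its most common gold tag."""
    counts = defaultdict(Counter)
    for query in corpus:
        for word, tag in query:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def initial_tags(words, lexicon, default="NNP"):
    # Assumption: unknown words default to NNP, since proper nouns are common in queries.
    return [lexicon.get(w, default) for w in words]

def prev_tag(tags, i):
    return tags[i - 1] if i > 0 else START

def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev) at every match simultaneously
    (a simplification: matches are found on the tags before the rule fires)."""
    frm, to, prev = rule
    return [to if t == frm and prev_tag(tags, i) == prev else t
            for i, t in enumerate(tags)]

def learn_rules(corpus, lexicon, max_rules=10):
    """Greedy transformation-based learning with one rule template."""
    gold = [[t for _, t in q] for q in corpus]
    current = [initial_tags([w for w, _ in q], lexicon) for q in corpus]
    rules = []
    for _ in range(max_rules):
        # Candidate rules are read off the remaining tagging errors.
        candidates = {(c, g, prev_tag(cur, i))
                      for cur, gld in zip(current, gold)
                      for i, (c, g) in enumerate(zip(cur, gld)) if c != g}
        if not candidates:
            break

        def score(rule):
            # Net gain: +1 per error fixed, -1 per correct tag broken.
            frm, to, prev = rule
            return sum((gld[i] == to) - (gld[i] == frm)
                       for cur, gld in zip(current, gold)
                       for i, c in enumerate(cur)
                       if c == frm and prev_tag(cur, i) == prev)

        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        if score(best) <= 0:
            break
        rules.append(best)
        current = [apply_rule(cur, best) for cur in current]
    return rules

def tag(words, lexicon, rules):
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

def tag_proportions(corpus):
    """Fraction of query terms carrying each tag."""
    counts = Counter(t for q in corpus for _, t in q)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}

# Hypothetical hand-labeled queries (illustration only, not data from the paper).
TRAIN = [
    [("apple", "NNP"), ("store", "NN"), ("hours", "NNS")],
    [("apple", "NNP"), ("ipod", "NNP")],
    [("apple", "NNP"), ("iphone", "NNP"), ("price", "NN")],
    [("green", "JJ"), ("apple", "NN")],
    [("red", "JJ"), ("apple", "NN")],
    [("cheap", "JJ"), ("flights", "NNS")],
]

lexicon = train_baseline(TRAIN)
rules = learn_rules(TRAIN, lexicon)
print(rules)                                    # [('NNP', 'NN', 'JJ')]
print(tag(["red", "apple"], lexicon, rules))    # ['JJ', 'NN']
print(tag(["apple", "store"], lexicon, rules))  # ['NNP', 'NN']
```

In this toy set, "apple" is most often a proper noun, so the baseline tags it `NNP` everywhere. The one learned rule switches it to `NN` after an adjective. The same kind of contextual correction is what makes a Brill tagger attractive for short, noun-heavy queries, where a lexicon alone cannot separate brand names from common nouns.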