Paper: K-Best Suffix Arrays

ACL ID N07-2005
Title K-Best Suffix Arrays
Venue Human Language Technologies
Session Short Paper
Year 2007

Suppose we have a large dictionary of strings. Each entry starts with a figure of merit (popularity). We wish to find the k- best matches for a substring, s, in a dicti- noary, dict. That is, grep s dict | sort –n | head –k, but we would like to do this in sublinear time. Example applications: (1) web queries with popularities, (2) prod- ucts with prices and (3) ads with click through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and kd- trees. A standard suffix array sorts the suffixes by a single order (lexicographic) whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N. 1 Standard Suffix Arrays This paper will introduce k-best suffix arrays, which are...