Paper: A Discriminative Candidate Generator for String Transformations

ACL ID: D08-1047
Title: A Discriminative Candidate Generator for String Transformations
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2008

String transformation, which maps a source string s into its desirable form t∗, is related to various applications including stemming, lemmatization, and spelling correction. The essential and important step for string transformation is to generate candidates to which the given string s is likely to be transformed. This paper presents a discriminative approach for generating candidate strings. We use substring substitution rules as features and score them using an L1-regularized logistic regression model. We also propose a procedure to generate negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm because the processes of string transformation are tractable in the m...
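
To make the idea of scoring substring substitution rules concrete, here is a minimal sketch in Python. It is not the authors' implementation. The rule set, the weights, the bias, and the function names (`generate_candidates`, `top_candidates`) are all hypothetical, and the sketch assumes a simplified setting in which each candidate is produced by applying exactly one rule to the source string. In the approach described above, the weights would come from a trained L1-regularized logistic regression model rather than being written by hand.

```python
import math

# Hypothetical substitution rules (source substring -> target substring)
# with made-up weights. In the setting described in the abstract, such
# weights would be learned by an L1-regularized logistic regression model.
RULES = {
    ("ies", "y"): 2.0,
    ("s", ""): 1.0,
    ("ing", ""): 1.5,
}
BIAS = -1.0  # assumed bias term, chosen only for illustration


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def generate_candidates(source, rules=RULES, bias=BIAS):
    """Apply each rule once at every position where its left-hand side
    occurs in `source`, and return a dict {candidate: probability}."""
    scores = {}
    for (lhs, rhs), weight in rules.items():
        start = source.find(lhs)
        while start != -1:
            candidate = source[:start] + rhs + source[start + len(lhs):]
            p = sigmoid(bias + weight)
            # If several rule applications yield the same candidate,
            # keep the highest probability.
            scores[candidate] = max(p, scores.get(candidate, 0.0))
            start = source.find(lhs, start + 1)
    return scores


def top_candidates(source, k=3):
    """Return the k highest-scoring candidates, ties broken alphabetically."""
    ranked = sorted(generate_candidates(source).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Calling `top_candidates("studies", 1)` under these assumed weights ranks `study` first, because the `ies -> y` rule carries the largest weight. Because L1 regularization tends to drive many weights to exactly zero, a trained model of this kind would typically keep only a small set of active rules, which helps keep candidate enumeration cheap.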