Paper: Stochastic Contextual Edit Distance and Probabilistic FSTs

ACL ID P14-2102
Title Stochastic Contextual Edit Distance and Probabilistic FSTs
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance, a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-state transducer that computes our stochastic contextual edit distance. To illustrate the improvement from conditioning on context, we model typos found in social media text.
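
As a rough illustration of the kind of distribution p(y | x) the abstract refers to, the following Python sketch computes a context-independent stochastic edit distance with a forward dynamic program. It is not the paper's model: the toy alphabet, the probability values, and the function names (`sub_prob`, `cond_prob`, `total_mass`) are all assumptions chosen for illustration. The contextual generalization described above would let these edit probabilities vary with the surrounding characters instead of staying fixed.

```python
from itertools import product

# All values below are illustrative assumptions, not trained parameters.
ALPHABET = "ab"   # toy alphabet shared by input and output strings
P_INS = 0.1       # total probability of inserting some character
P_DEL = 0.1       # probability of deleting the next input character
P_COPY = 0.7      # probability of copying the next input character
P_SUB = 1.0 - P_INS - P_DEL - P_COPY  # remaining mass: substitute a different character


def sub_prob(s, t):
    """Probability of rewriting input character s as output character t."""
    if s == t:
        return P_COPY
    return P_SUB / (len(ALPHABET) - 1)


def cond_prob(x, y):
    """p(y | x): total probability of all edit sequences that turn x into y."""
    n, m = len(x), len(y)
    ins = P_INS / len(ALPHABET)  # probability of inserting one specific character
    # alpha[i][j] = probability of having consumed x[:i] and emitted y[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete x[i-1]
                alpha[i][j] += alpha[i - 1][j] * P_DEL
            if j > 0:  # insert y[j-1]
                alpha[i][j] += alpha[i][j - 1] * ins
            if i > 0 and j > 0:  # copy or substitute x[i-1] -> y[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * sub_prob(x[i - 1], y[j - 1])
    # Once the input is exhausted, the model either inserts again or halts.
    return alpha[n][m] * (1.0 - P_INS)


def total_mass(x, max_len):
    """Sum p(y | x) over every output string y of length <= max_len."""
    return sum(
        cond_prob(x, "".join(chars))
        for k in range(max_len + 1)
        for chars in product(ALPHABET, repeat=k)
    )
```

Summing `cond_prob(x, y)` over all short outputs y, as `total_mass` does, should give a value close to 1, which is a quick check that the edit probabilities define a proper conditional distribution. In this sketch the input string is assumed to use only characters from `ALPHABET`.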