Paper: From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing

ACL ID N10-1116
Title From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

We present three approaches for unsupervised grammar induction that are sensitive to data complexity and apply them to Klein and Manning's Dependency Model with Valence. The first, Baby Steps, bootstraps itself via iterated learning of increasingly longer sentences and requires no initialization. This method substantially exceeds Klein and Manning's published scores and achieves 39.4% accuracy on Section 23 (all sentences) of the Wall Street Journal corpus. The second, Less is More, uses a low-complexity subset of the available data: sentences up to length 15. Focusing on fewer but simpler examples trades off quantity against ambiguity; it attains 44.1% accuracy, using the standard linguistically-informed prior and batch training, beating state-of-the-art. Leapfrog, our thir...
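The abstract describes two training regimes that a short sketch may make concrete: Baby Steps, which reuses the model learned on sentences up to length k to seed training on sentences up to length k+1 (starting from an uninformed model), and Less is More, which simply batch-trains on the low-complexity subset of sentences up to length 15. The sketch below is illustrative only, assuming hypothetical `em_train` and `uniform_model` placeholders rather than the paper's actual DMV implementation.

```python
from typing import Dict, List, Sequence

Sentence = List[str]   # a sentence as a list of (POS-tagged) tokens
Model = Dict           # stand-in for DMV parameters; hypothetical

def uniform_model() -> Model:
    """Uninformed starting point: Baby Steps requires no clever initialization."""
    return {}

def em_train(model: Model, data: Sequence[Sentence]) -> Model:
    """Hypothetical placeholder for EM training of the DMV to convergence on `data`."""
    return model

def baby_steps(corpus: Sequence[Sentence], max_len: int = 45) -> Model:
    """Iterated learning of increasingly longer sentences (Baby Steps)."""
    model = uniform_model()
    for k in range(1, max_len + 1):
        batch = [s for s in corpus if len(s) <= k]   # gradually admit longer sentences
        if batch:
            model = em_train(model, batch)           # previous stage seeds the next
    return model

def less_is_more(corpus: Sequence[Sentence], cutoff: int = 15) -> Model:
    """Batch training on the low-complexity subset only (sentences up to length 15)."""
    subset = [s for s in corpus if len(s) <= cutoff]
    return em_train(uniform_model(), subset)
```

Leapfrog, per the abstract, combines elements of both regimes; its exact schedule is not reproduced here since the abstract text is truncated.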