Paper: Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

ACL ID D11-1057
Title Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011
Authors

Footnote 5: This view is adopted by some morphological theorists (Albright, 2002; Chan, 2006), although see Appendix E.2 for a caution about syncretism. Note that when the lemma is unobserved, the other forms do still influence one another indirectly.

3. For each lexeme, choose a distribution over its inflections.
4. For each lexeme, choose a paradigm that will be used to express the lexeme orthographically.

Details are given later. Briefly, step 1 samples a vector θ from a Gaussian prior. Step 2 samples a distribution from a Dirichlet process. This chooses a countable number of lexemes to have positive probability in the language, and decides which ones are most common. Step 3 samples a distribution from a Dirichlet. For the lexeme [name garbled], this might choose to make 1s...