Another technique that exploits morphemes as repeating sub-word segments encodes the lexemes of a corpus as a character tree, i.e., a trie (Harris, 1955; Hafer and Weiss, 1974), or as a finite state automaton (FSA) over characters (H. Johnson and Martin, 2003; Altun and M. Johnson, 2001). Johnson and Martin (2003) generalize from character trees and model morphological character sequences with minimized finite state automata. A slightly better method is to compile a set of words into a trie and predict boundaries at nodes with high activity (e.g., Johnson and Martin, 2003; Schone and Jurafsky, 2001; Kazakov and Manandhar, 2001, and earlier papers by the same authors), but this is not sound either, as short, common, non-morphemic character sequences also show significant branching.
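
The trie-based idea can be illustrated with a minimal sketch. It is not the implementation of any of the cited papers: the function names, the `$` end-of-word marker, the toy word list, and the use of a raw child count with a fixed threshold as the measure of node "activity" are all assumptions made for illustration. The sketch builds a character trie, counts the distinct continuations of each prefix, and proposes a boundary wherever that count reaches the threshold.

```python
def build_trie(words):
    """Build a character trie as nested dicts; '$' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}  # assumed end-of-word marker, counted as a continuation
    return root


def successor_variety(trie, prefix):
    """Number of distinct continuations of `prefix` (0 if the prefix is unseen)."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return 0
    return len(node)


def predict_boundaries(trie, word, threshold=3):
    """Positions i where word[:i] has at least `threshold` continuations."""
    return [
        i
        for i in range(1, len(word))
        if successor_variety(trie, word[:i]) >= threshold
    ]


# Toy corpus (hypothetical): two verb paradigms plus unrelated words in "st".
words = [
    "walk", "walks", "walked", "walking",
    "talk", "talks", "talked", "talking",
    "stop", "star", "stem", "stir",
]
trie = build_trie(words)

print(predict_boundaries(trie, "walking"))  # [4], i.e. walk|ing
print(predict_boundaries(trie, "stop"))     # [2], i.e. st|op
```

On this toy corpus the node for `walk` branches four ways (`$`, `s`, `e`, `i`), so the stem boundary is found. The node for `st` also branches four ways, so the same rule proposes a boundary in `stop`, which is the failure mode noted above: a short, common, non-morphemic sequence branches just as heavily as a true stem.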