Paper: Partially Distribution-Free Learning Of Regular Languages From Positive Samples

ACL ID C04-1013
Title Partially Distribution-Free Learning Of Regular Languages From Positive Samples
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004
Authors

Regular languages are widely used in NLP to- day in spite of their shortcomings. E cient algorithms that can reliably learn these lan- guages, and which must in realistic applications only use positive samples, are necessary. These languages are not learnable under traditional distribution free criteria. We claim that an ap- propriate learning framework is PAC learning where the distributions are constrained to be generated by a class of stochastic automata with support equal to the target concept. We discuss how this is related to other learning paradigms. We then present a simple learning algorithm for regular languages, and a self-contained proof that it learns according to this partially distri- bution free criterion.