Paper: Deterministic Word Segmentation Using Maximum Matching with Fully Lexicalized Rules

ACL ID E14-4016
Title Deterministic Word Segmentation Using Maximum Matching with Fully Lexicalized Rules
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014
Authors

We present a fast word segmentation algorithm that scans an input sentence deterministically in a single pass. The algorithm is based on simple maximum matching, extended with the execution of fully lexicalized transformational rules. Because rule matching is incorporated into dictionary lookup, segmentation is fast. We evaluated the proposed method on Japanese word segmentation. Experimental results show that our segmenter runs considerably faster than state-of-the-art systems and yields practical accuracy when a more accurate segmenter or an annotated corpus is available.
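
The idea of folding rule matching into dictionary lookup can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_lookup` and `segment`, the use of a plain Python dict instead of a trie, and the toy English vocabulary are all assumptions made for illustration. Each dictionary word maps to itself as a one-word output, and each fully lexicalized rule maps a specific character sequence to its corrected segmentation, so a single longest-match lookup handles both cases.

```python
def build_lookup(words, rules):
    """Merge plain words and lexicalized rules into one lookup table.

    words: iterable of dictionary words (each segments to itself).
    rules: dict mapping a character sequence to a list of output words
           (hypothetical rule format, assumed for this sketch).
    """
    table = {w: [w] for w in words}
    table.update(rules)  # a rule shares the same lookup as ordinary words
    return table


def segment(text, table):
    """Left-to-right maximum matching over the merged table."""
    max_len = max(map(len, table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in table:
                out.extend(table[piece])  # emit the word, or the rule's output
                i += n
                break
        else:
            # Unknown character: emit it as a single-character token.
            out.append(text[i])
            i += 1
    return out


words = ["the", "theta", "table", "ble"]
rules = {"thetable": ["the", "table"]}

plain = segment("thetable", build_lookup(words, {}))
fixed = segment("thetable", build_lookup(words, rules))
print(plain)
print(fixed)
```

In this toy case, plain maximum matching greedily picks "theta" and leaves "ble", while the rule entry for "thetable" wins the longest-match lookup and emits the corrected segmentation directly. A practical segmenter would likely store the table in a trie so each position is resolved in one traversal rather than by probing substrings.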