Paper: A Collaborative Framework For Collecting Thai Unknown Words From The Web

ACL ID P06-2045
Title A Collaborative Framework For Collecting Thai Unknown Words From The Web
Venue Annual Meeting of the Association for Computational Linguistics
Session Poster Session
Year 2006
Authors

We propose a collaborative framework for collecting Thai unknown words found on Web pages over the Internet. Our main goal is to design and construct a Web-based system which allows a group of interested users to participate in constructing a Thai unknown-word open dictionary. The proposed framework provides supporting algorithms and tools for automatically identifying and extracting unknown words from Web pages of given URLs. The system yields the result of unknown-word candidates which are presented to the users for verification. The approved unknown words could be combined with the set of existing words in the lexicon to improve the performance of many NLP tasks such as word segmentation, information retrieval and machine translation. Our framework includes word segmentati...
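The workflow described in the abstract (extract candidates, let users verify them, merge the approved ones into the lexicon) can be illustrated with a minimal sketch. The abstract as given here is cut off before it describes the framework's own algorithms, so everything below is an assumption: the candidate detector uses plain dictionary-based longest matching and treats any run of characters not covered by a lexicon word as a candidate. The function names `find_unknown_candidates` and `merge_approved`, the toy lexicon, and the sample text are all hypothetical and are not taken from the paper.

```python
def find_unknown_candidates(text, lexicon):
    """Scan text with greedy longest matching against the lexicon.

    Assumption: this is a stand-in for the paper's extraction step.
    Any maximal run of characters that no lexicon word covers is
    returned as an unknown-word candidate.
    """
    max_len = max((len(word) for word in lexicon), default=0)
    candidates = []
    unknown = []  # characters of the unknown run currently being collected
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible lexicon word starting at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                match = text[i:i + length]
                break
        if match:
            # A known word ends any unknown run in progress.
            if unknown:
                candidates.append("".join(unknown))
                unknown = []
            i += len(match)
        else:
            unknown.append(text[i])
            i += 1
    if unknown:
        candidates.append("".join(unknown))
    return candidates


def merge_approved(lexicon, candidates, approved):
    """Return a new lexicon extended with the candidates that users approved."""
    return set(lexicon) | {c for c in candidates if c in approved}


# Toy data (hypothetical): three known words and one unknown loanword.
lexicon = {"ฉัน", "กิน", "ข้าว"}
text = "ฉันกินพิซซ่า"

candidates = find_unknown_candidates(text, lexicon)
new_lexicon = merge_approved(lexicon, candidates, approved={"พิซซ่า"})
```

In this sketch the verification step is reduced to an `approved` set supplied by the caller. In a collaborative system that set would come from user decisions, and rejected candidates would simply be left out of the merged lexicon.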