Paper: Automatic Construction Of Polarity-Tagged Corpus From HTML Documents

ACL ID P06-2059
Title Automatic Construction Of Polarity-Tagged Corpus From HTML Documents
Venue Annual Meeting of the Association for Computational Linguistics
Session Poster Session
Year 2006
Authors

This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. The method is fully automatic and can be applied to arbitrary HTML documents. The idea behind our method is to utilize certain layout structures and linguistic patterns. By using them, we can automatically extract sentences that express opinions. In our experiment, the method constructed a corpus consisting of 126,610 sentences.
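
The abstract does not say which layout structures or linguistic patterns are used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one hypothetical layout cue: a table row whose first cell is a cue word such as "Pros" or "Cons", followed by cells holding the opinion text. The cue word lists, the class name `PolarityTableParser`, and the function `extract_polar_sentences` are all illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical cue words; the abstract does not list the cues the method relies on.
POSITIVE_CUES = {"pros", "good points", "advantages"}
NEGATIVE_CUES = {"cons", "bad points", "disadvantages"}


class PolarityTableParser(HTMLParser):
    """Collects (polarity, text) pairs from table rows whose first cell is a cue word."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # extracted (polarity, text) tuples
        self._cells = []      # cell texts of the row currently being read
        self._buffer = None   # text pieces of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th"):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._buffer is not None:
            # Normalize whitespace and store the finished cell.
            self._cells.append(" ".join("".join(self._buffer).split()))
            self._buffer = None
        elif tag == "tr" and len(self._cells) >= 2:
            cue = self._cells[0].lower()
            if cue in POSITIVE_CUES:
                polarity = "positive"
            elif cue in NEGATIVE_CUES:
                polarity = "negative"
            else:
                return  # row carries no polarity cue, skip it
            for text in self._cells[1:]:
                if text:
                    self.pairs.append((polarity, text))


def extract_polar_sentences(html):
    """Return polarity-tagged texts found in cue-labelled table rows (assumed layout)."""
    parser = PolarityTableParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = (
    "<table>"
    "<tr><th>Pros</th><td>The screen is bright.</td></tr>"
    "<tr><th>Cons</th><td>The battery drains fast.</td></tr>"
    "<tr><th>Price</th><td>$200</td></tr>"
    "</table>"
)
tagged = extract_polar_sentences(sample)
```

Here `tagged` holds one positive and one negative entry, and the "Price" row is ignored because its first cell is not a cue word. A fuller pipeline would presumably add sentence splitting and the linguistic patterns mentioned in the abstract, which this sketch leaves out.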