Paper: Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents

ACL ID D07-1115
Title Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

Recognizing polarity requires a list of polar words and phrases. To build such a lexicon automatically, many studies have investigated (semi-)unsupervised methods of learning the polarity of words and phrases. In this paper, we explore the use of structural clues that can extract polar sentences from Japanese HTML documents, and we build a lexicon from the extracted polar sentences. The key idea is to develop the structural clues so that they achieve extremely high precision at the cost of recall. To compensate for the low recall, we used a massive collection of HTML documents. Thus, we could prepare a sufficiently large polar sentence corpus.
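
To make the idea of a structural clue concrete, the following is a minimal sketch and not the authors' method. The abstract does not say which clues were used, so the clue here is an assumption: a definition list whose <dt> label is a polarity cue word, with the following <dd> text taken as a polar sentence. The cue labels (English for readability, although the paper targets Japanese), the class name DefinitionListClue, and the function extract_polar_sentences are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical cue labels; the abstract does not list the actual clues.
POSITIVE_CUES = {"pros", "good points"}
NEGATIVE_CUES = {"cons", "bad points"}


class DefinitionListClue(HTMLParser):
    """Collect <dd> text whose preceding <dt> is a polarity cue label."""

    def __init__(self):
        super().__init__()
        self.polar_sentences = []  # list of (polarity, sentence) pairs
        self._tag = None           # "dt" or "dd" while inside one of them
        self._polarity = None      # polarity set by the most recent <dt>
        self._buffer = []          # text chunks of the current element

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore closing tags of other (possibly nested) elements
        text = "".join(self._buffer).strip()
        if tag == "dt":
            label = text.lower()
            if label in POSITIVE_CUES:
                self._polarity = "positive"
            elif label in NEGATIVE_CUES:
                self._polarity = "negative"
            else:
                self._polarity = None  # unknown label: extract nothing
        elif self._polarity is not None and text:
            self.polar_sentences.append((self._polarity, text))
        self._tag = None


def extract_polar_sentences(html):
    """Return (polarity, sentence) pairs matched by the clue in one page."""
    parser = DefinitionListClue()
    parser.feed(html)
    parser.close()
    return parser.polar_sentences
```

A clue this strict ignores ordinary running text, so it finds nothing in most pages, which mirrors the trade-off described in the abstract: precision is favored over recall, and the yield comes from applying the clue to a very large number of documents.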