Paper: The Effect of Corpus Size on Case Frame Acquisition for Discourse Analysis

ACL ID N09-1059
Title The Effect of Corpus Size on Case Frame Acquisition for Discourse Analysis
Venue Human Language Technologies
Session Main Conference
Year 2009
Authors

This paper reports the effect of corpus size on case frame acquisition for discourse analysis in Japanese. For this study, we collected a Japanese corpus consisting of up to 100 billion words, and constructed case frames from corpora of six different sizes. Then, we applied these case frames to syntactic and case structure analysis, and zero anaphora resolution. We obtained better results by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.