ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | E09-1039 |
---|---|
Title | Person Identification from Text and Speech Genre Samples |
Venue | Annual Meeting of The European Chapter of The Association of Computational Linguistics |
Session | Main Conference |
Year | 2009 |
Authors |
|
In this paper, we describe experiments conducted on identifying a person using a novel unique correlated corpus of text and audio samples of the person’s communication in six genres. The text samples include essays, emails, blogs, and chat. Audio samples were collected from individual interviews and group discussions and then transcribed to text. For each genre, samples were collected for six topics. We show that we can identify the communicant with an accuracy of 71% for six fold cross validation using an average of 22,000 words per individual across the six genres. For person identification in a particular genre (train on five genres, test on one), an average accuracy of 82% is achieved. For identification from topics (train on five topics, test on one), a...
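The "train on five genres, test on one" and "train on five topics, test on one" protocols described in the abstract are forms of leave-one-group-out evaluation. The following is a minimal sketch of that splitting scheme only. The sample layout (dicts with `person`, `genre`, `topic`, and `text` keys), the function names, and the word-overlap classifier are all hypothetical stand-ins for illustration; the abstract does not specify the authors' features or classifier, and this sketch does not reproduce them.

```python
from collections import Counter


def leave_one_group_out(samples, group_key):
    """Yield (held_out, train, test) once per distinct value of group_key.

    group_key is an assumed field name such as "genre" or "topic".
    """
    groups = sorted({s[group_key] for s in samples})
    for held_out in groups:
        train = [s for s in samples if s[group_key] != held_out]
        test = [s for s in samples if s[group_key] == held_out]
        yield held_out, train, test


def average_accuracy(samples, group_key, fit, predict):
    """Mean per-fold accuracy of predicting each test sample's "person"."""
    fold_scores = []
    for _, train, test in leave_one_group_out(samples, group_key):
        model = fit(train)
        correct = sum(predict(model, s) == s["person"] for s in test)
        fold_scores.append(correct / len(test))
    return sum(fold_scores) / len(fold_scores)


# Hypothetical placeholder classifier (NOT the paper's method):
# build a word-count profile per person, then pick the person whose
# profile shares the most word occurrences with the test sample.
def fit_word_profiles(train):
    profiles = {}
    for s in train:
        words = s["text"].lower().split()
        profiles.setdefault(s["person"], Counter()).update(words)
    return profiles


def predict_by_overlap(profiles, sample):
    words = Counter(sample["text"].lower().split())
    # Counter & Counter keeps the minimum count of each shared word.
    return max(
        sorted(profiles),
        key=lambda person: sum((profiles[person] & words).values()),
    )
```

Calling `average_accuracy(samples, "genre", fit_word_profiles, predict_by_overlap)` holds out each genre in turn, while passing `"topic"` instead gives the per-topic variant. Any real classifier can be substituted through the `fit` and `predict` arguments.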