Paper: Classifying Factored Genres with Part-of-Speech Histograms

ACL ID N09-2044
Title Classifying Factored Genres with Part-of-Speech Histograms
Venue Human Language Technologies
Session Short Paper
Year 2009
Authors

This work addresses the problem of genre classification of text and speech transcripts, with the goal of handling genres not seen in training. Two frameworks employing different statistics on word/POS histograms with a PCA transform are examined: a single model for each genre and a factored representation of genre. The impact of the two frameworks on the classification of training-matched and new genres is discussed. Results show that the factored models allow for a finer-grained representation of genre and can more accurately characterize genres not seen in training.
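
As an illustration of the feature pipeline the abstract names (POS histograms followed by a PCA transform), here is a minimal sketch. It is not the authors' implementation: the tag set `TAGS`, the toy documents, the helper names `pos_histogram`, `fit_pca`, and `project`, and the choice of two components are all assumptions made for the example. The abstract does not specify the histogram statistics or the classifiers, so the sketch stops at the projected features.

```python
import numpy as np
from collections import Counter

# Hypothetical coarse tag set, chosen only for illustration.
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "INTJ"]

def pos_histogram(tags):
    """Relative frequency of each tag in TAGS for one tagged document."""
    counts = Counter(tags)
    total = sum(counts[t] for t in TAGS)
    if total == 0:
        return np.zeros(len(TAGS))
    return np.array([counts[t] / total for t in TAGS])

def fit_pca(X, k):
    """Fit PCA via SVD; return the feature mean and top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(X, mean, components):
    """Project histograms onto the principal directions."""
    return (X - mean) @ components.T

# Toy documents, assumed to be POS-tagged already (invented data).
docs = [
    ["NOUN", "VERB", "DET", "NOUN", "ADP", "NOUN"],
    ["PRON", "VERB", "INTJ", "PRON", "VERB", "ADV"],
    ["DET", "ADJ", "NOUN", "VERB", "ADP", "DET", "NOUN"],
    ["INTJ", "PRON", "VERB", "PRON", "ADV", "INTJ"],
]

X = np.vstack([pos_histogram(d) for d in docs])  # one histogram per document
mean, comps = fit_pca(X, k=2)
Z = project(X, mean, comps)                      # low-dimensional features
```

Under these assumptions, a per-genre framework would train one model per genre label on `Z`, while a factored framework would train separate models for individual genre attributes on the same features and combine their outputs.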