Source Paper  Year  Line  Sentence
P07-1061 2007 190
More recently, (Purver et al, 2006) has also proposed a method for unsupervised topic modeling that addresses both topic segmentation and identification
P07-1061 2007 191
(Purver et al, 2006) is closer to our work than (Blei and Moreno, 2001) because it does not require building topic models from a corpus; but, as in our case, its results do not outperform LCseg (Galley et al, 2003), while its model is far more complex
D07-1109 2007 14
Topic models have recently been applied to information retrieval (Wei and Croft, 2006), text classification (Blei et al, 2003), and dialogue segmentation (Purver et al, 2006)
D08-1035 2008 184
Corpora. We evaluate our approach on corpora from two different domains: transcribed meetings and written text. For multi-speaker meetings, we use the ICSI corpus of meeting transcripts (Janin et al, 2003), which is becoming a standard for speech segmentation (e.g., Galley et al 2003; Purver et al 2006)
D08-1035 2008 225
The results on the meeting corpus also compare favorably with the topic-modeling method of Purver et al (2006), who report a Pk of .289 and a WindowDiff of .329. Another observation from Table 1 is that the contribution of cue phrases depends on the dataset
D08-1035 2008 42
An alternative Bayesian approach to segmentation was proposed by Purver et al (2006)
I08-2133 2008 8
While having quite different emphasis at different levels of detail (basically from the point of view of the employed term weighting and/or the adopted inter-block similarity measure), these studies analyzed the word distribution inside the texts through the instrumentality of merely one feature, i.e. the one-dimensional inter-block similarity. More recent work uses techniques from graph theory (Malioutov and Barzilay, 2006) and machine learning (Galley et al, 2003; Georgescul et al, 2006; Purver et al, 2006) in order to find patterns in vocabulary use. We investigate new approaches for topic segmentation on corpora containing multi-party dialogues, which currently represent a relatively less explored domain
W09-2103 2009 24
Such probabilistic inference of discourse structure has been used in recent work with HMMs for topic identification (Barzilay & Lee 2004) and related graphical models for segmenting multi-party spoken discourse (Purver et al 2006)
P09-1101 2009 28
While much work in dialogue segmentation centers around topic (e.g. Galley et al. 2003, Hsueh et al. 2006, Purver et al. 2006), we decided to examine dialogue at a more fine-grained level
P09-1101 2009 205
(2006) at .283, and Purver et al (2006) at .284
N09-1042 2009 20
Topic and Content Models. Our work is grounded in topic modeling approaches, which posit that latent state variables control the generation of words. In earlier topic modeling work such as latent Dirichlet allocation (LDA) (Blei et al, 2003; Griffiths and Steyvers, 2004), documents are treated as bags of words, where each word receives a separate topic assignment; the topic assignments are auxiliary variables to the main task of language modeling. More recent work has attempted to adapt the concepts of topic modeling to more sophisticated representations than a bag of words; they use these representations to impose stronger constraints on topic assignments (Griffiths et al, 2005; Wallach, 2006; Purver et al, 2006; Gruber et al, 2007)
N09-1042 2009 185
For the segmentation task, we compare to BayesSeg (Eisenstein and Barzilay, 2008), a Bayesian topic-based segmentation model that outperforms previous segmentation approaches (Utiyama and Isahara, 2001; Galley et al, 2003; Purver et al, 2006; Malioutov and Barzilay, 2006)
D10-1005 2010 4
Topic models built on the foundations of LDA are appealing for sentiment analysis because the learned topics can cluster together sentiment-bearing words, and because topic distributions are a parsimonious way to represent a document. LDA has been used to discover latent structure in text (e.g. for discourse segmentation (Purver et al., 2006) and authorship (Rosen-Zvi et al, 2004))
D10-1038 2010 46
(Purver et al, 2006) uses a variant of LDA for the tasks of segmenting meeting transcripts and extracting the associated topic labels
W12-3205 2012 65
Methods for topic segmentation employ semantic, lexical and referential similarity or, more recently, language models (Bestgen, 2006; Chen et al, 2009; Choi et al, 2001; Eisenstein and Barzilay, 2008; Galley et al, 2003; Hearst, 1997; Malioutov and Barzilay, 2006; Purver et al, 2006; Purver, 2011)
N13-1019 2013 40
PLDA, presented by Purver et al. (2006), is an unsupervised topic modelling approach for segmentation
N13-1019 2013 42
A binary topic shift variable is attached to each text passage (i.e., an utterance in (Purver et al., 2006))
N13-1019 2013 12
The effectiveness of lexical cohesion has been demonstrated by TextTiling (Hearst, 1997), C99 (Choi, 2000), MinCut (Malioutov and Barzilay, 2006), PLDA (Purver et al., 2006), Bayesseg (Eisenstein and Barzilay, 2008), TopicTiling (Riedl and Biemann, 2012), etc. Our work uses recent progress in hierarchical topic modelling with non-parametric Bayesian methods (Du et al., 2010; Chen et al., 2011; Du et al., 2012a), and is based on Bayesian segmentation methods (Goldwater et al., 2009; Purver et al., 2006; Eisenstein and Barzilay, 2008) using topic models
N13-1019 2013 106
The first block is (zd,u,n, ?d,u,n) (for each word wd,u,n), which can be sampled with a table indicator variant of a hierarchical topic sampler (Du et al., 2010), described in Section 4.1. This corresponds to Equation (6) in (Purver et al., 2006)
N13-1019 2013 107
The second kind of block is a boundary indicator ?d,u together with a particular constrained set of table counts designed to handle splitting and merging, which corresponds to Equation (7) in (Purver et al., 2006)
N13-1019 2013 15
Previous work (Purver et al., 2006; Misra et al., 2008; Sun et al., 2008; Misra et al., 2009; Riedl and Biemann, 2012) has shown that using topic assignments or topic distributions instead of word frequency can significantly improve segmentation performance
N13-1019 2013 24
Similarly, the segmentation method of PLDA (Purver et al., 2006) samples segment boundaries, but also jointly samples a topic model
P14-1004 2014 11
By contrast, modeling (the genre of this paper) is concerned with inferring phenomena in an existing corpus, such as dialogue acts in two-party conversations (Stolcke et al, 2000) or topic shifts in multi-party dialogues (Galley et al, 2003; Purver et al, 2006; Hsueh et al, 2006; Banerjee and Rudnicky, 2006)