Paper: Monolingual And Bilingual Concept Visualization From Corpora

ACL ID N03-4016
Title Monolingual And Bilingual Concept Visualization From Corpora
Venue Human Language Technologies
Session System Demonstration
Year 2003

‘information space’ based on their occurrences in text corpora, and then allowing a user to visualize local regions of this information space. Words are plotted in a 2-dimensional picture so that related words are close together and whole classes of similar words occur in recognizable clusters which sometimes clearly signify a particular meaning. As well as giving a clear view of which concepts are related in a particular document collection, this technique also helps a user to interpret unknown words. The main technique we will demonstrate is planar projection of word-vectors from a vector space built using Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997; Schütze, 1998), a method which can be applied multilingually if translated corpora are available for training. Fo...
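
The pipeline described above (build an LSA space, pick a local region around a query word, project that region onto a plane) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the system itself: the toy corpus `DOCS`, the use of a plain term-document count matrix, the choice of `k=4` dimensions, cosine similarity for finding neighbours, and the use of the two principal axes of the neighbours as the projection plane. The function names are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus; each string is treated as one document.
DOCS = [
    "doctor nurse hospital patient",
    "nurse hospital surgery patient",
    "doctor surgery hospital",
    "guitar piano music concert",
    "piano music orchestra concert",
    "guitar music band",
]


def build_term_doc(docs):
    """Return (vocabulary, term-by-document count matrix)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1.0
    return vocab, counts


def lsa_word_vectors(counts, k):
    """Truncated SVD of the count matrix; rows are unit-length word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    vecs = u[:, :k] * s[:k]
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.where(norms == 0, 1.0, norms)


def nearest_words(vocab, vecs, query, n):
    """The n words with highest cosine similarity to the query word."""
    sims = vecs @ vecs[vocab.index(query)]
    order = np.argsort(-sims)[:n]
    return [vocab[i] for i in order]


def planar_projection(vocab, vecs, words):
    """Project the chosen words onto the two principal axes of that local set."""
    local = np.array([vecs[vocab.index(w)] for w in words])
    centered = local - local.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


vocab, counts = build_term_doc(DOCS)
vecs = lsa_word_vectors(counts, k=4)          # k=4 is an arbitrary choice
neighbors = nearest_words(vocab, vecs, "doctor", n=4)
coords = planar_projection(vocab, vecs, neighbors)

# Print 2-D coordinates that a plotting tool could place on the page.
for word, (x, y) in zip(neighbors, coords):
    print(f"{word:10s} {x:+.3f} {y:+.3f}")
```

Computing the projection only over the neighbours of the query word, rather than over the whole vocabulary, is what makes the picture a view of a local region of the space. For the bilingual case, one plausible arrangement (again an assumption, not stated in the text above) is to treat each pair of translated documents as a single column of the matrix, so that words from both languages receive vectors in the same space.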