Paper: Monolingual And Bilingual Concept Visualization From Corpora

ACL ID: N03-4016
Title: Monolingual And Bilingual Concept Visualization From Corpora
Venue: Human Language Technologies
Session: System Demonstration
Year: 2003
Authors:

‘information space’ based on their occurrences in text corpora, and then allowing a user to visualize local regions of this information space. Words are plotted in a 2-dimensional picture so that related words are close together and whole classes of similar words occur in recognizable clusters, which sometimes clearly signify a particular meaning. As well as giving a clear view of which concepts are related in a particular document collection, this technique also helps a user to interpret unknown words. The main technique we will demonstrate is planar projection of word-vectors from a vector space built using Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997; Schütze, 1998), a method which can be applied multilingually if translated corpora are available for training. Fo...
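The pipeline described above can be illustrated on a toy corpus. The sketch first builds a word-by-document count matrix and reduces it with a truncated SVD to get LSA word vectors. It then picks a query word's nearest neighbors by cosine similarity and projects that local region onto a plane. This is a minimal sketch under stated assumptions, not the authors' system. Raw counts stand in for whatever term weighting the original used. The plane is chosen by principal component analysis of the neighborhood, which is one reasonable choice and not necessarily the paper's. The bilingual variant simply joins each document with its translation, the common cross-language LSA recipe. All function names and the toy English/French documents are hypothetical.

```python
import numpy as np


def term_document_matrix(docs):
    """Build a sorted vocabulary and a (words x documents) raw count matrix."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            counts[index[w], j] += 1
    return vocab, counts


def lsa_word_vectors(counts, k):
    """Truncated SVD: X = U S V^T, keep k dimensions; rows of U_k * S_k are word vectors.

    Assumption: no tf-idf or log-entropy weighting is applied, for brevity.
    """
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def nearest_words(vocab, vectors, query, n):
    """Return the n words most cosine-similar to `query` and their unit vectors.

    The query itself (similarity 1) normally ranks among them.
    """
    norms = np.linalg.norm(vectors, axis=1)
    unit = vectors / np.where(norms == 0, 1, norms)[:, None]  # avoid divide-by-zero
    sims = unit @ unit[vocab.index(query)]
    order = np.argsort(-sims)[:n]
    return [vocab[i] for i in order], unit[order]


def planar_projection(points):
    """Project points onto their top two principal axes (PCA), giving 2-D coordinates."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def bilingual_documents(pairs):
    """Join each (source, translation) pair into one document so both languages share a space.

    Assumption: the usual cross-language LSA recipe; the paper's procedure may differ.
    """
    return [f"{source} {target}" for source, target in pairs]


if __name__ == "__main__":
    docs = [
        "bank money loan interest",
        "bank money deposit account",
        "river bank water fish",
        "river water boat fish",
        "loan interest account deposit",
    ]
    vocab, counts = term_document_matrix(docs)
    vectors = lsa_word_vectors(counts, k=3)
    words, unit = nearest_words(vocab, vectors, "river", n=5)
    for word, (x, y) in zip(words, planar_projection(unit)):
        print(f"{word:10s} {x:+.2f} {y:+.2f}")
```

Words that always occur in the same documents (here "river", "water" and "fish") receive essentially the same LSA vector, so they land together in the plot. With `bilingual_documents`, a word and its translation that co-occur in aligned documents end up close in the same way, which is what makes the bilingual display possible.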