Paper: Hierarchical Clustering Of Words

ACL ID C96-2212
Title Hierarchical Clustering Of Words
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1996
  • Akira Ushioda (ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan)

This paper describes a data-driven method for hierarchical clustering of words in which a large vocabulary of English words is clustered bottom-up, with respect to corpora ranging in size from 5 to 50 million words, using a greedy algorithm that tries to minimize average loss of mutual information of adjacent classes. The resulting hierarchical clusters of words are then naturally transformed to a bit-string representation of (i.e. word bits for) all the words in the vocabulary. Introducing word bits into the ATR Decision-Tree POS Tagger is shown to significantly reduce the tagging error rate. Portability of word bits from one domain to another is also discussed.
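
The two steps named in the abstract (greedy bottom-up merging by mutual information, then reading bit strings off the resulting tree) can be illustrated with a toy Python sketch. It is not the paper's implementation: it assumes "adjacent classes" means classes of neighbouring tokens (bigrams), it recomputes the average mutual information from scratch for every candidate merge, and the function names `average_mutual_information`, `greedy_cluster`, and `word_bits` are made up for illustration. Brute force of this kind would not scale to the corpus sizes mentioned above.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(bigrams, cls):
    """Average mutual information of adjacent classes.

    `bigrams` maps (word1, word2) to a count; `cls` maps word to class id.
    """
    pair = Counter()
    for (w1, w2), n in bigrams.items():
        pair[(cls[w1], cls[w2])] += n
    total = sum(pair.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pair.items():
        left[c1] += n
        right[c2] += n
    return sum(
        (n / total) * math.log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pair.items()
    )


def greedy_cluster(tokens):
    """Merge classes bottom-up and return a binary tree of nested tuples.

    Assumption: every word starts in its own class, and each step applies
    the merge that leaves the highest average mutual information, which is
    the same as the merge that loses the least.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    cls = {w: w for w in set(tokens)}   # word -> current class id
    tree = {w: w for w in cls}          # class id -> subtree built so far
    while len(tree) > 1:
        best = None
        for a, b in combinations(sorted(tree), 2):
            # Tentatively fold class b into class a and score the result.
            trial = {w: (a if c == b else c) for w, c in cls.items()}
            ami = average_mutual_information(bigrams, trial)
            if best is None or ami > best[0]:
                best = (ami, a, b, trial)
        _, a, b, cls = best
        tree[a] = (tree[a], tree.pop(b))
    return next(iter(tree.values()))


def word_bits(tree, prefix=""):
    """Read bit strings off the tree: 0 for a left branch, 1 for a right."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    bits = word_bits(left, prefix + "0")
    bits.update(word_bits(right, prefix + "1")
    )
    return bits


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    for word, code in sorted(word_bits(greedy_cluster(corpus)).items()):
        print(word, code)
```

Words that are merged early share long bit-string prefixes, so a consumer such as a decision-tree tagger can ask about the first few bits of a word to test membership in a coarse class and about later bits to test finer classes.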