Paper: Symbolic Word Clustering For Medium-Size Corpora

ACL ID C96-1083
Title Symbolic Word Clustering For Medium-Size Corpora
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1996
Authors
  • Benoit Habert (Ecole Normale Superieure de Fontenay-St. Cloud, Fontenay-aux-Roses France)
  • Elie Naulleau (Ecole Normale Superieure de Fontenay-St. Cloud, Fontenay-aux-Roses France; Electricity of France (EDF) Research Center, France)
  • Adeline Nazarenko

When trying to identify essential concepts and relationships in a medium-size corpus, it is not always possible to rely on statistical methods, as the frequencies are too low. We present an alternative, symbolic method based on the simplification of parse trees. We discuss the results on nominal phrases of two technical corpora, analyzed by two different robust parsers used for terminology updating in an industrial company. We compare our results with Hindle's scores of similarity.

Subjects: Clustering, ontology development, robust parsing, knowledge acquisition from corpora, computational terminology

1 Identifying word classes in medium-size corpora

In companies with a wide range of activities, such as EDF, the French electricity company, the rapid evolution of technical domains, ...
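The abstract only names the approach: symbolic clustering from simplified parse trees, with no frequency statistics. The sketch below is a hypothetical illustration of that general idea and is not the authors' algorithm. It assumes each nominal phrase has already been reduced to a (head, modifier) pair, such as ("pump", "feed") for "feed pump". Under that assumption, a head is characterised by its modifiers and a modifier by the heads it attaches to. Two words are linked when they share at least `min_shared` such contexts, and clusters are the connected components of the resulting graph. The pair representation, the `min_shared` threshold, and the connected-components grouping are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def contexts_from_pairs(pairs):
    """Map each word to the set of syntactic contexts it occurs in.

    Hypothetical input: `pairs` is a list of (head, modifier) tuples taken
    from simplified noun phrases. A head gets a ("mod", m) context for each
    modifier m; a modifier gets a ("head", h) context for each head h.
    """
    ctx = defaultdict(set)
    for head, mod in pairs:
        ctx[head].add(("mod", mod))
        ctx[mod].add(("head", head))
    return ctx


def cluster(pairs, min_shared=2):
    """Group words linked by >= min_shared common contexts.

    Purely symbolic: only the presence of shared contexts counts, never
    their frequency. Groups are connected components (union-find).
    """
    ctx = contexts_from_pairs(pairs)
    words = sorted(ctx)
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the root, halving the path as we go.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if len(ctx[a] & ctx[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for w in words:
        groups[find(w)].add(w)
    # Keep only clusters with more than one word, sorted for stable output.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


pairs = [
    ("pump", "feed"), ("pump", "water"),
    ("valve", "feed"), ("valve", "water"), ("valve", "safety"),
    ("circuit", "cooling"),
]
print(cluster(pairs))  # [['feed', 'water'], ['pump', 'valve']]
```

With `min_shared=2`, "pump" and "valve" group together because both take the modifiers "feed" and "water". Likewise, "feed" and "water" group together because both modify "pump" and "valve". Lowering the threshold to 1 also pulls "safety" in through its single shared head "valve". This shows how such a purely symbolic criterion trades precision for coverage when frequencies are too low to weight contexts.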