Paper: A Formalism For Universal Segmentation Of Text

ACL ID C00-2095
Title A Formalism For Universal Segmentation Of Text
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
  • Julien Quint (Xerox Research Centre Europe, Grenoble, France)

Sumo is a formalism for universal segmentation of text. Its purpose is to provide a framework for the creation of segmentation applications. It is called "universal" as the formalism itself is independent of the language of the documents to process and independent of the levels of segmentation (e.g. words, sentences, paragraphs, morphemes...) considered by the target application. This framework relies on a layered structure representing the possible segmentations of the document. This structure and the tools to manipulate it are described, followed by detailed examples highlighting some features of Sumo.
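
As an informal illustration of what a layered structure holding several possible segmentations might look like, the following Python sketch may help. It is not Sumo's actual representation, which the abstract does not specify; the names `Item`, `Layer`, and `render`, and the choice of character offsets as the lower level, are assumptions made purely for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Item:
    """A hypothetical unit at one level, covering units [start, end) of the level below."""
    start: int
    end: int


@dataclass
class Layer:
    """A hypothetical level of segmentation (e.g. words) with competing alternatives."""
    name: str
    # Each alternative is one complete segmentation of the level below.
    alternatives: List[List[Item]] = field(default_factory=list)


def render(text: str, segmentation: List[Item]) -> List[str]:
    """Return the substrings of `text` covered by each item, in order."""
    return [text[item.start:item.end] for item in segmentation]


# The lowest level is the raw character sequence; the text is deliberately
# language-neutral, since the structure makes no assumption about language.
text = "ABCD"

# An upper level keeps two competing segmentations of the same characters.
upper = Layer(
    name="upper",
    alternatives=[
        [Item(0, 2), Item(2, 4)],  # AB | CD
        [Item(0, 1), Item(1, 4)],  # A | BCD
    ],
)

candidates = [render(text, seg) for seg in upper.alternatives]
print(candidates)
```

Keeping every alternative side by side, rather than committing to one early, is the design point of the sketch: a later processing step can choose among the candidates or build a further level on top of any of them.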