Paper: Effective Structural Inference For Large XML Documents

ACL ID C02-1069
Title Effective Structural Inference For Large XML Documents
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2002

This paper investigates methods to automatically infer structural information from large XML documents. Using XML as a reference format, we approach the schema generation problem by application of inductive inference theory. In doing so, we review and extend results relating to the search spaces of grammatical inferences for large data sets. We evaluate the result of an inference process using the concept of Minimum Message Length. Comprehensive experimentation reveals our new hybrid method to be the most effective for large documents. Finally, tractability issues, including scalability analysis, are discussed.
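
As a rough illustration of how a Minimum Message Length criterion can rank inferred content models, the sketch below scores two candidate automata for the child sequences of one element. It is not the paper's formulation: the cost model (a fixed 8 bits per stored probability), the element names `title` and `para`, the `END` marker, and all function names are assumptions made for this example. The total message length is the cost of stating the model plus the cost of encoding the data given that model.

```python
import math

END = "$"  # hypothetical end-of-sequence marker, not taken from the paper

def model_bits(automaton, alphabet_size, param_bits=8.0):
    """Crude first-part cost (an assumption): each transition names a symbol
    and a target state, and stores one probability at fixed precision."""
    n_states = len(automaton)
    n_trans = sum(len(arcs) for arcs in automaton.values())
    per_trans = math.log2(alphabet_size + 1) + math.log2(n_states + 1) + param_bits
    return n_trans * per_trans

def data_bits(automaton, sequences, start=0):
    """Second-part cost: -log2 of the probability the automaton assigns
    to each observed sequence, summed over all sequences."""
    total = 0.0
    for seq in sequences:
        state = start
        for sym in list(seq) + [END]:
            state, prob = automaton[state][sym]
            total += -math.log2(prob)
    return total

def message_length(automaton, sequences, alphabet_size):
    """Two-part message length: model cost plus data-given-model cost."""
    return model_bits(automaton, alphabet_size) + data_bits(automaton, sequences)

# Observed child sequences of a hypothetical element.
sequences = [
    ["title", "para"],
    ["title", "para", "para"],
    ["title", "para", "para", "para"],
]

# Generalised model, roughly "title, para+": state 2 loops on para.
general = {
    0: {"title": (1, 1.0)},
    1: {"para": (2, 1.0)},
    2: {"para": (2, 0.5), END: (None, 0.5)},
}

# Prefix-tree model that memorises exactly the three observed sequences.
prefix_tree = {
    0: {"title": (1, 1.0)},
    1: {"para": (2, 1.0)},
    2: {"para": (3, 2 / 3), END: (None, 1 / 3)},
    3: {"para": (4, 0.5), END: (None, 0.5)},
    4: {END: (None, 1.0)},
}

for name, model in (("general", general), ("prefix_tree", prefix_tree)):
    print(name, round(message_length(model, sequences, alphabet_size=2), 2))
```

Under these assumed costs the prefix tree encodes the data slightly more cheaply, but its larger description makes the total message longer, so the generalised model is preferred. A different encoding of the model would shift the balance, which is why the choice of cost model matters in practice.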