Paper: Mining Tables From Large Scale HTML Texts

ACL ID C00-1025
Title Mining Tables From Large Scale HTML Texts
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000

Tables are a very common presentation scheme, but few papers touch on table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering, recognition, interpretation, and presentation are discussed. Heuristic rules and cell similarities are employed to identify tables. The F-measure of table recognition is 86.50%. We also propose an algorithm to capture attribute-value relationships among table cells. Finally, more structured data is extracted and presented.
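
As a rough illustration of the table-filtering step mentioned in the abstract, the following minimal sketch collects the cells of each HTML table and discards tables that are too small to carry data. The class `TableCollector`, the function `looks_like_data_table`, and the thresholds `min_rows` and `min_cols` are hypothetical names and values chosen for this example; the abstract does not state the paper's actual heuristic rules, and nested tables are not handled here.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell texts of each <table> as a list of rows.

    Illustrative only: nested tables are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and self._rows is None:
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def looks_like_data_table(rows, min_rows=2, min_cols=2):
    """Assumed filter rule (not from the paper): keep a table only if it
    has at least min_rows rows and every row has at least min_cols cells."""
    if len(rows) < min_rows:
        return False
    return all(len(row) >= min_cols for row in rows)
```

A caller would feed a page to `TableCollector` with `feed(html_text)` and then keep only the entries of `tables` for which `looks_like_data_table` returns true. A single-cell table used purely for page layout would be dropped under these assumed thresholds, while a two-column table with a header row would be kept.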