Paper: Detecting Subject Boundaries Within Text: A Language Independent Statistical Approach

ACL ID W97-0305
Title Detecting Subject Boundaries Within Text: A Language Independent Statistical Approach
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 1997
Authors

We describe here an algorithm for detecting subject boundaries within text based on a statistical lexical similarity measure. Hearst has already tackled this problem with good results (Hearst, 1994). One of her main assumptions is that a change in subject is accompanied by a change in vocabulary. Using this assumption, but by introducing a new measure of word significance, we have been able to build a robust and reliable algorithm which exhibits improved accuracy without sacrificing language independency.
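
To make the underlying assumption concrete (a change in subject shows up as a change in vocabulary), the following is a minimal sketch of a generic lexical-similarity boundary detector. It is not the algorithm of this paper: the abstract does not specify the new word-significance measure, so plain term frequencies and cosine similarity are swapped in as stand-ins. The function names, the `window` parameter, and the pick-the-lowest-gap rule are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; no stemming or stop list, so nothing here
    # depends on a particular language (an assumption of this sketch).
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def gap_similarities(sentences, window=2):
    # One score per gap between consecutive sentences, comparing up to
    # `window` sentences on the left of the gap with up to `window` on the right.
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def candidate_boundary(sentences, window=2):
    # Index of the sentence that follows the lowest-similarity gap,
    # i.e. the first sentence of the presumed new subject.
    scores = gap_similarities(sentences, window)
    return scores.index(min(scores)) + 1
```

Given two sentences about one topic followed by two about another with no shared words, the gap between the second and third sentence scores zero and is returned as the candidate boundary. A real system would replace the raw term counts with a weighting that reflects how significant each word is, which is where the paper's contribution lies.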