Paper: Legal Docket Classification: Where Machine Learning Stumbles

ACL ID D08-1046
Title Legal Docket Classification: Where Machine Learning Stumbles
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008

We investigate the problem of binary text classification in the domain of legal docket entries. This work presents an illustrative instance of a domain-specific problem where the state-of-the-art Machine Learning (ML) classifiers such as SVMs are inadequate. Our investigation into the reasons for the failure of these classifiers revealed two types of prominent errors which we call conjunctive and disjunctive errors. We developed simple heuristics to address one of these error types and improve the performance of the SVMs. Based on the intuition gained from our experiments, we also developed a simple propositional logic based classifier using hand-labeled features, that addresses both types of errors simultaneously. We show that this new, but simple, approach outperforms all ex...