Paper: Robust Reading: Identification And Tracing Of Ambiguous Names

ACL ID N04-1003
Title Robust Reading: Identification And Tracing Of Ambiguous Names
Venue Human Language Technologies
Session Main Conference
Year 2004
Authors

A given entity, representing a person, a location or an organization, may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whether different mentions of a name, within and across documents, represent the same entity. We develop an unsupervised learning approach that is shown to accurately resolve the name identification and tracing problem. At the heart of our approach is a generative model of how documents are generated and how names are “sprinkled” into them. In its most general form, our model assumes: (1) a joint distribution over entities, (2) an “author” model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an appearance model, governing ...
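The three-part generative story in the abstract can be illustrated with a toy sketch. This is not the authors' model: the entity inventory, the prior weights, the rewrite rules, and all function names below are hypothetical, and independent draws stand in for the joint distribution over entities. It only shows the shape of the process: pick entities for a document, emit one easily identifiable mention per entity, then derive further mentions through an appearance model.

```python
import random

# Hypothetical toy inventory with prior weights; not taken from the paper.
ENTITIES = {
    "John F. Kennedy": 0.5,
    "Robert Kennedy": 0.3,
    "Kennedy Space Center": 0.2,
}


def sample_entities(rng, k):
    """Step (1), simplified: draw k distinct entities from a prior.

    Assumption: weighted independent draws replace a true joint distribution.
    """
    names = list(ENTITIES)
    weights = [ENTITIES[n] for n in names]
    k = min(k, len(names))  # guard so the loop below always terminates
    chosen = []
    while len(chosen) < k:
        pick = rng.choices(names, weights=weights)[0]
        if pick not in chosen:
            chosen.append(pick)
    return chosen


def appearance_variant(rng, representative):
    """Step (3), simplified: rewrite a representative mention.

    Assumption: three made-up rewrite rules, chosen uniformly at random.
    """
    tokens = representative.split()
    candidates = [
        representative,  # mention left unchanged
        tokens[0],       # first token only
        tokens[-1],      # last token only
    ]
    return rng.choice(candidates)


def generate_document_mentions(rng, k=2, mentions_per_entity=3):
    """Step (2), simplified: one identifiable mention per entity, then variants."""
    document = []
    for entity in sample_entities(rng, k):
        representative = entity  # the "easily identifiable" mention
        mentions = [representative]
        mentions += [
            appearance_variant(rng, representative)
            for _ in range(mentions_per_entity - 1)
        ]
        document.append((entity, mentions))
    return document


if __name__ == "__main__":
    rng = random.Random(0)
    for entity, mentions in generate_document_mentions(rng):
        print(entity, "->", mentions)
```

Under these assumptions, short variants such as "Kennedy" can be produced by more than one entity, which is the kind of ambiguity the identification and tracing problem has to resolve.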