Towards Robust Linguistic Analysis using OntoNotes

Year 2013

Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Until now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank and the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of pu...