Paper: Grammatical Analysis By Computer Of The Lancaster-Oslo/Bergen (LOB) Corpus Of British English Texts

ACL ID P85-1036
Title Grammatical Analysis By Computer Of The Lancaster-Oslo/Bergen (LOB) Corpus Of British English Texts
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 1985
Authors

Research has been under way at the Unit for Computer Research on the English Language at the University of Lancaster, England, to develop a suite of computer programs that provide a detailed grammatical analysis of the LOB corpus, a collection of about 1 million words of British English texts available in machine-readable form. The first phase of the project, completed in September 1983, produced a grammatically annotated version of the corpus in which each word token carries a tag showing its word class. Over 93 per cent of the word tags were correctly selected by using a matrix of tag-pair probabilities, and this figure was raised by a further 3 per cent by retagging problematic strings of words prior to disambiguation and by altering the probability weightings for sequences of three tags. T...
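
The idea of selecting word tags from a matrix of tag-pair probabilities can be illustrated with a minimal sketch. It is not the project's actual program: the tag names, the probability values, the `START` pseudo-tag, the fallback weight, and the function names `sequence_score` and `disambiguate` are all assumptions made for illustration. Each word is given its list of candidate tags, and the tag sequence whose adjacent pairs have the highest combined probability is chosen.

```python
from itertools import product

# Hypothetical tag-pair probabilities P(next_tag | previous_tag).
# The values are invented for illustration; they are not LOB figures.
TAG_PAIR_PROB = {
    ("START", "DET"): 0.6,
    ("START", "NOUN"): 0.3,
    ("START", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9,
    ("DET", "VERB"): 0.05,
    ("NOUN", "VERB"): 0.6,
    ("NOUN", "NOUN"): 0.3,
    ("VERB", "NOUN"): 0.4,
    ("VERB", "VERB"): 0.05,
}

# Assumed small fallback weight for tag pairs missing from the matrix.
DEFAULT_PROB = 0.01


def sequence_score(tags):
    """Multiply the tag-pair probabilities along one candidate tag sequence."""
    score = 1.0
    previous = "START"
    for tag in tags:
        score *= TAG_PAIR_PROB.get((previous, tag), DEFAULT_PROB)
        previous = tag
    return score


def disambiguate(candidates):
    """Return the highest-scoring tag sequence among all combinations.

    `candidates` holds one list of possible tags per word. Exhaustive
    enumeration is only practical for short ambiguous spans.
    """
    return max(product(*candidates), key=sequence_score)


# Candidate tags for the three words "the", "dog", "barks".
candidates = [["DET"], ["NOUN", "VERB"], ["NOUN", "VERB"]]
best = disambiguate(candidates)
print(best)
```

With these made-up weights the sequence DET, NOUN, VERB scores highest (0.6 × 0.9 × 0.6 = 0.324). Enumerating every combination keeps the sketch short; a dynamic-programming search would be the usual choice for longer ambiguous spans.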