Paper: Recognizing Sublanguages in Scientific Journal Articles through Closure Properties

ACL ID W13-1909
Title Recognizing Sublanguages in Scientific Journal Articles through Closure Properties
Venue Workshop on Biomedical Natural Language Processing
Session
Year 2013
Authors

It has long been realized that sublanguages are relevant to natural language processing and text mining. However, practical methods for recognizing or characterizing them have been lacking. This paper describes a publicly available set of tools for sublanguage recognition. Closure properties are used to assess the goodness of fit of two biomedical corpora to the sublanguage model. Scientific journal articles are compared to general English text, and it is shown that the journal articles fit the sublanguage model, while the general English text does not. A number of examples of implications of the sublanguage characteristics for natural language processing are pointed out. The software is made publicly available at [edited for anonymization].
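
To illustrate what a closure property measures, the following is a minimal sketch and not the released software. It assumes that lexical closure is assessed by tracking how the number of distinct word types grows as more tokens are read: a curve that flattens suggests a closed vocabulary, as the sublanguage model would predict, while a curve that keeps rising suggests open-ended text. The function name `type_growth_curve`, the `step` parameter, and the lowercasing of tokens are hypothetical choices made for this example.

```python
def type_growth_curve(tokens, step=1000):
    """Return (tokens_seen, distinct_types) pairs sampled every `step` tokens.

    Illustrative only: `tokens` is a list of strings produced by any
    tokenizer; case folding is an assumption of this sketch.
    """
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok.lower())
        if i % step == 0:
            curve.append((i, len(seen)))
    # Always record the final point so the end of the corpus is represented.
    if not curve or curve[-1][0] != len(tokens):
        curve.append((len(tokens), len(seen)))
    return curve


if __name__ == "__main__":
    sample = "the cell binds the protein and the protein binds the cell".split()
    for n_tokens, n_types in type_growth_curve(sample, step=4):
        print(n_tokens, n_types)
```

Plotting such curves for a journal-article corpus and a general English corpus of the same size would give a rough visual comparison of how quickly each tends toward closure; the same idea can be applied to other units, such as part-of-speech tags or syntactic patterns.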