Paper: Language Identification In Unknown Signals

ACL ID C00-2150
Title Language Identification In Unknown Signals
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000

This paper describes algorithms and software developed to characterise and detect generic intelligent language-like features in an input signal, using Natural Language Learning techniques: looking for characteristic statistical "language-signatures" in test corpora. As a first step towards such species-independent language-detection, we present a suite of programs to analyse digital representations of a range of data, and use the results to extrapolate whether or not there are language-like structures which distinguish this data from other sources, such as music, images, and white noise. We assume that generic species-independent communication can be detected by concentrating on localised patterns and rhythms, identifying segments at the level of characters, words and phrases, without nec...
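
The abstract does not say which statistics make up a "language-signature". As a rough illustration only, and not the authors' method, the sketch below assumes one simple candidate: the Shannon entropy of the byte-frequency distribution, which tends to be much lower for natural-language text than for white noise. The function name `unigram_entropy` and both sample inputs are hypothetical.

```python
import math
import random
from collections import Counter


def unigram_entropy(data: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte-frequency distribution."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical test inputs (assumptions, not data from the paper):
# a repetitive English sentence and seeded pseudo-random bytes
# standing in for white noise.
text = ("the quick brown fox jumps over the lazy dog " * 200).encode("ascii")
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.randrange(256) for _ in range(len(text)))

text_entropy = unigram_entropy(text)
noise_entropy = unigram_entropy(noise)

print(f"text : {text_entropy:.2f} bits/byte")
print(f"noise: {noise_entropy:.2f} bits/byte")
```

A single entropy figure like this can only separate text from near-uniform noise. Distinguishing language from music or images, as the abstract proposes, would presumably need richer statistics over characters, words and phrases.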