Research on generic unsupervised learning of language structure, applied to the Search for Extra-Terrestrial Intelligence (SETI) and to the decipherment of unknown languages, has sought to build a generic picture of the lexical and structural patterns characteristic of natural language. As part of this toolkit, a generic system is required to facilitate the analysis of behavioral trends among selected pairs of terminals and non-terminals alike, regardless of the target natural language. Such a tool may be useful in other areas, such as lexico-grammatical analysis or tagging of corpora. Data-oriented approaches to corpus annotation use statistical n-grams and/or constraint-based models; n-grams or constraints with wider windows can improve error rates by examining the topology of the annotation-combination space. We present a visualization tool to help linguists find "useful" PoS-tag combinations and cohesion between linguistic annotations at other levels, and we suggest some possible applications.