This paper describes the use of a novel associative memory neural network architecture to perform unsupervised phrase detection in a large, unstructured English text corpus. To significantly increase the difficulty of processing the corpus, the network is exposed to more than 270,000 Web pages from the .edu domain with no textual substitution or alteration (for spelling, grammar, etc.). The corpus, consisting of 150M words, is represented as a string of sparse tokens, and phrase detection is performed using the unique information-theoretic quantity of mutual significance.