Mining outliers from large datasets is like finding needles in a haystack. Even more challenging is sifting through the dynamic, unstructured, and ever-growing web data for outliers. This paper presents HyCOQ, which is a hybrid algorithm that draws from the power of n-gram-based and word-based systems. Experimental results obtained using embedded motifs without a dictionary show significant improvement over using a domain dictionary irrespective of the type of data used (words, n-grams, or hybrid). Also, there is remarkable improvement in recall with hybrid documents compared to using raw words and n-grams without a domain dictionary.
展开▼