We introduce a new type of KDD patterns called emerging substrings. In a sequence database, an emerging sub-string (ES) of a data class is a substring which occurs more frequently in that class rather than in other classes. ESs are important to sequence classification as they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers. We propose a suffix tree-based framework for mining ESs, and study the effectiveness of applying one or more pruning techniques in different stages of our ES mining algorithm. Experimental results show that if the target class is of a small population with respect to the whole database, which is the normal scenario in single-class ES mining, most of the pruning techniques would achieve considerable performance gain.
展开▼