Journal: Knowledge and Information Systems

Mining minimal distinguishing subsequence patterns with gap constraints



Abstract

Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in one class of sequences and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets and can be useful in applications such as protein comparison, document comparison and building sequential classification models. Mining MDS patterns is a challenging task and differs significantly from mining contrasts between relational/transactional data. One particularly important type of constraint that can be integrated into the mining process is the gap constraint. We present an efficient algorithm called ConSGapMiner (Contrast Sequences with Gap Miner) to mine all MDSs satisfying a minimum and maximum gap constraint, plus a maximum length constraint. It employs highly efficient bitset and boolean operations for powerful gap-based pruning within a prefix growth framework. A performance evaluation with both sparse and dense datasets demonstrates the scalability of ConSGapMiner and shows its ability to mine patterns from high-dimensional datasets at low supports.
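To make the gap constraint and the bitset idea concrete, here is a minimal Python sketch. It is not the ConSGapMiner implementation; the abstract does not give the exact definitions, so everything below is an assumption for illustration: a gap is taken to be the number of items skipped between two consecutive matched positions, support is the fraction of sequences containing the pattern, and the names `occurs`, `support`, `is_distinguishing`, `delta` (minimum support in the positive class) and `alpha` (maximum support in the negative class) are hypothetical. Minimality and the maximum length constraint are not checked.

```python
from typing import Dict, Sequence


def item_masks(seq: Sequence[str]) -> Dict[str, int]:
    """Map each item to an int used as a bitset: bit i is set if seq[i] == item."""
    masks: Dict[str, int] = {}
    for i, item in enumerate(seq):
        masks[item] = masks.get(item, 0) | (1 << i)
    return masks


def occurs(pattern: Sequence[str], seq: Sequence[str],
           min_gap: int, max_gap: int) -> bool:
    """True if pattern is a subsequence of seq with every gap in [min_gap, max_gap].

    Assumption: gap = number of items skipped between consecutive matches.
    """
    if not pattern:
        return True
    masks = item_masks(seq)
    # Bitset of positions at which the current prefix can end.
    ends = masks.get(pattern[0], 0)
    for item in pattern[1:]:
        # Shift the end positions by every allowed distance, then keep only
        # the positions that actually hold the next item.
        reach = 0
        for d in range(min_gap + 1, max_gap + 2):
            reach |= ends << d
        ends = reach & masks.get(item, 0)
        if ends == 0:  # no valid extension: prune early
            return False
    return ends != 0


def support(pattern: Sequence[str], dataset: Sequence[Sequence[str]],
            min_gap: int, max_gap: int) -> float:
    """Fraction of sequences in dataset that contain pattern under the gaps."""
    hits = sum(occurs(pattern, s, min_gap, max_gap) for s in dataset)
    return hits / len(dataset)


def is_distinguishing(pattern: Sequence[str],
                      pos: Sequence[Sequence[str]],
                      neg: Sequence[Sequence[str]],
                      min_gap: int, max_gap: int,
                      delta: float, alpha: float) -> bool:
    """Frequent in pos (>= delta) and infrequent in neg (<= alpha)."""
    return (support(pattern, pos, min_gap, max_gap) >= delta
            and support(pattern, neg, min_gap, max_gap) <= alpha)


# Toy data (invented for this example): each string is one sequence.
pos = ["CBAB", "ACB", "ACCB"]
neg = ["BA", "ACCB", "CC"]
result = is_distinguishing("AB", pos, neg, min_gap=0, max_gap=1,
                           delta=0.5, alpha=0.0)
```

With a maximum gap of 1, "AB" matches "CBAB" and "ACB" but not "ACCB", where two items separate A and B. In this toy data the pattern therefore clears the `delta` threshold in the positive class and never occurs in the negative class. The shift-and-mask step is one plausible reading of how bitset operations can support gap-based pruning; the paper's actual data layout may differ.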
