Journal: IEEE Transactions on Knowledge and Data Engineering

Discovery of Delta Closed Patterns and Noninduced Patterns from Sequences


Abstract

Discovering patterns from sequence data has a significant impact on many aspects of science and society, especially in genomics and proteomics. Here we consider multiple strings as input sequence data and substrings as patterns. In practice, a large set of patterns can usually be discovered, yet many of them are redundant, which degrades the output quality. This paper improves the output quality by removing two types of redundant patterns. First, the notion of delta tolerance closed itemsets is employed to remove redundant patterns that are not delta closed. Second, the concept of statistically induced patterns is proposed to capture redundant patterns that appear statistically significant but whose significance is induced by strongly significant subpatterns. Mining these nonredundant patterns (delta closed patterns and noninduced patterns) is computationally intensive. To discover them efficiently in very large sequence data, two algorithms have been developed through an innovative use of suffix trees. Three sets of experiments were conducted to evaluate their performance. The algorithms render excellent results when applied to genomics. The experiments confirm that the proposed algorithms are efficient and that they produce a relatively small set of patterns that reveal interesting information in the sequences.
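The abstract does not spell out the delta closed criterion, so the following is only a minimal sketch under a stated assumption borrowed from the delta tolerance closed itemset idea: a substring with occurrence count c is treated as delta closed when no one-character extension (to the left or right) occurs at least (1 - delta) * c times. The function names, the `min_count` and `max_len` parameters, and the brute-force counting are all illustrative; the paper's algorithms use a suffix tree, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def substring_counts(sequences, max_len):
    """Count occurrences of every substring of length <= max_len (brute force)."""
    counts = Counter()
    for s in sequences:
        for i in range(len(s)):
            for j in range(i + 1, min(i + max_len, len(s)) + 1):
                counts[s[i:j]] += 1
    return counts


def delta_closed(sequences, min_count, delta, max_len):
    """Return {pattern: count} for patterns that pass the assumed delta closed test.

    Assumption (not taken from the source): a pattern is dropped when some
    one-character extension keeps at least (1 - delta) of its occurrences.
    """
    # Count one character beyond max_len so extensions of the longest patterns are known.
    counts = substring_counts(sequences, max_len + 1)
    alphabet = set("".join(sequences))
    result = {}
    for pattern, count in counts.items():
        if len(pattern) > max_len or count < min_count:
            continue
        threshold = (1 - delta) * count
        extensions = [pattern + a for a in alphabet] + [a + pattern for a in alphabet]
        if all(counts.get(e, 0) < threshold for e in extensions):
            result[pattern] = count
    return result


# "A", "B", "C", "AB" and "BC" each occur 3 times, but so does "ABC",
# so only "ABC" survives when delta = 0.
print(delta_closed(["ABCABC", "ABCD"], min_count=2, delta=0.0, max_len=3))
# prints {'ABC': 3}
```

Raising `delta` makes the filter stricter under this assumption, because an extension then needs to retain a smaller share of the occurrences to make the shorter pattern redundant.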
