MINING APPROXIMATE REPEATING PATTERNS FROM SEQUENCE DATA WITH GAP CONSTRAINTS

Dan He; Xingquan Zhu; Xindong Wu

Abstract

The rapid increase of available DNA, protein, and other biological sequences has made discovering meaningful patterns in sequences an important task for bioinformatics research. Among all types of patterns defined in the literature, the most challenging to find are repeating patterns with gap constraints. In this article, we identify a new research problem: mining approximate repeating patterns (ARPs) with gap constraints, where an occurrence of a pattern may be an approximate match, which is very common in biological sequences. To solve the problem, we propose ArpGap (ARP mining with Gap constraints), an algorithm with three major components for ARP mining: (1) a data-driven pattern generation approach to avoid generating unnecessary candidates for validation; (2) a back-tracking pattern search process to discover approximate occurrences of a pattern under user-specified gap constraints; and (3) an Apriori-like deterministic pruning approach to progressively prune patterns and cease the search process if necessary. Experimental results on synthetic and real-world protein sequences show that ArpGap is efficient in terms of memory consumption and computational cost. The results further suggest that the proposed method is practical for discovering approximate patterns in protein sequences, where the sequence length is usually several hundred to one thousand and the pattern length is relatively short.
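To make the back-tracking search in component (2) of the abstract more concrete, the following is a minimal sketch and not the ArpGap algorithm itself. The abstract does not define the approximate-match criterion, so this sketch assumes a simple mismatch budget (`max_mismatches`), and it assumes a gap constraint means that between `min_gap` and `max_gap` sequence characters may be skipped between consecutive pattern characters. The function name `find_occurrences` and all parameter names are hypothetical.

```python
from typing import List, Tuple


def find_occurrences(seq: str, pattern: str, min_gap: int, max_gap: int,
                     max_mismatches: int) -> List[Tuple[int, ...]]:
    """Return every tuple of positions at which `pattern` occurs in `seq`.

    Assumptions (not taken from the paper):
    - consecutive matched positions p < q satisfy min_gap <= q - p - 1 <= max_gap;
    - at most `max_mismatches` pattern characters may differ from the sequence.
    """
    results: List[Tuple[int, ...]] = []

    def extend(positions: List[int], mismatches: int) -> None:
        k = len(positions)  # index of the next pattern character to place
        if k == len(pattern):
            results.append(tuple(positions))
            return
        last = positions[-1]
        lo = last + 1 + min_gap
        hi = min(last + 1 + max_gap, len(seq) - 1)
        for pos in range(lo, hi + 1):
            cost = mismatches + (seq[pos] != pattern[k])
            if cost <= max_mismatches:  # abandon branches that exceed the budget
                positions.append(pos)
                extend(positions, cost)
                positions.pop()  # backtrack and try the next position

    if not pattern:
        return results
    for start in range(len(seq)):
        cost = int(seq[start] != pattern[0])
        if cost <= max_mismatches:
            extend([start], cost)
    return results


if __name__ == "__main__":
    # Exact occurrences of "AT" with exactly two skipped characters.
    print(find_occurrences("ACGTACGT", "AT", 2, 2, 0))
```

This exhaustive search enumerates every admissible position tuple, so its cost grows quickly with the width of the gap range and the pattern length. That growth is consistent with the abstract's remark that the method is practical when patterns are relatively short.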

Bibliographic details

Source
Computational Intelligence, 2011, Issue 3, pp. 336-362 (27 pages)
Authors
Dan He; Xingquan Zhu; Xindong Wu
Author affiliations

Department of Computer Science, University of California Los Angeles, Los Angeles, California, USA

Faculty of Engineering & Information Technology, University of Technology, Sydney, Australia; Department of Computer Science & Engineering, Florida Atlantic University, Boca Raton, Florida, USA

School of Computer Science & Information Engineering, Hefei University of Technology, Hefei, China; Department of Computer Science, University of Vermont, Burlington, Vermont, USA

Original format: PDF
Language: English
Keywords
frequent pattern mining; gap constraints; dynamic programming; approximate patterns

Similar literature

1. Mining minimal distinguishing subsequence patterns with gap constraints [J]. Xiaonan Ji, James Bailey, Guozhu Dong. Knowledge and Information Systems, 2007, Issue 3.
2. Mining emerging patterns from time series data with time gap constraint [J]. Hsieh-Hui Yu, Chun-Hao Chen, Vincent S. Tseng. International Journal of Innovative Computing Information and Control, 2011, Issue 9.
3. Approximate Repeating Pattern Mining with Gap Requirements [C]. He Dan, Zhu Xingquan, Wu Xindong. Tools with Artificial Intelligence, 2009 (ICTAI '09), 2009.
4. A top-down approach for mining most specific frequent patterns in biological sequence data [D]. Zhang, Xiang. 2004.
5. Efficient mining gapped sequential patterns for motifs in biological sequences [O]. Vance Chiang-Chi Liao, Ming-Syan Chen. 2013.
6. Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint [O]. Joong-Hyuk Chang. 2010.
