Mining Frequent Patterns with Wildcards from Biological Sequences



Abstract

Frequent pattern mining from sequences is a crucial step that helps many domain experts, such as molecular biologists, discover rules or patterns hidden in their data. To find specific patterns, many existing tools require users to specify gap constraints beforehand. In practice, it is often nontrivial for users to provide such gap constraints. In addition, changing the gap values may produce completely different results and require a separate, time-consuming re-mining procedure. Consequently, it is desirable to develop an algorithm that finds patterns automatically and efficiently without user-specified gap constraints. In this paper, a frequent pattern mining problem without user-specified gap constraints is presented and studied. Given a sequence and a support threshold, the goal is to discover all subsequences whose support is not less than the threshold; these frequent subsequences then form patterns. Two heuristic methods (one-way scan and two-way scan) are proposed to mine frequent subsequences and estimate their maximum support on both artificial and real-world data. For a given pattern, the simulation results show that the one-way scan heuristic performs better at estimating the maximum support, with more than ninety percent accuracy.
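The abstract does not describe how the one-way scan works, so the following Python sketch is only a hypothetical illustration of the problem setting. It assumes that the support of a pattern is the number of its occurrences in the sequence, with arbitrary gaps between matched symbols and no sequence position reused across occurrences. This support definition and the names `one_way_support` and `mine_frequent` are assumptions made here, not taken from the paper, and the greedy scan is not claimed to be the authors' heuristic. Because it only counts occurrences it actually finds, its result is a lower bound on the maximum support.

```python
from typing import Dict, List, Optional

# Hypothetical sketch: the support definition used here (gap-free-constraint
# occurrences, each sequence position used at most once) is an assumption.


def one_way_support(sequence: str, pattern: str) -> int:
    """Greedy single left-to-right scan estimating a pattern's support.

    Counts occurrences of `pattern` as a subsequence of `sequence` with
    arbitrary gaps, never reusing a sequence position. Every counted
    occurrence is real, so the result is a lower bound on the maximum support.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # pending[i] = number of open partial occurrences that have matched pattern[:i]
    pending = [0] * m
    count = 0
    for ch in sequence:
        # Try the partial occurrence closest to completion first; i == 0 means
        # "start a new occurrence", which is always allowed.
        for i in range(m - 1, -1, -1):
            if pattern[i] == ch and (i == 0 or pending[i] > 0):
                if i > 0:
                    pending[i] -= 1
                if i + 1 == m:
                    count += 1  # an occurrence was completed
                else:
                    pending[i + 1] += 1
                break  # each sequence position is consumed at most once
    return count


def mine_frequent(sequence: str, min_support: int,
                  max_len: Optional[int] = None) -> Dict[str, int]:
    """Level-wise growth of patterns whose estimated support >= min_support."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    alphabet = sorted(set(sequence))
    frequent: Dict[str, int] = {}
    level: List[str] = [""]  # the empty prefix seeds the length-1 candidates
    length = 0
    # Terminates on its own: patterns longer than the sequence have support 0.
    while level and (max_len is None or length < max_len):
        next_level: List[str] = []
        for prefix in level:
            for symbol in alphabet:
                candidate = prefix + symbol
                support = one_way_support(sequence, candidate)
                if support >= min_support:
                    frequent[candidate] = support
                    next_level.append(candidate)
        level = next_level
        length += 1
    return frequent
```

For example, `one_way_support("ACGTACGT", "AG")` returns 2, and `mine_frequent("ACGTACGT", 2)` reports `"ACGT"` with support 2. Every occurrence of a pattern extended by one symbol contains an occurrence of the original pattern, so exact support can only stay the same or drop as a pattern grows; that is why only frequent patterns are extended. With the greedy estimate standing in for exact support, this pruning is an approximation.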


Bibliographic Details

Source
Information Reuse and Integration, 2007 IEEE International Conference on | 2007 | pp. 329-334 | 6 pages
Conference location: Kent (GB)
Authors
He Yu; Wu Xindong; Zhu Xingquan; Arslan Abdullah N.;
Original format: PDF
Language: English
CLC classification: Industrial technology

Similar Literature


1. PMBC: Pattern mining from biological sequences with wildcard constraints [J]. Wu X., Zhu X., He Y. Computers in Biology and Medicine, 2013, no. 5.
2. Frequent patterns mining in multiple biological sequences [J]. Chen L., Liu W. Computers in Biology and Medicine, 2013, no. 10.
3. Mining Frequent Patterns with Wildcards from Biological Sequences [C]. He, Yu, Wu. 2007.
4. A top-down approach for mining most specific frequent patterns in biological sequence data [D]. Zhang, Xiang. 2004.
5. SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences [O]. Fathi Elloumi, Martha Nason. 2007.
6. Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences [O]. S. Rajasekaran, D Centre. 2015.
7. Detecting and Mining Similarities, Differences and Target Patterns in Sequences of Images Using the PFF, LGG and SPNG Approaches [R]. Bourbakis, D. 2004.
