Source: Information Reuse and Integration, 2007 IEEE International Conference on

Mining Frequent Patterns with Wildcards from Biological Sequences


Abstract

Frequent pattern mining from sequences is a crucial step for many domain experts, such as molecular biologists, in discovering rules or patterns hidden in their data. To find specific patterns, many existing tools require users to specify gap constraints beforehand. In practice, it is often difficult for a user to provide such gap constraints. In addition, a change to the gap values may give completely different results and require a separate, time-consuming re-mining procedure. Consequently, it is desirable to develop an algorithm that automatically and efficiently finds patterns without user-specified gap constraints. In this paper, a frequent pattern mining problem without user-specified gap constraints is presented and studied. Given a sequence and a support threshold, all subsequences whose support is not less than the threshold are discovered. These frequent subsequences are subsequently used to form patterns. Two heuristic methods (one-way and two-way scan) are proposed to mine frequent subsequences and estimate the maximum support on both artificial and real-world data. Given a specific pattern, the simulation results demonstrate that the one-way scan heuristic performs better, estimating the maximum support with more than ninety percent accuracy.
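The problem statement in the abstract can be made concrete with a small sketch. The abstract does not define support or describe the one-way scan heuristic in detail, so the code below is not the paper's algorithm. It assumes that an occurrence of a pattern is any subsequence match with arbitrary gaps (the gaps acting as wildcards), that support is the number of occurrences that share no sequence position, and that a single greedy left-to-right pass gives an estimate of that number. The names `estimate_support`, `frequent_patterns`, and the `max_len` cap are hypothetical and introduced only for illustration.

```python
from typing import Dict, List


def estimate_support(sequence: str, pattern: str) -> int:
    """Estimate the number of position-disjoint occurrences of `pattern`
    in `sequence`, allowing arbitrary gaps between pattern symbols.

    Assumption: one greedy left-to-right scan; this is an illustrative
    estimate, not the heuristic defined in the paper.
    """
    m = len(pattern)
    if m == 0:
        return 0
    # partial[k] = number of partial matches that have matched k symbols.
    partial = [0] * (m + 1)
    for ch in sequence:
        # Prefer advancing the most complete partial match. The `break`
        # ensures each sequence position is used by at most one occurrence.
        for k in range(m - 1, -1, -1):
            if pattern[k] == ch and (k == 0 or partial[k] > 0):
                if k > 0:
                    partial[k] -= 1
                partial[k + 1] += 1
                break
    return partial[m]


def frequent_patterns(sequence: str, min_support: int, max_len: int) -> Dict[str, int]:
    """Level-wise enumeration: extend each frequent pattern by one symbol
    and keep candidates whose estimated support reaches `min_support`.

    Assumption: only extensions of frequent patterns are explored, and
    pattern length is capped at `max_len` to keep the search finite.
    """
    alphabet = sorted(set(sequence))
    result: Dict[str, int] = {}
    frontier: List[str] = [""]
    for _ in range(max_len):
        next_frontier: List[str] = []
        for prefix in frontier:
            for symbol in alphabet:
                candidate = prefix + symbol
                support = estimate_support(sequence, candidate)
                if support >= min_support:
                    result[candidate] = support
                    next_frontier.append(candidate)
        frontier = next_frontier
    return result
```

For example, `estimate_support("ACGTACGT", "AGT")` counts two disjoint occurrences, and `frequent_patterns("AABB", 2, 2)` keeps `A`, `B`, and `AB`. No gap constraint is supplied by the caller, which mirrors the setting the abstract describes; a real implementation would need the paper's own definitions of support and of the two scan heuristics.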
