Conference: IEEE International Symposium on Bioinformatics and Bioengineering

Mining Frequent Contiguous Sequence Patterns in Biological Sequences

Abstract

Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and their contiguous sequences often consist of hundreds or more frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the PrefixSpan algorithm, to significantly reduce the recursive process of PrefixSpan. However, MacosVSpan is inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method that mines maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
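
To make the terms "frequent contiguous sequence" and "maximal" concrete, the following is a minimal Python sketch of a naive level-wise miner. It is not the fixed-length spanning tree method proposed in the paper, nor MacosVSpan; it is a simple baseline for illustration only. The function names (`frequent_contiguous`, `maximal_patterns`), the parameters (`min_support`, `max_len`), and the definition of support as the number of input sequences containing a pattern are all assumptions made for this example.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support, max_len):
    """Return {pattern: support} for contiguous patterns of length <= max_len.

    Support is assumed to be the number of input sequences that contain the
    pattern at least once (an assumption of this sketch).
    """
    frequent = {}
    for length in range(1, max_len + 1):
        support = defaultdict(int)
        for seq in sequences:
            seen = set()  # count each pattern at most once per sequence
            for i in range(len(seq) - length + 1):
                sub = seq[i:i + length]
                # Pruning: a contiguous pattern can only be frequent if both
                # its prefix and its suffix of length - 1 are frequent.
                if length > 1 and (sub[:-1] not in frequent
                                   or sub[1:] not in frequent):
                    continue
                seen.add(sub)
            for sub in seen:
                support[sub] += 1
        level = {p: s for p, s in support.items() if s >= min_support}
        if not level:
            break  # no frequent pattern of this length, so none longer
        frequent.update(level)
    return frequent


def maximal_patterns(frequent):
    """Keep only patterns that are not contained in a longer frequent pattern."""
    return {p: s for p, s in frequent.items()
            if not any(p != q and p in q for q in frequent)}


if __name__ == "__main__":
    # Hypothetical toy DNA sequences, chosen only for illustration.
    dna = ["ACGTAC", "TACGTT", "GGACGT"]
    found = frequent_contiguous(dna, min_support=3, max_len=4)
    print(maximal_patterns(found))
```

In this toy input, every frequent pattern is a substring of `ACGT`, so that is the only maximal pattern reported. The sketch rescans every sequence once per pattern length, which is the kind of cost that grows quickly on long biological sequences and that more specialized methods aim to avoid.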