Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

EMS3: An Improved Algorithm for Finding Edit-Distance Based Motifs



Abstract

Discovering patterns in biological sequences is a crucial step in extracting useful information from them. Motifs can be viewed as patterns that occur exactly or with minor changes across some or all of the biological sequences. Motif search has numerous applications, including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity among families of proteins, etc. The general problem of motif search is intractable. One of the most studied models of motif search proposed in the literature is Edit-distance based Motif Search (EMS). In EMS, the goal is to find all the patterns of length l that occur with an edit distance of at most d in each of the input sequences. EMS algorithms existing in the literature do not scale well on challenging instances and large datasets. In this paper, the current state-of-the-art EMS solver is advanced by exploiting the idea of dimension reduction. A novel idea to reduce the cardinality of the alphabet is proposed. The algorithm we propose, EMS3, is an exact algorithm, i.e., it finds all the motifs present in the input sequences. EMS3 can also be viewed as a divide-and-conquer algorithm. In this paper, we provide theoretical analyses to establish the efficiency of EMS3. Extensive experiments on standard benchmark datasets (synthetic and real-world) show that the proposed algorithm outperforms the existing state-of-the-art algorithm (EMS2).
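
To make the EMS problem statement concrete, the following is a minimal brute-force sketch in Python. It is not EMS3 (or EMS2): it does not use dimension reduction or alphabet reduction, and its running time is exponential in l. It assumes that a pattern "occurs" in a sequence when some substring of that sequence is within unit-cost edit distance d of the pattern; the function names `edit_distance`, `occurs_within`, and `brute_force_ems` and the default DNA alphabet are illustrative choices, not taken from the paper.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def occurs_within(pattern: str, seq: str, d: int) -> bool:
    """True if some substring of seq is within edit distance d of pattern.

    Assumption: this is what "occurs with an edit distance of at most d" means.
    """
    l = len(pattern)
    # Only substrings of length l-d .. l+d can be within distance d.
    for length in range(max(0, l - d), l + d + 1):
        for start in range(len(seq) - length + 1):
            if edit_distance(pattern, seq[start:start + length]) <= d:
                return True
    return False


def brute_force_ems(seqs, l, d, alphabet="ACGT"):
    """Try all len(alphabet)**l candidates; keep those occurring in every sequence."""
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return sorted(motifs)
```

For example, `brute_force_ems(["ACGT", "ACGA"], l=3, d=0)` returns `["ACG"]`, the only 3-mer that appears exactly in both sequences. Because the candidate space grows as the alphabet size raised to the power l, a sketch like this is only usable for tiny instances, which is the scaling problem the abstract refers to.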
