Conference: International Conference on Parallel, Distributed and Grid Computing

Discovering Motifs in DNA Sequences: A Candidate Motifs Based Approach



Abstract

Motif finding is a classical combinatorial problem in bioinformatics. Motifs are small sets of immunity genes that are present in DNA sequences as binding sites and are turned on whenever the organism is infected. Identifying these motifs for transcription factors is therefore of great biological importance. The field has grown significantly in recent years and many algorithms have been proposed for the problem, but its high computational complexity remains the most challenging aspect and continues to attract researchers. This paper presents an efficient algorithm that extracts transcription factor binding sites from a set of DNA sequences using some operations on the sequences. The motifs considered are of known length, ungapped, and non-mutated. The proposed algorithm performs some preprocessing and builds adjacency lists for finding such sites. Although any two randomly selected sequences can be used for preprocessing, we use the first two sequences as the base for constructing the adjacency lists, which are then used for fast detection of the l-mers common to both. These l-mers are treated as candidate motifs and checked for their existence in all of the remaining DNA sequences using a sliding-window approach. The proposed algorithm, CMMF, is experimentally validated on millions of DNA sequences. The formulation is also applicable to related problems in fields such as data mining and pattern detection.
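The candidate-motif idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CMMF implementation: the abstract does not specify how the adjacency lists are laid out, so the sketch assumes a plain dictionary that maps each l-mer of the first sequence to its start positions. The function names (`lmers`, `build_index`, `find_exact_motifs`) are hypothetical, and motifs are assumed to be exact, ungapped, and of known length `l`, as stated above.

```python
def lmers(seq, l):
    """Yield every length-l substring of seq using a sliding window."""
    for i in range(len(seq) - l + 1):
        yield seq[i:i + l]


def build_index(seq, l):
    """Map each l-mer of seq to the list of positions where it starts.

    Assumption: this dictionary stands in for the adjacency list that the
    abstract mentions; the paper's actual data structure may differ.
    """
    index = {}
    for pos, mer in enumerate(lmers(seq, l)):
        index.setdefault(mer, []).append(pos)
    return index


def find_exact_motifs(sequences, l):
    """Return the l-mers that occur exactly in every sequence, sorted."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")

    # Preprocessing: index the first sequence, then keep the l-mers of the
    # second sequence that also appear in the first. These are the candidates.
    index = build_index(sequences[0], l)
    candidates = {mer for mer in lmers(sequences[1], l) if mer in index}

    # Verification: slide a window over each remaining sequence and drop
    # every candidate that does not occur in it.
    for seq in sequences[2:]:
        if not candidates:
            break
        candidates &= set(lmers(seq, l))

    return sorted(candidates)


if __name__ == "__main__":
    dna = ["ACGTTGCA", "TTGCAACG", "GGTTGCAT"]
    print(find_exact_motifs(dna, 5))  # prints ['TTGCA']
```

Restricting the candidates to the l-mers shared by the first two sequences keeps the candidate set small before the remaining sequences are scanned, which is the point of the preprocessing step the abstract describes.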
