Conference: 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology

An information theoretic approach for the discovery of irregular and repetitive patterns in genomic data

Abstract

The unprecedented rate at which genomic data are accumulating underscores the need for highly efficient analytical capabilities. Traditionally, most post-sequencing effort has focused on identifying and annotating genes along with their promoters and regulatory elements. However, much of the genome outside the gene space remains unexplored because appropriate computational tools are lacking. Here, we propose a new approach for exploring and describing a genome without biasing the search toward already known structural entities. Our primary objective is to discover novel conserved patterns that typically fall outside the scope of current repeat-finding tools because of irregularities in their structure. The output is a hierarchy of patterns with arbitrary structural characteristics. This hierarchical representation captures the genomic sequence content at an abstract level and offers novel ways to examine the information it contains. Our approach is an information-theoretic search process that uses pattern-matching techniques to process the sequence data. A preliminary evaluation on the Drosophila genome found a number of irregular patterns. Discovering new patterns is an important problem in both whole-genome and comparative genomics applications. The proposed approach can provide an information-theoretic framework for pattern and knowledge discovery in genomic data.
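The abstract describes the method only at a high level: an information-theoretic search, driven by pattern matching, that outputs a hierarchy of patterns. As a hedged illustration of that general idea, and explicitly not the authors' algorithm, the sketch below applies a minimum description length (MDL) criterion greedily. At each step it replaces the substring whose substitution by a new symbol most reduces the total encoding cost. Because later patterns may contain earlier symbols, the learned rules form a hierarchy. The function names (`discover_hierarchy`, `expand`, and so on), the order-0 entropy code, and the per-symbol pattern cost are all assumptions for illustration. The sketch also finds only exact, contiguous repeats, whereas the paper targets irregular patterns.

```python
import math
from collections import Counter


def description_length(seq):
    """Bits for seq under an order-0 code: sum over symbols of -count * log2(count / n)."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum(c * math.log2(c / n) for c in Counter(seq).values())


def replace_nonoverlapping(seq, pattern, symbol):
    """Replace non-overlapping occurrences of `pattern` (a tuple), scanning left to right."""
    out, i, k = [], 0, len(pattern)
    while i < len(seq):
        if tuple(seq[i:i + k]) == pattern:
            out.append(symbol)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def discover_hierarchy(text, min_len=2, max_len=8, max_rules=10):
    """Hypothetical greedy MDL-style compressor; returns (compressed sequence, rules).

    Each rule maps a new symbol to a tuple that may contain earlier symbols,
    so the rules form a hierarchy of patterns. min_len must be at least 2.
    """
    seq = list(text)
    rules = {}
    for r in range(max_rules):
        symbol = f"P{r}"  # multi-character name, so it never clashes with input characters
        base = description_length(seq)
        # Crude, assumed cost of spelling out one pattern symbol in the dictionary.
        bits_per_symbol = math.log2(len(set(seq)) + 1)
        # Sorted so that ties are broken deterministically.
        candidates = sorted({tuple(seq[i:i + k])
                             for k in range(min_len, max_len + 1)
                             for i in range(len(seq) - k + 1)})
        best_gain, best = 0.0, None
        for pat in candidates:
            new_seq = replace_nonoverlapping(seq, pat, symbol)
            # Each replacement shortens the sequence by len(pat) - 1 symbols.
            occurrences = (len(seq) - len(new_seq)) // (len(pat) - 1)
            if occurrences < 2:
                continue  # a pattern must repeat to be worth a dictionary entry
            gain = base - (description_length(new_seq) + len(pat) * bits_per_symbol)
            if gain > best_gain:
                best_gain, best = gain, (pat, new_seq)
        if best is None:
            break  # no pattern shortens the total description any further
        rules[symbol], seq = best
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the original characters."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


if __name__ == "__main__":
    seq, rules = discover_hierarchy("ACGTTGCA" * 8)
    for sym, pat in rules.items():
        print(sym, "->", pat, "=", expand(sym, rules))
```

The greedy, exhaustive candidate scan is quadratic in sequence length per rule. It is meant only to make the compression-gain idea concrete on toy input, not to scale to a genome.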