Conference: International Conference on Algorithms for Computational Biology

Analysis and Classification of Constrained DNA Elements with N-gram Graphs and Genomic Signatures



Abstract

The most common methods for examining genomic sequence composition are based on the bag-of-words approach and therefore largely ignore the original sequence structure and the relative positions of its constituent oligonucleotides. Here we present a novel methodology that takes into account both word representation and relative positioning at various length scales, in the form of n-gram graphs (NGG). We applied the NGG approach to short, constrained genomic sequences from vertebrates and invertebrates, of various origins and predicted functionalities, and were able to efficiently distinguish between DNA sequences belonging to the same species (intra-species classification). As an alternative method, we also applied the Genomic Signatures (GS) approach to the same sequences. To our knowledge, this is the first time GS have been applied to short sequences rather than to whole genomes. Together, these results suggest that NGG is an efficient method for classifying sequences originating from a given genome according to their function.
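The abstract does not spell out how an n-gram graph is built or compared. The following Python sketch shows one common construction from the general NGG literature, under stated assumptions: nodes are overlapping n-grams, an undirected edge links two n-grams whose start positions lie at most `window` positions apart, edge weights count co-occurrences, and two graphs are compared with a value-similarity score. The function names `ngram_graph` and `value_similarity` and the default values of `n` and `window` are hypothetical, illustrative choices, not parameters reported in the paper.

```python
from collections import Counter


def ngram_graph(seq: str, n: int = 3, window: int = 3) -> Counter:
    """Build an undirected n-gram graph as a Counter of weighted edges.

    Assumption: nodes are the overlapping n-grams of `seq`; every pair of
    n-grams whose start positions are at most `window` apart gets an edge,
    and the edge weight counts how often that pair co-occurs.
    """
    seq = seq.upper()
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + window + 1, len(grams))):
            # Sort the pair so (a, b) and (b, a) count as the same undirected edge.
            edges[tuple(sorted((gram, grams[j])))] += 1
    return edges


def value_similarity(g1: Counter, g2: Counter) -> float:
    """Value similarity: each shared edge contributes min/max of its two
    weights, and the sum is normalised by the edge count of the larger graph."""
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))


if __name__ == "__main__":
    # Toy sequences for illustration only.
    a = ngram_graph("ACGTACGTTGCA")
    b = ngram_graph("ACGTACGATGCA")
    print(f"VS(a, a) = {value_similarity(a, a):.2f}")
    print(f"VS(a, b) = {value_similarity(a, b):.2f}")
```

Unlike a bag-of-words k-mer count, the edge set records which n-grams occur near each other, so larger `window` values capture relative positioning over longer length scales.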
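Genomic signatures are commonly defined, following Karlin and co-workers, as the 16 dinucleotide relative abundances ρ*_XY = f*_XY / (f*_X f*_Y), computed over a sequence together with its reverse complement, with two signatures compared by the mean absolute difference δ*. The abstract does not state which GS variant or distance the authors used, so the Python sketch below simply assumes this definition. The names `genomic_signature` and `delta_distance` are hypothetical, and the treatment of non-ACGT symbols (skipped) is also an assumption.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _count(seq: str) -> tuple[dict, dict]:
    """Count mono- and dinucleotides, skipping any word with a non-ACGT symbol."""
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for base in seq:
        if base in mono:
            mono[base] += 1
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:
            di[pair] += 1
    return mono, di


def genomic_signature(seq: str) -> dict:
    """Return symmetrised dinucleotide relative abundances rho*_XY,
    pooling counts from the sequence and its reverse complement."""
    seq = seq.upper()
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    mono_f, di_f = _count(seq)
    mono_r, di_r = _count(rev_comp)
    n_mono = sum(mono_f.values()) + sum(mono_r.values())
    n_di = sum(di_f.values()) + sum(di_r.values())
    if n_di == 0:
        raise ValueError("need at least one valid dinucleotide")
    freq = {b: (mono_f[b] + mono_r[b]) / n_mono for b in BASES}
    rho = {}
    for xy in DINUCLEOTIDES:
        f_xy = (di_f[xy] + di_r[xy]) / n_di
        expected = freq[xy[0]] * freq[xy[1]]
        # If a base is absent, its dinucleotides cannot occur either; report 0.
        rho[xy] = f_xy / expected if expected else 0.0
    return rho


def delta_distance(seq1: str, seq2: str) -> float:
    """Karlin-style delta*: mean absolute difference of the 16 rho*_XY values."""
    r1, r2 = genomic_signature(seq1), genomic_signature(seq2)
    return sum(abs(r1[xy] - r2[xy]) for xy in DINUCLEOTIDES) / len(DINUCLEOTIDES)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    a = "ACGTTGCAACGGTACCTAGG"
    b = "ATATATGCGCATTTAAACGC"
    print(f"delta*(a, b) = {delta_distance(a, b):.3f}")
```

Because the signature pools both strands, a sequence and its reverse complement give δ* = 0. On short sequences many dinucleotide counts are small, so the individual ρ* ratios can be noisy.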
