2011 IEEE International Conference on Systems Biology (conference paper)

Genomic signatures for metagenomic data analysis: Exploiting the reverse complementarity of tetranucleotides

Abstract

Metagenomics studies microbial communities by analyzing their genomic content, sequenced directly from the environment. To this end, metagenomic datasets, which consist of many short DNA or RNA fragments, are analyzed computationally with statistical and machine learning methods, generally for binning or taxonomic annotation. Many of these methods operate on features derived from the data through a genomic signature. A typical genomic signature of a fragment is a vector whose entries give the frequency with which each oligonucleotide appears in that fragment. In this article we experimentally analyze how well existing genomic signatures discriminate between fragments belonging to different genomes. We also propose new genomic signatures that account for the fact that fragments may have been sequenced from either strand of a genome; this is achieved by exploiting the reverse complementarity of oligonucleotides. We conduct extensive experiments on genomic fragments sampled in silico to comparatively assess the effectiveness of existing genomic signatures and of those proposed in this article. The results indicate that using the reverse complementarity of tetranucleotides directly in the definition of a genomic signature yields performance comparable to that of the best existing signatures while using fewer features. The proposed genomic signatures therefore provide an alternative set of features for analyzing metagenomic data. Online supplementary material is available at http://www.cs.ru.nl/~gori/signature metagenomics/.
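The following is a minimal sketch, not the authors' code, of the two kinds of signature the abstract contrasts. The first is a standard tetranucleotide frequency vector with 4^4 = 256 entries. The second is a strand-independent variant that pools each tetranucleotide with its reverse complement, which leaves 136 features (120 complementary pairs plus 16 tetranucleotides that are their own reverse complement). The function names, the choice of the lexicographically smaller string as the canonical representative, and the skipping of windows containing ambiguous bases are illustrative assumptions; the paper's exact definitions may differ.

```python
from itertools import product

BASES = "ACGT"
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string over ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k: int) -> list[str]:
    """All 4**k oligonucleotides of length k, in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def canonical_kmers(k: int) -> list[str]:
    """One representative per {k-mer, reverse complement} class.

    Assumption: the lexicographically smaller string is the representative.
    """
    return sorted({min(m, reverse_complement(m)) for m in all_kmers(k)})


def _frequencies(fragment: str, k: int, features: list[str], key) -> list[float]:
    """Count k-mer windows mapped to feature indices by `key`, then normalize."""
    index = {m: i for i, m in enumerate(features)}
    counts = [0] * len(features)
    total = 0
    seq = fragment.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows that contain N or other ambiguity codes.
        if all(b in BASES for b in window):
            counts[index[key(window)]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def kmer_signature(fragment: str, k: int = 4) -> list[float]:
    """Plain signature: relative frequency of every k-mer (4**k features)."""
    return _frequencies(fragment, k, all_kmers(k), lambda w: w)


def rc_signature(fragment: str, k: int = 4) -> list[float]:
    """Strand-independent signature: a k-mer and its reverse complement share one entry."""
    return _frequencies(
        fragment, k, canonical_kmers(k), lambda w: min(w, reverse_complement(w))
    )


if __name__ == "__main__":
    frag = "ATGCGTACGTTAGCCGATAGGCTTAACGT"
    print(len(kmer_signature(frag)), len(rc_signature(frag)))
    # The pooled signature is identical for a fragment read from either strand.
    print(rc_signature(frag) == rc_signature(reverse_complement(frag)))
```

Because the k-mer windows of a reverse-complemented fragment are exactly the reverse complements of the original windows, the pooled signature does not depend on which strand was sequenced. This is one way to see how exploiting reverse complementarity can reduce the tetranucleotide feature count from 256 to 136.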