Computational genomic signatures and metagenomics.

Abstract

Mathematical characterizations of biological sequences are one of the main elements of bioinformatics. In this work, a class of DNA sequence characterizations, namely computational genomic signatures, which capture global features of these sequences, is used to address emerging challenges in computational biology. Because genome signatures are species-specific and pervasive, they can be used to characterize and identify a genome or a taxonomic unit from a short genome fragment originating from that source. However, identification accuracy is generally poor when the sequence model and the sequence distance measure are not selected carefully. We show that using relative distance measures instead of absolute metrics yields better detection accuracy. Furthermore, relative metrics make it possible to build genome signatures from more complex models, which cannot be used efficiently with conventional distance measures.

Using a relative distance measure and a model based on the relative abundance of oligonucleotides in a genome fragment, we define a novel genome signature and apply it to a class of metagenomics problems. The metagenomics approach enables sampling and sequencing of a microbial community without isolating and culturing individual species. Determining the taxonomic classification of the bacterial species in the community from a mixture of short DNA fragments is a difficult computational challenge. We present supervised and unsupervised algorithms for the taxonomic classification of metagenomics data and demonstrate their effectiveness on simulated and real-world data. The supervised algorithm, RAIphy, classifies metagenome fragments of unknown origin by assigning them to taxa defined in a signature database of previously sequenced microbial genomes; the signatures in the database are updated iteratively during classification. Because most metagenomics samples include unidentified species, they also require clustering. In the unsupervised setting, pseudo-assembly of fragments is followed by clustering of taxa. The signatures developed in this work are more species-specific and pervasive than any signatures currently available in the literature, and they demonstrate the potential and viability of using genome signatures to solve various metagenomics problems as well as other challenges in computational biology.
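The abstract does not spell out the signature formula or the RAIphy scoring rule, so the following is only a minimal Python sketch under stated assumptions. It uses a hypothetical log relative-abundance table (each k-mer's count relative to the total count of its (k-1)-mer prefix, with add-one smoothing) as a stand-in signature, assigns a fragment to whichever taxon scores best (a relative decision, with no absolute distance cutoff), and folds each classified fragment back into the winning taxon's signature to mimic the iterative database update. The names `kmer_counts`, `log_signature`, `score`, and `SignatureDB`, the choice k = 3, and the update rule are illustrative assumptions, not the thesis's definitions or the RAIphy implementation.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window that contains a non-ACGT symbol."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in BASES for c in word):
            counts[word] += 1
    return counts


def log_signature(counts: Counter, k: int, pseudo: float = 1.0) -> dict:
    """Hypothetical signature: log of each k-mer's count relative to the total count of
    its (k-1)-mer prefix, i.e. log P(last base | previous k-1 bases), add-`pseudo` smoothed."""
    sig = {}
    for prefix_bases in product(BASES, repeat=k - 1):
        prefix = "".join(prefix_bases)
        total = sum(counts[prefix + b] for b in BASES)
        for b in BASES:
            sig[prefix + b] = math.log((counts[prefix + b] + pseudo) / (total + 4 * pseudo))
    return sig


def score(fragment: str, sig: dict, k: int) -> float:
    """Average per-k-mer log score of a fragment under one taxon's signature."""
    counts = kmer_counts(fragment, k)
    n = sum(counts.values())
    if n == 0:
        return float("-inf")  # no usable k-mers in the fragment
    return sum(c * sig[w] for w, c in counts.items()) / n


class SignatureDB:
    """Toy signature database keyed by (illustrative) taxon names."""

    def __init__(self, k: int = 3):
        self.k = k
        self.counts = {}  # taxon -> accumulated k-mer counts
        self.sigs = {}    # taxon -> log signature derived from those counts

    def add_reference(self, taxon: str, seq: str) -> None:
        self.counts[taxon] = self.counts.get(taxon, Counter()) + kmer_counts(seq, self.k)
        self.sigs[taxon] = log_signature(self.counts[taxon], self.k)

    def classify(self, fragment: str, update: bool = True) -> str:
        # Relative decision: pick the best-scoring taxon; no absolute distance cutoff is used.
        best = max(self.sigs, key=lambda t: score(fragment, self.sigs[t], self.k))
        if update:
            # Hypothetical iterative update: fold the fragment into the winning signature.
            self.add_reference(best, fragment)
        return best


if __name__ == "__main__":
    db = SignatureDB(k=3)
    db.add_reference("taxon_A", "ACGT" * 50)
    db.add_reference("taxon_B", "AATT" * 50)
    print(db.classify("ACGTACGTACGT"))  # → taxon_A
```

Folding classified fragments back into the database lets a sparsely sampled taxon's signature grow during a run, but in a self-updating scheme like this an early misassignment is also reinforced; the abstract does not say how RAIphy handles that trade-off.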

Bibliographic details

  • Author

    Nalbantoglu, Ozkan Ufuk

  • Author affiliation

    The University of Nebraska - Lincoln

  • Degree-granting institution: The University of Nebraska - Lincoln
  • Subject: Engineering, Computer; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 200
  • Original format: PDF
  • Language: English