Bioinformatics (U.S. National Institutes of Health literature collection)

LCA*: an entropy-based measure for taxonomic assignment within assembled metagenomes

Abstract

Motivation: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences (e.g. contigs or scaffolds) to discrete taxonomic bins. Because reference genomes are unavailable for most environmental microorganisms, intrinsic nucleotide patterns and phylogenetic anchors can be used to improve assembly-dependent binning. This in turn enables more accurate taxonomic and functional annotation of microbial communities and helps identify mobile genetic elements or lateral gene transfer events.

Results: Here, we present LCA*, a statistic inspired by information theory and voting theory that uses the NCBI Taxonomy database hierarchy to assign taxonomy to contigs assembled from environmental sequence information. The LCA* algorithm identifies a sufficiently strong majority on the hierarchy while minimizing the change in entropy of the observed taxonomic distribution, which improves its statistical properties. Moreover, we apply results from the order-statistics literature to formulate a likelihood-ratio hypothesis test and P-value for testing the supremacy of the assigned LCA* taxonomy. Using simulated and real-world datasets with known reference annotations, we empirically demonstrate that the voting-based methods, majority vote and LCA*, are consistently more accurate at identifying contig taxonomy than the lowest common ancestor (LCA) algorithm popularized by MEGAN. We further show that LCA* strikes a balance between specificity and confidence, providing an estimate appropriate to the information available in the data.

Availability and Implementation: LCA* has been implemented as a stand-alone Python library compatible with the MetaPathways pipeline; both are available on GitHub with installation instructions and use-cases ().

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
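The abstract contrasts three ways of collapsing the individual taxonomic annotations attached to a contig (for example, those of its predicted genes) into a single label: the strict LCA popularized by MEGAN, majority vote, and LCA*. The sketch below is a minimal illustration under stated assumptions, not the authors' LCA* library. Lineages are represented as hypothetical root-to-leaf tuples instead of NCBI Taxonomy IDs, and `majority_lca` uses a simple "more than a fraction `alpha` of all annotations" rule as a stand-in for the paper's entropy-based criterion. `shannon_entropy` computes the entropy (in bits) of the observed lineage distribution, the kind of quantity the abstract says LCA* tries to disturb as little as possible; how LCA* actually weighs it is not reproduced here. All function names, the `alpha` parameter, and the example data are hypothetical.

```python
from collections import Counter
import math

# Hypothetical representation: each annotation is a root-to-leaf lineage tuple,
# standing in for a lookup in the NCBI Taxonomy hierarchy.


def lca(lineages):
    """Strict lowest common ancestor (MEGAN-style): the longest shared prefix."""
    prefix = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def majority_vote(lineages):
    """Most frequent full lineage; ties go to the lineage seen first."""
    return Counter(lineages).most_common(1)[0][0]


def majority_lca(lineages, alpha=0.5):
    """Hypothetical majority-threshold LCA: descend from the root while one child
    holds more than `alpha` of ALL annotations and return the deepest such node.
    A simplification, not the entropy-based LCA* criterion."""
    n = len(lineages)
    node = ()
    while True:
        depth = len(node)
        # Count how many annotations fall below each child of the current node.
        children = Counter(
            lin[depth]
            for lin in lineages
            if len(lin) > depth and lin[:depth] == node
        )
        if not children:
            return node  # reached a leaf
        child, count = children.most_common(1)[0]
        if count / n <= alpha:
            return node  # no child holds a strong enough majority
        node = node + (child,)


def shannon_entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical contig carrying five annotations.
contig = [
    ("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ("root", "Bacteria", "Proteobacteria", "Alphaproteobacteria"),
    ("root", "Bacteria", "Firmicutes", "Bacilli"),
    ("root", "Archaea", "Euryarchaeota", "Methanobacteria"),
]

print(lca(contig))            # ('root',)
print(majority_vote(contig))  # ('root', 'Bacteria', 'Proteobacteria', 'Gammaproteobacteria')
print(majority_lca(contig))   # ('root', 'Bacteria', 'Proteobacteria')
print(round(shannon_entropy(list(Counter(contig).values())), 3))  # 1.922
```

On this toy contig the strict LCA retreats to the root because a single annotation is archaeal, majority vote commits to the most frequent lineage even though it accounts for only 2 of 5 annotations, and the threshold walk stops at Proteobacteria, which covers 3 of 5. This mirrors, in simplified form, the specificity-versus-confidence trade-off the abstract describes.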