Nucleic Acids Research

ROCker: accurate detection and quantification of target genes in short-read metagenomic data sets by modeling sliding-window bitscores

Abstract

Functional annotation of metagenomic and metatranscriptomic data sets relies on similarity searches based on e-value thresholds, resulting in an unknown number of false positive and false negative matches. To overcome these limitations, we introduce ROCker, aimed at identifying position-specific, most-discriminant thresholds in sliding windows along the sequence of a target protein, accounting for non-discriminative domains shared by unrelated proteins. ROCker employs the receiver operating characteristic (ROC) curve to minimize the false discovery rate (FDR) and calculates the best thresholds based on how simulated shotgun metagenomic reads of known composition map onto well-curated reference protein sequences; it thus differs from HMM profiles and related methods. We showcase ROCker using the ammonia monooxygenase (amoA) and nitrous oxide reductase (nosZ) genes, which mediate the oxidation of ammonia and the reduction of the potent greenhouse gas N2O to inert N2, respectively. ROCker typically showed a 60-fold lower FDR than the common practice of using fixed e-values. Previously uncounted 'atypical' nosZ genes were found to be, on average, twice as abundant as their typical counterparts in most soil metagenomes, and the abundance of bacterial amoA was quantified against the closely related particulate methane monooxygenase (pmoA). Therefore, ROCker can reliably detect and quantify target genes in short-read metagenomes.
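The core idea above can be illustrated with a short Python sketch: learning a separate bitscore cutoff for each window along the reference protein from simulated reads whose origin is known, then accepting a real read only if it clears the cutoff of its own window. This is not ROCker's implementation. The `Hit` record, the fixed-width non-overlapping windows, the use of Youden's J (TPR minus FPR) as the ROC-based optimality criterion, and the single global bitscore cutoff standing in for a fixed e-value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hit:
    """One simulated read aligned to a reference protein (hypothetical layout)."""
    start: int        # first aligned residue on the reference (0-based)
    end: int          # one past the last aligned residue
    bitscore: float
    is_target: bool   # ground truth known from the read simulation

    @property
    def midpoint(self) -> int:
        return (self.start + self.end) // 2


def best_threshold(hits: List[Hit]) -> Optional[float]:
    """Bitscore cutoff maximizing Youden's J (TPR - FPR) within one window (assumed criterion)."""
    pos = sum(h.is_target for h in hits)
    neg = len(hits) - pos
    if pos == 0:
        return None  # no true targets here, so no cutoff can be learned
    best_j, best_t = -1.0, None
    # Each observed bitscore is a candidate cutoff, i.e. one point on the ROC curve.
    for t in sorted({h.bitscore for h in hits}):
        tp = sum(h.is_target and h.bitscore >= t for h in hits)
        fp = sum((not h.is_target) and h.bitscore >= t for h in hits)
        j = tp / pos - (fp / neg if neg else 0.0)
        if j > best_j:
            best_j, best_t = j, t
    return best_t


def window_thresholds(hits: List[Hit], protein_len: int, window: int) -> Dict[int, Optional[float]]:
    """Learn one cutoff per fixed-width window, keyed by the window's start position."""
    return {
        w: best_threshold([h for h in hits if w <= h.midpoint < w + window])
        for w in range(0, protein_len, window)
    }


def classify(hit: Hit, thresholds: Dict[int, Optional[float]], window: int) -> bool:
    """Accept a read only if it clears the cutoff of the window holding its midpoint."""
    t = thresholds.get((hit.midpoint // window) * window)
    return t is not None and hit.bitscore >= t


def fdr(hits: List[Hit], accept: Callable[[Hit], bool]) -> float:
    """False discovery rate: share of accepted reads that are not true targets."""
    accepted = [h for h in hits if accept(h)]
    if not accepted:
        return 0.0
    return sum(not h.is_target for h in accepted) / len(accepted)


# Toy data: window 0-49 contains a domain shared with unrelated proteins,
# so a non-target read there scores almost as high as the target read.
hits = [
    Hit(0, 40, 70.0, True),
    Hit(10, 50, 60.0, False),
    Hit(60, 100, 45.0, True),
    Hit(55, 95, 20.0, False),
]
thresholds = window_thresholds(hits, protein_len=100, window=50)
print(thresholds)  # {0: 70.0, 50: 45.0}
print(round(fdr(hits, lambda h: h.bitscore >= 40.0), 3))  # one global cutoff: 0.333
print(fdr(hits, lambda h: classify(h, thresholds, 50)))  # per-window cutoffs: 0.0
```

In the toy data, a single global cutoff of 40 admits the non-target read from the shared domain (FDR of 1/3), while the window-specific cutoffs reject it without losing either true target. The candidate-threshold scan is quadratic per window, which is fine for a sketch; a real data set would call for a sorted sweep over bitscores.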