Nucleic Acids Research

QuartetS: a fast and accurate algorithm for large-scale orthology detection

Abstract

The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. Based on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes, QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and >2 million genes. We found that QuartetS slightly, but consistently, outperformed the highly specific OMA method and that, while consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS and QuartetS-C should be preferred, respectively, in applications where high accuracy and high throughput are required.
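
For readers unfamiliar with the bi-directional best hit (BBH) baseline mentioned above, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes pairwise similarity scores (for example, alignment bit scores) are already available as dictionaries; the function names, gene identifiers, and scores are hypothetical and purely illustrative. Two genes from different genomes are reported as a BBH pair when each is the other's highest-scoring hit.

```python
def best_hits(scores):
    """Return {query: best_target} from a dict {(query, target): score}.

    Ties are broken by iteration order (the first best score seen wins);
    this is an assumption of the sketch, not a rule from the paper.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)   # genome A genes searched against genome B
    reverse = best_hits(b_vs_a)   # genome B genes searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy scores; real inputs would come from a sequence search tool.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b2"): 180, ("a3", "b2"): 90}
b_vs_a = {("b1", "a1"): 198, ("b2", "a2"): 175, ("b2", "a1"): 140}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, `a3` has `b2` as its best hit, but `b2` prefers `a2`, so `a3` is left without a predicted ortholog. BBH consults no evidence beyond the two genomes being compared, which is the kind of limitation the abstract says QuartetS addresses by using evolutionary evidence from other sequenced genomes.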