Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate


Abstract

Over the past few decades, discovery based on sequence homology has become a widely accepted practice. Consequently, the comparative accuracy of retrieval algorithms (e.g., BLAST) has been rigorously studied with a view to improvement. Unlike most components of retrieval algorithms, the E-value threshold criterion has yet to be thoroughly investigated. Investigating the threshold is important because it alone dictates which sequences are declared relevant and which irrelevant. In this paper, we introduce the false discovery rate (FDR) statistic as a replacement for the uniform threshold criterion in order to improve efficacy in retrieval systems. Using NCBI's BLAST and PSI-BLAST software packages, we demonstrate the applicability of such a replacement in both non-iterative (BLAST_FDR) and iterative (PSI-BLAST_FDR) homology searches. For each application, we evaluated retrieval efficacy with five different multiple testing methods on a large training database. For each algorithm, we chose the best performing method, Benjamini-Hochberg, as the default statistic. As measured by the threshold average precision, BLAST_FDR yielded 14.1 percent better retrieval performance than BLAST on a large (5,161 queries) test database, and PSI-BLAST_FDR attained 11.8 percent better retrieval performance than PSI-BLAST. The C++ source code specific to BLAST_FDR and PSI-BLAST_FDR, along with instructions, is available at http://www.cs.mtsu.edu/~hcarroll/blast_fdr/.
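
To make the idea of replacing a uniform threshold with an FDR criterion concrete, the following is a minimal Python sketch of the Benjamini-Hochberg step-up procedure named in the abstract. It is not the authors' C++ implementation. The function names, the `alpha` level of 0.05, and the sample E-values are hypothetical. Converting an E-value to a p-value with P = 1 - exp(-E) is the standard Karlin-Altschul relation, but whether the paper applies it in this way is an assumption.

```python
import math


def evalue_to_pvalue(evalue):
    """Convert an E-value to a p-value using P = 1 - exp(-E).

    Assumption: this conversion is used here only for illustration.
    expm1 keeps precision for very small E-values.
    """
    return -math.expm1(-evalue)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where a hit is declared relevant.

    Benjamini-Hochberg step-up: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * alpha, and accept the k smallest.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the hits, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    relevant = [False] * m
    for idx in order[:cutoff_rank]:
        relevant[idx] = True
    return relevant


if __name__ == "__main__":
    # Hypothetical E-values for the hits returned by one query.
    evalues = [1e-30, 2e-5, 0.004, 0.9, 3.0]
    pvalues = [evalue_to_pvalue(e) for e in evalues]
    print(benjamini_hochberg(pvalues, alpha=0.05))
```

Unlike a fixed E-value cutoff, the acceptance boundary here depends on how many hits a query returns and on how their p-values are distributed, so the same hit can be accepted for one query and rejected for another.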
