Conference: Brazilian Symposium on Bioinformatics

False Discovery Rate for Homology Searches


Abstract

While many aspects of retrieval algorithms (e.g., BLAST) have been studied in depth, the method for determining the retrieval threshold has not received the same attention. Furthermore, with genetic databases growing rapidly, the challenges of multiple testing are escalating. To improve search sensitivity, we propose using the false discovery rate (FDR) to control the number of irrelevant ("false positive") sequences. In this paper, we introduce BLAST_FDR, an extended version of BLAST that uses an FDR method as the threshold criterion. We evaluated five different multiple testing methods on a large training database and chose the best-performing one, Benjamini-Hochberg, as the default for BLAST_FDR. BLAST_FDR achieves 14.1% better retrieval performance than BLAST on a large test database (5,161 queries) and a 26.8% better retrieval score for queries belonging to small superfamilies. Furthermore, BLAST_FDR retrieved only 0.27 irrelevant sequences per query, compared to 7.44 for BLAST.
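
For readers unfamiliar with the Benjamini-Hochberg criterion named in the abstract, the following is a minimal sketch of the standard step-up procedure applied to a list of per-hit p-values. It is not the authors' BLAST_FDR code: the function name `benjamini_hochberg`, the `alpha` default of 0.05, and the assumption that each retrieved sequence already has a p-value are all illustrative choices not stated in the abstract.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of the hits accepted by the Benjamini-Hochberg
    step-up procedure at FDR level `alpha` (hypothetical helper).

    `p_values` is assumed to hold one p-value per retrieved sequence.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Rank hits from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank

    # Accept every hit ranked at or below k, even if some of them
    # individually missed their own per-rank threshold (step-up behavior).
    return sorted(order[:k])


if __name__ == "__main__":
    hits = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(hits, alpha=0.05))
```

Unlike a fixed cutoff, the threshold here adapts to how many hits are being tested and how their p-values are distributed, which is the property the abstract relies on when replacing a fixed retrieval threshold.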