
An IDC-based algorithm for efficient homology filtration with guaranteed seriate coverage

Abstract

Homology search in genomic databases is fundamental to biological knowledge discovery. As databases grow exponentially in size and access volume, filtration, which discards candidates that cannot be homologous in order to reduce the time spent on homology verification, is becoming more important in bioinformatics. Most known q-gram-based filtration approaches in the literature, such as QUASAR, have limited error tolerance and can produce more false positives. In this paper, we present an IDC-based lossless filtration algorithm with guaranteed seriate coverage and error tolerance for efficient homology discovery. Our method transforms homology extraction under the requested seriate coverage and error levels into a longest increasing subsequence problem with range constraints, and we propose an efficient algorithm for that problem. Experimental results show that the method significantly outperforms QUASAR: at some comparable sensitivity levels, our homology filter makes discovery more than three orders of magnitude faster than QUASAR and more than four orders of magnitude faster than exhaustive search.
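
For readers unfamiliar with the longest increasing subsequence (LIS) problem that the abstract reduces to, the following is a minimal sketch of the textbook, unconstrained O(n log n) solution. It is not the paper's algorithm: the abstract does not specify the range constraints or how IDC enters, so neither is modelled here. The function name and the sample input are illustrative assumptions.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one strictly increasing subsequence of maximum length.

    Textbook patience-sorting method; the range constraints mentioned in
    the abstract are NOT modelled (assumption: unconstrained LIS).
    """
    tails = []      # tails[k]: smallest tail of any increasing run of length k + 1
    tails_idx = []  # index in `values` of the element stored in tails[k]
    prev = [-1] * len(values)  # back-pointers used to rebuild the subsequence

    for i, v in enumerate(values):
        k = bisect_left(tails, v)  # first slot whose tail is >= v
        if k == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[k] = v
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        result.append(values[i])
        i = prev[i]
    return result[::-1]


if __name__ == "__main__":
    # Hypothetical input: database positions of seed matches, listed in query order.
    print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))
```

In a filtration setting, one plausible reading is that an increasing chain of match positions corresponds to seed matches that appear in the same order in both sequences; any range constraint would further restrict which chains are admissible.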
