
Variable-length intervals in homology search

Abstract

Fast, accurate, and scalable techniques for homology searching of large genomic collections are an increasingly important requirement as genomic sequence collections continue to double in size almost every year. Almost all homology search techniques rely on extracting fixed-length, overlapping subsequences from the query and database sequences and comparing these as the first step in query evaluation; this is a feature of well-known tools such as FASTA, BLAST, and our own CAFE technique. In this paper we discuss a novel, variable-length approach to extracting subsequences that is based on homology scoring matrices. Our motivation is to strike a balance between the speed and the accuracy of fixed-length choices, that is, to capture the speed of longer subsequences and the accuracy of shorter ones. We show that incorporating this approach into our CAFE technique yields a good compromise between accuracy and retrieval efficiency when searching with BLOSUM matrices that are sensitive to distant evolutionary relationships. We expect similar results to hold for other homology search techniques.
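The abstract contrasts fixed-length, overlapping subsequence extraction with a variable-length alternative driven by a scoring matrix. The Python sketch below illustrates that contrast under stated assumptions. `fixed_length_intervals` shows generic overlapping fixed-length extraction of the kind the abstract attributes to FASTA, BLAST and CAFE. `variable_length_intervals`, `BLOSUM62_DIAGONAL` and the `threshold` parameter are hypothetical illustrations: each interval grows from its start position until the summed self-match scores of its residues reach a threshold, so intervals over rare, high-scoring residues such as tryptophan (W) stay short while intervals over common, low-scoring residues grow longer. This is not the authors' method; the paper's exact interval rule is not given here.

```python
from typing import Dict, List, Tuple

# Self-match (diagonal) scores of the BLOSUM62 matrix, used here only as
# hypothetical per-residue weights for the variable-length sketch.
BLOSUM62_DIAGONAL: Dict[str, int] = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}


def fixed_length_intervals(seq: str, n: int) -> List[Tuple[int, str]]:
    """Return every overlapping length-n subsequence with its start offset."""
    return [(i, seq[i:i + n]) for i in range(len(seq) - n + 1)]


def variable_length_intervals(
    seq: str, weights: Dict[str, int], threshold: int
) -> List[Tuple[int, str]]:
    """Hypothetical variable-length extraction.

    From each start position, extend the interval one residue at a time
    until the summed weights reach `threshold`. Start positions whose
    suffix never reaches the threshold produce no interval. Unknown
    residues raise KeyError.
    """
    intervals: List[Tuple[int, str]] = []
    for i in range(len(seq)):
        total = 0
        for j in range(i, len(seq)):
            total += weights[seq[j]]
            if total >= threshold:
                intervals.append((i, seq[i:j + 1]))
                break
    return intervals


if __name__ == "__main__":
    print(fixed_length_intervals("MKWVT", 3))
    print(variable_length_intervals("WAAAC", BLOSUM62_DIAGONAL, 12))
```

For example, with a threshold of 12 the sequence `WAAAC` yields the intervals `WA`, `AAA`, `AAC` and `AC`: the high-scoring `W` needs only one extra residue, runs of `A` need three, and the final `C` (score 9) is dropped because the sequence ends before the threshold is reached. Under this hypothetical rule, interval length adapts to residue scores instead of being fixed in advance.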