Journal: Journal of computational biology: A journal of computational molecular cell biology

Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching



Abstract

Fast exact and approximate searching of a text for a pattern that contains classes of characters and bounded-size gaps (CBG) has a wide range of applications. A very important one is protein pattern matching: for instance, one PROSITE protein site is associated with the CBG [RK] - x(2,3) - [DE] - x(2,3) - Y, where each bracket matches any one of the letters inside it and x(2,3) matches a gap of length 2 to 3. Currently, the only way to search for a CBG in a text is to convert it into a full regular expression (RE). However, an RE is more general than a CBG, and running an RE pattern matching algorithm complicates the search and makes it slow. We therefore design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first one looks exactly once at each text character. The second one does not need to consider all the text characters, and hence it is usually faster than the first, but in bad cases it may have to read the same text character more than once. We then propose a criterion, based on the form of the CBG, for choosing a priori the faster of the two. We also show how to search while permitting a few mistakes in the occurrences. We performed many practical experiments using the PROSITE database, and they show that our algorithms are the fastest in virtually all cases.
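To make the CBG notation concrete, here is a minimal Python sketch of the baseline the abstract describes: converting a CBG into a full regular expression and searching with a general RE engine. It is not one of the article's two algorithms. The function names `cbg_to_regex` and `find_cbg` are hypothetical, and the sketch assumes only the three token kinds that appear in the example above (a single letter, a bracketed class, and an `x` gap with optional bounds).

```python
import re


def cbg_to_regex(cbg: str) -> str:
    """Translate a CBG such as '[RK]-x(2,3)-[DE]-x(2,3)-Y' into a Python regex.

    Assumed token kinds (illustrative subset, not the full PROSITE syntax):
      'Y'       a single letter
      '[RK]'    a class matching any one of the listed letters
      'x'       a gap of exactly one arbitrary character
      'x(2)'    a gap of exactly 2 characters
      'x(2,3)'  a gap of 2 to 3 characters
    """
    parts = []
    for token in cbg.replace(" ", "").split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", token)
        if gap:
            lo = gap.group(1) or "1"   # bare 'x' means length 1
            hi = gap.group(2) or lo    # 'x(n)' means exactly n
            parts.append(f".{{{lo},{hi}}}")
        elif re.fullmatch(r"\[[A-Z]+\]", token) or re.fullmatch(r"[A-Z]", token):
            parts.append(token)        # classes and letters carry over unchanged
        else:
            raise ValueError(f"unsupported CBG token: {token!r}")
    return "".join(parts)


def find_cbg(cbg: str, text: str) -> list[tuple[int, str]]:
    """Return (start, matched_text) for every start position where the CBG occurs.

    A lookahead is used so that overlapping occurrences are also reported;
    one match (the one the regex engine finds first) is given per start.
    """
    pattern = re.compile(f"(?=({cbg_to_regex(cbg)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(cbg_to_regex("[RK] - x(2,3) - [DE] - x(2,3) - Y"))
    print(find_cbg("[RK] - x(2,3) - [DE] - x(2,3) - Y", "AAKGGDTTYAA"))
```

The general RE engine here may backtrack over the gaps and re-read text characters, which is the kind of overhead the abstract's specialized algorithms are intended to avoid.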
