Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching

Gonzalo Navarro; Mathieu Raffinot

Abstract

Fast exact and approximate searching of a text for a pattern that contains classes of characters and bounded-size gaps (CBG) has a wide range of applications, a very important one being protein pattern matching. For instance, one PROSITE protein site is associated with the CBG [RK] - x(2,3) - [DE] - x(2,3) - Y, where the brackets match any of the letters inside and x(2,3) matches a gap of length between 2 and 3. Currently, the only way to search for a CBG in a text is to convert it into a full regular expression (RE). However, an RE is more sophisticated than a CBG, and searching for it with an RE pattern matching algorithm complicates the search and makes it slow. For this reason, we design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first looks exactly once at each text character. The second does not need to consider all the text characters and hence is usually faster than the first, but in bad cases it may have to read the same text character more than once. We then propose a criterion, based on the form of the CBG, to choose a priori the faster of the two. We also show how to search while permitting a few mistakes in the occurrences. We performed many practical experiments using the PROSITE database, and all of them show that our algorithms are the fastest in virtually all cases.
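
To make the CBG notation concrete, the following Python sketch parses the PROSITE-style pattern quoted in the abstract, converts it to a regular expression (the baseline approach the abstract describes), and also scans a text with a plain set-of-states simulation that reads each text character once. It is an illustration only and is not the algorithms of the article. The function names (`parse_cbg`, `cbg_to_regex`, `scan`), the sample text, and the simplified parser (single letters, bracketed classes, and `x`, `x(n)`, `x(n,m)` gaps only) are assumptions made for this example.

```python
import re


def parse_cbg(pattern):
    """Parse a PROSITE-style CBG such as '[RK]-x(2,3)-[DE]-x(2,3)-Y'.

    Returns a list of ('class', set_of_letters) or ('gap', lo, hi) tuples.
    Simplified, hypothetical parser: other PROSITE syntax is not handled.
    """
    elements = []
    for token in pattern.replace(" ", "").split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", token)
        if gap:
            lo = int(gap.group(1)) if gap.group(1) else 1
            hi = int(gap.group(2)) if gap.group(2) else lo
            elements.append(("gap", lo, hi))
        elif token.startswith("[") and token.endswith("]"):
            elements.append(("class", set(token[1:-1])))
        else:
            elements.append(("class", {token}))
    return elements


def cbg_to_regex(elements):
    """Baseline: translate the CBG into an ordinary regular expression."""
    parts = []
    for el in elements:
        if el[0] == "gap":
            parts.append(".{%d,%d}" % (el[1], el[2]))
        else:
            parts.append("[" + "".join(sorted(el[1])) + "]")
    return "".join(parts)


def expand(elements):
    """Flatten the CBG into positions (letters or None for 'any', optional)."""
    positions = []
    for el in elements:
        if el[0] == "gap":
            _, lo, hi = el
            positions += [(None, False)] * lo + [(None, True)] * (hi - lo)
        else:
            positions.append((el[1], False))
    return positions


def closure(states, positions):
    """Add every state reachable by skipping optional gap positions."""
    out = set(states)
    for s in sorted(states):
        while s < len(positions) and positions[s][1]:
            s += 1
            out.add(s)
    return out


def scan(text, elements):
    """Return the end index of every occurrence, reading each character once."""
    positions = expand(elements)
    m = len(positions)
    ends = []
    active = closure({0}, positions)
    for i, ch in enumerate(text):
        nxt = {0}  # a new occurrence may start at any text position
        for s in active:
            if s < m:
                letters = positions[s][0]
                if letters is None or ch in letters:
                    nxt.add(s + 1)
        active = closure(nxt, positions)
        if m in active:
            ends.append(i)
    return ends


elements = parse_cbg("[RK] - x(2,3) - [DE] - x(2,3) - Y")
text = "AAKAADAAYAA"  # made-up sample sequence
print(cbg_to_regex(elements))
print(re.search(cbg_to_regex(elements), text).span())
print(scan(text, elements))
```

Both approaches report the same occurrence in the sample text. The set-based scan stores the active states in a Python set purely for readability; a practical implementation would use a more compact state representation.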

Bibliographic details

Source: Journal of computational biology: A journal of computational molecular cell biology, 2003, issue 6, 21 pages
Authors: Gonzalo Navarro; Mathieu Raffinot
Original format: PDF
Language: English
CLC classification: General biology
Keywords: PROSITE patterns; CBG patterns; extended patterns; wild cards; approximate matching; searching allowing differences; BNDM; Boyer-Moore; bit-parallelism