...
首页> 外文期刊>Pattern Analysis and Applications >Breadth-first search strategies for trie-based syntactic pattern recognition
【24h】

Breadth-first search strategies for trie-based syntactic pattern recognition

机译:基于特里的句法模式识别的广度优先搜索策略

获取原文
获取原文并翻译 | 示例

摘要

Dictionary-based syntactic pattern recognition of strings attempts to recognize a transmitted string X~*, by processing its noisy version, Y, without sequentially comparing Y with every element X in the finite, (but possibly, large) dictionary, H. The best estimate X~+ of X~*, is defined as that element of H which minimizes the generalized Levenshtein distance (GLD) D(X, Y) between X and Y, for all X eH. The non-sequential PR computation of X~+ involves a compact trie-based representation of H. In this paper, we show how we can optimize this computation by incorporating breadth first search schemes on the underlying graph structure. This heuristic emerges from the trie-based dynamic programming recursive equations, which can be effectively implemented using a new data structure called the linked list of prefixes that can be built separately or "on top of" the trie representation of H. The new scheme does not restrict the number of errors in Y to be merely a small con- stant, as is done in most of the available methods. The main contribution is that our new approach can be used for generalized GLDs and not merely for 0/1 costs. It is also applicable when all possible correct candidates need to be known, and not just the best match. These constitute the cases when the "cutoffs" cannot be used in the DFS trie-based technique (Shang and Merrettal in IEEE Trans Knowl Data Eng 8(4):540-547, 1996). The new technique is compared with the DFS trie-based technique (Risvik in United Patent 6377945 Bl, 23 April 2002; Shang and Merrettal in IEEE Trans Knowl Data Eng 8(4):540-547, 1996) using three large and small benchmark dictionaries with different errors. In each case, we demonstrate marked improvements with regard to the operations needed up to 21%, while at the same time maintaining the same accuracy. Additionally, some further improvements can be obtained by introducing the knowledge of the maximum number or percentage of errors in Y.
机译:基于字典的基于字符串的句法模式识别尝试通过处理其嘈杂的版本Y来识别传输的字符串X〜*,而无需依次将Y与有限(但可能很大)字典H中的每个元素X进行比较。 X〜*的X〜+估计值定义为H的元素,它对所有X eH最小化X和Y之间的广义Levenshtein距离(GLD)D(X,Y)。 X〜+的非顺序PR计算涉及基于H的紧凑的基于Trie的表示。在本文中,我们展示了如何通过在基础图结构上合并广度优先搜索方案来优化此计算。这种启发式方法是基于基于trie的动态编程递归方程式,可以使用称为前缀链接列表的新数据结构有效地实现,该数据结构可以单独构建,也可以在H的trie表示“之上”构建。不能像大多数可用方法中那样将Y中的错误数量限制为很小的常数。主要贡献在于,我们的新方法可用于广义GLD,而不仅用于0/1成本。当需要知道所有可能的正确候选者而不仅仅是最佳匹配时,它也适用。这些构成了不能在基于DFS Trie的技术中使用“临界值”的情况(Shang和Merrettal,IEEE Trans Knowl Data Eng 8(4):540-547,1996)。使用三个大型基准和小型基准,将该新技术与基于DFS trie的技术(Risvik在美国专利6377945 B1中,2002年4月23日; Shang和Merrettal在IEEE Trans Knowl Data Eng 8(4):540-547中,1996年)进行了比较。带有不同错误的字典。在每种情况下,我们都展示了显着的改进,所需的操作最多可提高21%,同时保持相同的精度。此外,通过引入有关Y中最大错误数或错误百分比的知识,可以获得一些进一步的改进。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号