Conference: CCF International Conference on Natural Language Processing and Chinese Computing

A Multi-pattern Matching Algorithm for Chinese-Hmong Mixed Strings

Abstract

To support fast retrieval over Chinese-Hmong mixed text, this paper proposes a multi-pattern matching algorithm for Chinese-Hmong mixed strings that operates on double-byte units and combines the idea of the Aho-Corasick (AC) algorithm with the mismatch-handling strategy of the Horspool algorithm. A deterministic finite automaton is constructed from the pattern set following the AC approach, and the shift distance of the pattern is computed with Horspool's bad-character rule, so that the automaton finds all patterns in a single pass over the text. Experimental results show that the proposed algorithm performs well in multi-pattern matching on Chinese-Hmong mixed texts of different sizes; even for mixed texts containing more than 100,000 characters, its matching efficiency is significantly higher than that of the AC algorithm.
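The abstract does not give the algorithm's details, so the following is only a minimal sketch of one well-known way to pair a pattern-set automaton with Horspool's bad-character shift (often called Set Horspool), not the authors' method. It uses a trie of reversed patterns rather than a full AC automaton with failure links, and it assumes that "double-byte units" can be modeled as UTF-16 code units. The names `to_units`, `build_tables`, and `search` are hypothetical.

```python
def to_units(s):
    """Split a string into 16-bit units (UTF-16 code units; an assumption)."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def build_tables(patterns):
    """Build a trie of reversed patterns and a bad-character shift table."""
    pats = [to_units(p) for p in patterns]
    if not pats or any(len(p) == 0 for p in pats):
        raise ValueError("patterns must be non-empty strings")
    lmin = min(len(p) for p in pats)   # window length = shortest pattern
    trie = [{}]                        # trie[node] maps unit -> child node
    out = {}                           # node -> indices of patterns ending there
    shift = {}                         # unit -> safe shift; default is lmin
    for idx, p in enumerate(pats):
        node = 0
        for u in reversed(p):          # insert the pattern right to left
            if u not in trie[node]:
                trie.append({})
                trie[node][u] = len(trie) - 1
            node = trie[node][u]
        out.setdefault(node, []).append(idx)
        m = len(p)
        for j in range(m - 1):         # last unit is excluded, as in Horspool
            shift[p[j]] = min(shift.get(p[j], lmin), m - 1 - j)
    return trie, out, shift, lmin


def search(text, patterns):
    """Return sorted (start_unit_index, pattern) pairs for all occurrences."""
    trie, out, shift, lmin = build_tables(patterns)
    t = to_units(text)
    hits = []
    pos = lmin - 1                     # index of the last unit in the window
    while pos < len(t):
        node, i = 0, pos
        # Read the text backwards from the window end through the trie.
        while i >= 0 and t[i] in trie[node]:
            node = trie[node][t[i]]
            if node in out:
                hits.extend((i, patterns[k]) for k in out[node])
            i -= 1
        # Shift by the bad-character rule applied to the window's last unit.
        pos += shift.get(t[pos], lmin)
    return sorted(hits)


if __name__ == "__main__":
    print(search("苗族abc苗文abc", ["苗族", "abc", "bc"]))
```

Positions are reported in 16-bit units, which coincide with character indices only when every character lies in the Basic Multilingual Plane. Because the shift never exceeds the length of the shortest pattern, a pattern set containing a very short pattern gains little over a plain AC scan in this sketch.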
