Journal: Communications Technology (《通信技术》)

An Efficient Pattern Matching Algorithm for PDF Text Content

Abstract

Efficient and accurate desensitization of sensitive information in the text content of PDF documents depends on effective matching of sensitive words. This work therefore studies and analyzes two classic single-pattern matching algorithms, the BM algorithm and the QS algorithm, and, taking the encoding rules of PDF text content into account, proposes a pattern matching algorithm suited to PDF documents. The algorithm uses the calculation rules of the BM bad-character table and the next-character idea of the QS algorithm, combined with the information already matched and the PDF encoding rules, to reach a maximum jump distance of m+4, which reduces the number of match attempts and improves matching efficiency. Analysis and verification indicate that the algorithm's matching efficiency is somewhat better than that of the BM and QS algorithms.
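To make the two building blocks named in the abstract concrete, the following is a minimal sketch of the classic QS (Quick Search) algorithm, whose shift table is a bad-character-style table indexed by the character just past the current window. It is not the proposed algorithm: the abstract does not spell out the PDF-specific rules or how the m+4 jump is obtained, so none of that is reproduced here. The function names `build_qs_shift` and `quick_search` are illustrative choices, and `m` is assumed to denote the pattern length, as is conventional.

```python
def build_qs_shift(pattern):
    """Shift table for Quick Search.

    For a character c that occurs in the pattern, the shift is
    m - (index of its last occurrence). Characters that do not occur
    get the default shift m + 1, applied by the caller.
    """
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so the last occurrence wins.
    return {ch: m - i for i, ch in enumerate(pattern)}


def quick_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = build_qs_shift(pattern)
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # The classic QS maximum jump is m + 1 (next character not in pattern).
        pos += shift.get(text[nxt], m + 1)
    return matches


if __name__ == "__main__":
    print(quick_search("abcabcabd", "abd"))
```

In this baseline the largest possible jump is m+1, which is the figure the abstract's m+4 is meant to improve on by exploiting already-matched information and PDF encoding rules.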
