Efficient Pattern Matching Algorithm for PDF Text Content

Zhu Lingyu; Wang Jingzhou; Chen Qingchun

Abstract

The key to efficiently and accurately desensitizing sensitive information in the text content of PDF documents is effective matching of the sensitive words. This paper therefore studies and analyzes the classic single-pattern matching algorithms BM (Boyer-Moore) and QS (Quick Search) and, taking into account the rules by which PDF text content is encoded, proposes a pattern matching algorithm suited to PDF documents. The algorithm combines the bad-character table computation rule of the BM algorithm and the next-character idea of the QS algorithm with information from already-matched characters and the PDF encoding rules, raising the maximum jump distance to m+4 (where m is the pattern length). This reduces the number of matching attempts and improves matching efficiency. Analysis and verification show that the proposed algorithm's matching efficiency improves to some extent over both the BM and QS algorithms.
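For readers unfamiliar with the QS baseline the abstract builds on, below is a minimal Python sketch of the classic Quick Search (Sunday) algorithm. It is not the authors' PDF-specific algorithm, whose encoding-aware rules are not described in this abstract, and the function names `qs_shift_table` and `qs_search` are hypothetical. The sketch shows the "next character" idea: after each window comparison, the shift is decided by the text character immediately after the window, so the window can jump at most m+1 positions when that character does not occur in the pattern.

```python
def qs_shift_table(pattern: str) -> dict[str, int]:
    """Quick Search (Sunday) shift table.

    For each character c in the pattern, shift[c] = m - i, where i is the
    index of the rightmost occurrence of c. Characters that do not occur in
    the pattern are not stored; the caller uses m + 1 as their default shift.
    (Hypothetical helper name; classic QS, not the paper's PDF variant.)
    """
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = m - i  # later (rightmost) occurrences overwrite earlier ones
    return table


def qs_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = qs_shift_table(pattern)
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        if pos + m >= n:
            break  # no "next character" after the window is left to inspect
        # Shift by the character just past the current window (max shift m + 1).
        pos += shift.get(text[pos + m], m + 1)
    return hits
```

For the pattern "abra" the table is {"a": 1, "b": 3, "r": 2}, and any other character after the window lets it jump m+1 = 5 positions, so `qs_search("abracadabra", "abra")` finds matches at indices 0 and 7. The larger m+4 maximum reported in the abstract relies on the authors' additional use of matched information and PDF encoding rules, which this sketch does not model.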

Bibliographic Record

Source
Communications Technology (《通信技术》), 2018, No. 3, pp. 641-646 (6 pages)
Authors
Zhu Lingyu; Wang Jingzhou; Chen Qingchun
Affiliations

Southwest Jiaotong University, Chengdu, Sichuan 611756 (all three authors)

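Original format: PDF
Text language: Chinese
CLC classification: Data security; Algorithm theory
Keywords
pattern matching; BM algorithm; QS algorithm; PDF encoding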

Similar Literature

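1. An Efficient Multi-Pattern Matching Algorithm for PDF Text Content Auditing [J]. Liu Bangguo, Chen Qingchun, Lei Xianfu. Application Research of Computers, 2020, No. 6.
2. PDF Text Content Extraction Based on Automata Theory [J]. Wang Xiaojuan, Tan Jianlong, Liu Yanbing. Journal of Computer Applications, 2012, No. 9.
3. Research on Text Content Extraction from PDF Files [J]. Zhang Xiuxiu, Zhang Lifeng. Sci-Tech Information Development & Economy, 2008, No. 36.
4. How to Extract PDF Text Content [J]. Wang Zhijun. Wangyou Shijie (《网友世界》), 2006, No. 19.
5. An Efficient Plaintext Collection Method for SSL/TLS Encrypted Data in Network Content Auditing [J]. Dong Haitao, Tian Jing, Yang Jun. Journal of Computer Applications, 2015, No. 10.
6. An Efficient TCAM-Based Pattern Matching Algorithm for Intrusion Detection [C]. Zhou Xiaoyao. 13th Annual Academic Conference of the Hunan Communications Society, 2010.
7. Design and Implementation of a PDF Text Content Extraction System for Medical Knowledge [A]. Liu Xianying. 2018.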