Published in: International Conference on Fuzzy Systems and Knowledge Discovery

A SVM-based compound-word recognition method in information security



Abstract

With the emergence of the mobile Internet, the Internet of Things, and cloud computing, the field of information security is developing rapidly. As a result, a constant stream of compound words describing new concepts and new technologies has arisen. Existing dictionaries do not add these new compound words in time, so they cannot be identified correctly. To solve this problem, this paper presents an SVM-based compound-word recognition method for information security. The method works on the output of an existing word segmentation system. It first constructs a digraph of adjacent atom-words according to statistical co-occurrence features and lexical rules. Next, it produces a set of compound-word candidates by a depth-first traversal of the digraph under the longest-match principle. It then filters the candidate set with an SVM classifier, with the help of a domain contrast corpus and a computer dictionary. We use this method to identify new compound words in 2,200 vulnerability description texts. It achieves a precision of 82.25% and a recall of 77.44%. The results show that the method can effectively identify new compound words in information security from a large-scale corpus.
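The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_digraph` and `longest_paths`, the `min_count` co-occurrence threshold, and the toy segmented sentences are all assumptions made for illustration. A raw adjacency count stands in for the paper's statistical co-occurrence features, the lexical rules are omitted, and the SVM filtering step is not shown.

```python
from collections import Counter, defaultdict


def build_digraph(segmented_sentences, min_count=2):
    """Link atom-word a -> b when b directly follows a at least min_count times.

    min_count is a hypothetical stand-in for the co-occurrence statistics
    used in the paper.
    """
    pair_counts = Counter()
    for words in segmented_sentences:
        pair_counts.update(zip(words, words[1:]))
    graph = defaultdict(set)
    for (left, right), count in pair_counts.items():
        if count >= min_count:
            graph[left].add(right)
    return dict(graph)


def longest_paths(graph):
    """Depth-first traversal that keeps only paths that cannot be extended."""
    targets = {w for successors in graph.values() for w in successors}
    # Assumption: a candidate starts at a word with no incoming edge.
    starts = [w for w in graph if w not in targets]
    results = []

    def dfs(path):
        # Skip words already on the path so cycles cannot loop forever.
        extensions = [w for w in sorted(graph.get(path[-1], ())) if w not in path]
        if not extensions:
            if len(path) > 1:
                results.append(tuple(path))
            return
        for word in extensions:
            dfs(path + [word])

    for start in sorted(starts):
        dfs([start])
    return results


# Toy output of a word segmentation system (invented for this example).
sentences = [
    ["buffer", "overflow", "attack", "is", "common"],
    ["a", "buffer", "overflow", "attack", "occurs"],
    ["cross", "site", "scripting", "is", "frequent"],
    ["cross", "site", "scripting", "occurs"],
]

candidates = longest_paths(build_digraph(sentences))
print(candidates)
```

In a fuller pipeline, each candidate tuple would then be turned into a feature vector and passed to a classifier, which is the role the abstract assigns to the SVM.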
