Australasian Joint Conference on Artificial Intelligence

Hazardous Document Detection Based on Dependency Relations and Thesaurus


Abstract

In this paper, we propose algorithms to increase the accuracy of hazardous Web page detection by correcting the detection errors of typical keyword-based algorithms based on the dependency relations between the hazardous keywords and their neighboring segments. Most typical text-based filtering systems ignore the context where the hazardous keywords appear. Our algorithms automatically obtain segment pairs that are in dependency relations and appear to characterize hazardous documents. In addition, we also propose a practical approach to expanding segment pairs with a thesaurus. Experiments with a large number of Web pages show that our algorithms increase the detection F value by 7.3% compared to the conventional algorithms.
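The idea of correcting a keyword-based decision with dependency context can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the parser, the segment representation, or how segment pairs are scored, so the sketch assumes each document has already been reduced to (dependent segment, head segment) pairs by some dependency parser, and that a set of hazardous pairs has already been obtained. All names (`expand_pairs`, `keyword_flag`, `corrected_flag`) and the toy data are hypothetical, and only the suppression of false positives is shown.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: (segment containing the keyword, segment it depends on).
SegmentPair = Tuple[str, str]


def expand_pairs(pairs: Set[SegmentPair],
                 thesaurus: Dict[str, List[str]]) -> Set[SegmentPair]:
    """Add variants of each pair whose head segment is replaced by a thesaurus synonym."""
    expanded = set(pairs)
    for dependent, head in pairs:
        for synonym in thesaurus.get(head, []):
            expanded.add((dependent, synonym))
    return expanded


def keyword_flag(doc_pairs: List[SegmentPair], keywords: Set[str]) -> bool:
    """Baseline: flag a document if any hazardous keyword occurs, ignoring context."""
    return any(dependent in keywords for dependent, _ in doc_pairs)


def corrected_flag(doc_pairs: List[SegmentPair], keywords: Set[str],
                   hazardous_pairs: Set[SegmentPair]) -> bool:
    """Flag only if a keyword occurs in a dependency pair known to be hazardous."""
    return any(dependent in keywords and (dependent, head) in hazardous_pairs
               for dependent, head in doc_pairs)


# Toy data, invented for illustration only.
keywords = {"drug"}
hazardous_pairs = expand_pairs({("drug", "sell")}, {"sell": ["trade"]})

doc_a = [("drug", "trade")]    # keyword in a hazardous context (via a synonym)
doc_b = [("drug", "prevent")]  # keyword in a harmless context

flags_a = (keyword_flag(doc_a, keywords), corrected_flag(doc_a, keywords, hazardous_pairs))
flags_b = (keyword_flag(doc_b, keywords), corrected_flag(doc_b, keywords, hazardous_pairs))
```

In this toy setup the baseline flags both documents, while the context-aware check keeps only `doc_a`, which matches a hazardous pair through the thesaurus expansion.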
