Journal: 高技术通讯

A Phishing Website Identification Algorithm Based on URL Text Features and Link Relations

Abstract

To improve the accuracy of phishing website identification, this work analyzes the uniform resource locator (URL) text data of phishing websites, combines it with the network topology features formed by the internal link relations of those websites, and proposes FAUFL, a phishing website identification algorithm based on URL text features and link relations. The algorithm works in three stages. First, with URL text features as input, the random forest algorithm is used to build a discriminator based on URL text features. Second, with link relations as input, a group of related web pages is constructed, and a related-page-group algorithm based on maximum-flow cutting is used to build a discriminator based on link relations. Third, the results of these two discriminators are taken as input and evaluated further with the Bagging algorithm. Test results show that FAUFL reaches an identification accuracy of 99.2%, which is 3.9% higher than the algorithm based on URL text features alone and 5.0% higher than the algorithm based on link relations alone.
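The second stage, grouping related pages with a maximum-flow cut, can be illustrated with a minimal Python sketch. The abstract does not state how FAUFL builds the link graph, assigns capacities, or picks seed pages, so everything below is an assumption made for illustration: the function names `undirected` and `min_cut_source_side`, the toy page names, the capacities, and the choice of Edmonds-Karp as the max-flow method. The idea shown is only the general one: after computing a maximum flow from a known phishing page to a known legitimate page, the pages still reachable from the source in the residual graph form the source side of a minimum cut, which can be read as the related page group.

```python
from collections import defaultdict, deque


def undirected(links):
    """Expand (u, v, capacity) links into directed edges in both directions."""
    for u, v, cap in links:
        yield u, v, cap
        yield v, u, cap


def min_cut_source_side(edges, source, sink):
    """Return the nodes on the source side of a minimum source-sink cut.

    Uses Edmonds-Karp: repeatedly push flow along shortest augmenting
    paths (found by BFS) until the sink is unreachable in the residual graph.
    """
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse residual edge exists

    def reachable_with_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = reachable_with_parents()
        if sink not in parents:
            # No augmenting path left: reachable nodes are the source side.
            return set(parents)
        # Find the bottleneck capacity along the path back from the sink.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push that much flow along the path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


# Hypothetical toy link graph: three densely linked pages around a known
# phishing page, joined to a legitimate cluster by a single weak link.
links = [
    ("phish", "p1", 3), ("phish", "p2", 3), ("p1", "p2", 3),
    ("p2", "n1", 1),  # the only link between the two clusters
    ("n1", "legit", 3), ("n1", "n2", 3), ("n2", "legit", 3),
]
group = min_cut_source_side(undirected(links), "phish", "legit")
print(sorted(group))  # ['p1', 'p2', 'phish']
```

In this toy graph the cut falls on the single weak link between `p2` and `n1`, so the pages densely linked to the known phishing page end up in one group. How such a group is then turned into a per-page decision, and how it is combined with the URL-feature discriminator under Bagging, is not described in the abstract and is left out here.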
