Journal: 《计算机工程》 (Computer Engineering)

Illegal Website Identification Method Based on URL Feature Detection

         

Abstract

To identify illegal websites efficiently, this paper proposes an identification method based on URL feature detection. Exploiting the hierarchical structure of the user access path in the message request line, a website similarity model based on path similarity is constructed, and its distributed computation is implemented in Python. Websites are clustered with the Fast Unfolding algorithm, and the URL features of illegal websites are extracted. Features that have high accuracy and a specific meaning are selected as effective illegal-website features, and an unknown website is identified as illegal by checking whether it exhibits these URL features. Experimental results show that the method effectively measures the degree of association between websites of the same type and, combined with the Fast Unfolding algorithm, effectively distinguishes different types of websites. Compared with identification methods based on URL lexical features, HTML, or semantic features, the proposed method achieves the highest F-Measure.
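The abstract does not give the similarity formula, so the following is only a minimal sketch of what a path-based website similarity could look like. It assumes that each site is represented by the set of hierarchical path prefixes observed in its requested URLs and that two sites are compared by Jaccard overlap; the function names `path_prefixes`, `site_prefixes`, and `site_similarity` and the Jaccard choice are hypothetical and are not taken from the paper.

```python
from urllib.parse import urlsplit


def path_prefixes(url):
    """Return the hierarchical path prefixes of a URL, e.g. /a and /a/b."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return {"/" + "/".join(segments[:i]) for i in range(1, len(segments) + 1)}


def site_prefixes(urls):
    """Collect the path prefixes seen across all URLs requested on one site."""
    prefixes = set()
    for url in urls:
        prefixes |= path_prefixes(url)
    return prefixes


def site_similarity(urls_a, urls_b):
    """Jaccard overlap of two sites' path prefixes (an assumed measure)."""
    a, b = site_prefixes(urls_a), site_prefixes(urls_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, the pairwise similarities could serve as edge weights of a website graph, which a community detection step such as Fast Unfolding would then partition into clusters.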

