Detection of spam web page using content and link-based techniques: A combined approach

Sadhana: Academy Proceedings in Engineering Science


Abstract

Web spam is a technique through which irrelevant pages obtain a higher rank than relevant pages in a search engine's results. Spam pages are generally inadequate and inappropriate results for users. Many researchers are working on detecting spam pages; however, no universally efficient technique that can detect all spam pages has been developed so far. This paper is an effort in that direction: we propose a combined approach of content-based and link-based techniques to identify spam pages. The content-based approach uses term density and a Part of Speech (POS) ratio test, and in the link-based approach we explore collaborative detection using personalized PageRank to classify a Web page as spam or non-spam. The WEBSPAM-UK2006 dataset was used for the experiments, and the results were compared with some existing approaches. A promising F-measure of 75.2% demonstrates the applicability and efficiency of our approach.
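The abstract names the building blocks but not their exact formulas. The Python sketch below illustrates one plausible reading of three of them under stated assumptions; it is not the paper's implementation. Term density is assumed to be the share of a page's tokens taken by its most frequent term. The link-based score is a personalized PageRank whose teleport vector is concentrated on pages already labelled as spam, run on the reversed link graph so that suspicion flows to pages that point at spam (an anti-TrustRank-style variant, chosen here as an assumption). The F-measure is the standard F1 score, the harmonic mean of precision and recall. The function names, the damping factor 0.85, the 50 iterations and the toy graph are hypothetical illustrative choices. The POS ratio test is omitted because it needs a part-of-speech tagger.

```python
# Illustrative sketch only: definitions, names and parameters are assumptions,
# not taken from the paper.
import re
from collections import Counter


def max_term_density(text):
    """Fraction of tokens taken by the most frequent term (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    _, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens)


def reverse_links(graph):
    """Flip every edge so scores flow from spam seeds to pages that link to them."""
    reversed_graph = {page: [] for page in graph}
    for page, targets in graph.items():
        for target in targets:
            reversed_graph.setdefault(target, []).append(page)
    return reversed_graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which the random surfer teleports only to seed pages.

    graph maps each page to the list of pages it links to; seeds is a
    non-empty set of pages already labelled as spam (hypothetical input).
    """
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        new_scores = {n: (1.0 - damping) * teleport[n] for n in nodes}
        dangling_mass = 0.0
        for page in nodes:
            out_links = graph.get(page, [])
            if out_links:
                share = damping * scores[page] / len(out_links)
                for target in out_links:
                    new_scores[target] += share
            else:
                # A page with no out-links hands its mass back to the seeds.
                dangling_mass += damping * scores[page]
        for n in nodes:
            new_scores[n] += dangling_mass * teleport[n]
        scores = new_scores
    return scores


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(max_term_density("buy cheap pills buy cheap pills buy now"))  # 0.375
    # Hypothetical toy web graph: page -> pages it links to.
    web = {
        "spam1": ["spam2"],
        "spam2": ["spam1"],
        "farm": ["spam1", "spam2"],
        "news": ["blog"],
        "blog": [],
    }
    scores = personalized_pagerank(reverse_links(web), seeds={"spam1"})
    print(sorted(scores, key=scores.get, reverse=True))
    print(f_measure(tp=3, fp=1, fn=1))  # 0.75
```

In the toy graph, `farm` links to both spam pages and ends up with a higher score than `spam2`, while `news` and `blog` stay at zero because they have no link path to the seed. The abstract does not state how the content and link scores are thresholded or combined into a final spam/non-spam decision, so no combination rule is shown.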