Journal: Future Generation Computer Systems

Machine learning based heterogeneous web advertisements detection using a diverse feature set



Abstract

Identifying and filtering advertisements in web pages has gained significance due to factors such as accessibility, security, privacy, and obtrusiveness. Current practice involves maintaining lists of URL-based regular expressions, called filter lists. Each URL found on a web page is matched against the filter list. While effective, this procedure scales poorly because the filter list demands regular upkeep. To counter these limitations, we devise a machine learning based advertisement detection system using a diverse feature set that can distinguish advertisement blocks from non-advertisement blocks. The method can serve as a base for accessibility-related features such as smooth browsing and text summarization for persons with visual impairments, cognitive impairments, and photosensitive epilepsy. A classifier trained on the proposed feature set achieves 98.6% accuracy in identifying advertisements.
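The filter-list baseline described in the abstract can be illustrated with a minimal sketch. The patterns, the `FILTER_LIST` name, and the helpers `is_ad_url` and `filter_urls` are hypothetical, chosen only for illustration; they are not taken from the paper or from any real filter list, and real lists use their own rule syntax rather than raw regular expressions.

```python
import re

# Hypothetical filter list: illustrative URL patterns only,
# not rules from any real ad-blocking list.
FILTER_LIST = [
    r"^https?://ads\.",   # hosts whose name starts with "ads."
    r"/banner/",          # paths containing a "banner" directory
    r"[?&]ad_id=",        # query strings carrying an "ad_id" parameter
]

# Compile once so each URL check does not recompile the patterns.
COMPILED = [re.compile(pattern) for pattern in FILTER_LIST]


def is_ad_url(url: str) -> bool:
    """Return True if the URL matches any pattern in the filter list."""
    return any(pattern.search(url) for pattern in COMPILED)


def filter_urls(urls):
    """Keep only the URLs that no filter-list pattern matches."""
    return [url for url in urls if not is_ad_url(url)]
```

Every URL on a page is checked against every pattern, and a new ad-serving URL scheme goes undetected until someone adds a matching rule. That ongoing upkeep is the scalability limitation the abstract attributes to filter lists.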
