Journal: Future Generation Computer Systems

Cognitive spammer: A Framework for PageRank analysis with Split by Over-sampling and Train by Under-fitting

Abstract

Over the past few years, the Internet of Things (IoT), one of the most popular technologies of the modern era, has grown exponentially. In IoT, various objects perform sensing, communication, and computation tasks to provide uninterrupted services (e.g., e-health, e-transportation, security access) to end users. The Cognitive Internet of Things (CIoT) is another IoT paradigm, developed to enhance the intelligence of IoT objects so that they can make independent decisions in any environment. IoT follows a service-oriented architecture (SOA) in which the application layer is the topmost layer; it enables IoT objects to interact with other objects located across the globe. The ability of these objects to learn, think, and understand can make information access more accurate and reliable, but Web spam remains one of the challenges when accessing information from the web. The literature shows that most people prefer search engines for accessing information, and efficient ranking by search engines can reduce the computational cost of information exchange among IoT objects. Search engines should be able to prevent spam from being injected into the web, but existing techniques aim to find spam only after it has appeared in search engine result pages. In this proposal, we therefore present an intelligent cognitive spammer framework, Cognitive spammer, which eliminates spam pages during the web page rank score calculation performed by search engines. The framework updates Google's ranking algorithm, PageRank, so that it automatically prevents link spam by considering the link structure of the web when computing rank scores. The updated PageRank algorithm provides a better ranking of web pages. The proposed framework is validated with the WEBSPAM-UK2007 dataset.
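For orientation, the baseline that the framework is said to update can be sketched as follows. This is only the standard PageRank power iteration, not the modified algorithm, whose details the abstract does not give. The function name `pagerank`, the damping factor of 0.85, and the four-page `toy_web` graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (baseline only, not the paper's variant).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes an equal share of its rank along every out-link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: "d" has no in-links, "c" collects the most links.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Because the scores depend only on the link graph, a page can raise its rank by acquiring artificial in-links, which is the link-spam weakness the framework is described as targeting.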
Before processing, the dataset is preprocessed with a new technique called 'Split by Over-sampling and Train by Under-fitting' to remove the trade-off caused by the imbalanced instances of the target class. After data cleaning, we apply machine learning techniques (bagged model, boosted linear model, etc.) to the web page features to make accurate predictions. The detection classifiers consider only the link features of a web page, irrespective of the page content. Of the fifteen classifiers, the best three are combined into an ensemble, which improves overall accuracy. Ten-fold cross-validation is also applied to the resulting ensemble model, yielding an accuracy of 99.6% for the proposed scheme. (C) 2018 Published by Elsevier B.V.
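The evaluation pattern described here, a three-member ensemble scored with ten-fold cross-validation, might look roughly like the sketch below. The abstract does not say which three classifiers were selected or how their outputs are combined, so majority voting, the one-feature threshold classifiers, and the synthetic data are all assumptions made purely for illustration. The resulting accuracy has no relation to the reported 99.6%.

```python
import random
from collections import Counter


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def majority_vote(predictions):
    """Combine per-classifier prediction lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def train_stump(feature):
    """Trainer for a one-feature threshold classifier (toy stand-in model).

    Assumes spam (label 1) has the larger mean on the chosen feature.
    """
    def train(X, y):
        spam = [row[feature] for row, label in zip(X, y) if label == 1]
        ham = [row[feature] for row, label in zip(X, y) if label == 0]
        threshold = (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2
        return lambda row: 1 if row[feature] > threshold else 0
    return train


def cross_validated_accuracy(X, y, trainers, k=10):
    """Accuracy of the majority-vote ensemble, pooled over all held-out samples."""
    correct = 0
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Retrain every ensemble member on the nine training folds.
        models = [train(X_train, y_train) for train in trainers]
        preds = [[model(X[i]) for i in fold] for model in models]
        voted = majority_vote(preds)
        correct += sum(p == y[i] for p, i in zip(voted, fold))
    return correct / len(y)


# Synthetic, imbalanced data: 20 "spam" and 80 "non-spam" rows with three
# made-up numeric link features. Not drawn from WEBSPAM-UK2007.
rng = random.Random(42)
y = [1] * 20 + [0] * 80
X = [[rng.gauss(3.0 if label else 0.0, 0.5) for _ in range(3)] for label in y]
accuracy = cross_validated_accuracy(X, y, [train_stump(f) for f in range(3)])
print(f"10-fold ensemble accuracy: {accuracy:.3f}")
```

Stratifying the folds keeps the spam-to-non-spam ratio similar in every fold, which matters when the target class is imbalanced, as the abstract notes for this dataset.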
