International Journal of Web Information Systems

On verifying the authenticity of e-commercial crawling data by a semi-crosschecking method



Abstract

Purpose - Data crawling in e-commerce for market research often comes with the risk of poor authenticity due to modification attacks. The purpose of this paper is to propose a novel data authentication model for such systems.

Design/methodology/approach - The data modification problem requires careful examination, in which data are re-collected and the two datasets are overlapped to verify their reliability. The approach uses different anomaly detection techniques to determine which data are potentially fraudulent and should be re-collected. The paper also proposes a data selection model that combines importance weights with anomaly detection. The goal is to significantly reduce the amount of data in need of verification while still guaranteeing high authenticity. Empirical experiments are conducted on real-world datasets to evaluate the efficiency of the proposed scheme.

Findings - The authors examine several techniques for detecting anomalies in user and product data, achieving accuracy of approximately 80 per cent. The integration with the weight selection model is shown to detect more than 80 per cent of existing fraudulent records while avoiding the accidental inclusion of legitimate ones, especially when the proportion of frauds is high.

Originality/value - With the rapid development of e-commerce, fraud detection on e-commerce data and in Web crawling systems is a new and necessary area of research. This paper contributes a novel approach to the data authentication problem in crawling systems, which has not been studied much.
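The pipeline the abstract describes, scoring crawled records for anomalies, selecting the highest-weight suspicious ones for re-crawling, and crosschecking the overlap, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the z-score detector stands in for the paper's unspecified anomaly techniques, and the `price` field, weights, and threshold are hypothetical.

```python
def anomaly_scores(values):
    """Score each value by its absolute z-score (a simple stand-in
    for the paper's anomaly detection techniques)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # avoid division by zero on constant data
    return [abs(v - mean) / std for v in values]

def select_for_recrawl(records, weights, threshold=1.5):
    """Flag records whose anomaly score exceeds the threshold and
    rank them by importance weight, so only the most valuable
    suspicious records are re-collected."""
    scores = anomaly_scores([r["price"] for r in records])
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return sorted(flagged, key=lambda i: weights[i], reverse=True)

def crosscheck(original, recrawled):
    """A field passes the semi-crosscheck if the re-crawled value
    matches the originally crawled one."""
    return {k: original[k] == recrawled.get(k) for k in original}
```

Only the flagged subset is crawled a second time, which is how the scheme reduces verification cost while the overlap comparison still catches modified records.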
