Conference: Chinese Automation Congress

Massive Online Shopping Data Mining based on Hadoop

Abstract

The huge volume of data generated by people interacting with the internet contains useful hidden information; for example, online shopping data on e-commerce platforms reflects people's interests and needs. To exploit the potential value of massive online shopping data and improve the service of online retailers, this paper proposes an optimized distributed Apriori algorithm on the Hadoop platform. The experimental data come from the Tianchi data laboratory. In the experiments, we examined the performance of the proposed approach with different numbers of nodes in the Hadoop cluster, and we investigated which goods have the highest degree of correlation in the data. The results show that the optimized Apriori algorithm is capable of handling the online shopping data mining task. In particular, appropriately increasing the number of nodes in the cluster can decrease the running time. In addition, using 120 million online shopping records, we found that the two goods numbered 50010200 and 50008168 have the highest degree of correlation, with a probability of 0.00758%. Among more than 120 million records, this is a relatively high frequency.
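
The abstract does not describe the optimization or the Hadoop job layout, so the following is only a minimal single-machine sketch of the general idea behind an Apriori pair search: count items, prune infrequent ones, then count the surviving pairs and report their support. The function names, the toy baskets, and the `min_support` threshold are all illustrative assumptions, not taken from the paper; in a Hadoop setting each counting pass would roughly correspond to one map/reduce round.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(baskets, k, allowed_items=None):
    """Count k-item combinations across baskets (map + reduce in one loop).

    allowed_items, if given, restricts counting to items that survived
    the previous pass (the Apriori pruning step).
    """
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        if allowed_items is not None:
            items &= allowed_items
        # "Map": emit each k-combination; "reduce": sum the occurrences.
        counts.update(combinations(sorted(items), k))
    return counts


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for pairs whose support >= min_support."""
    n = len(baskets)
    # Pass 1: find frequent single items.
    item_counts = count_itemsets(baskets, 1)
    frequent_items = {
        item for (item,), c in item_counts.items() if c / n >= min_support
    }
    # Pass 2: only pairs built from frequent items can be frequent.
    pair_counts = count_itemsets(baskets, 2, frequent_items)
    return {
        pair: c / n for pair, c in pair_counts.items() if c / n >= min_support
    }


# Hypothetical toy data; the item IDs are placeholders, not real goods.
baskets = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"]]
result = frequent_pairs(baskets, min_support=0.5)
print(result)
```

Here support is the fraction of baskets containing the pair, which is one common reading of the "probability" the abstract reports; the paper may define its correlation measure differently.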
