Conference: Chinese Automation Congress

Massive Online Shopping Data Mining Based on Hadoop

Abstract

The huge volume of data generated by people interacting with the internet contains useful hidden information; for example, online shopping data on e-commerce platforms reflect people's interests and requirements. To exploit the potential value of massive online shopping data and improve the service of online retailers, this paper proposes an optimized distributed Apriori algorithm on the Hadoop platform. The experimental data come from the Tianchi data laboratory. In the experiments, we examine the performance of the proposed approach with different numbers of nodes in the Hadoop cluster, and we investigate which goods have the highest correlation degree in the massive online shopping data. The results show that the optimized Apriori algorithm is capable of handling the online shopping data mining task. In particular, appropriately increasing the number of nodes in the cluster can decrease the running time. In addition, on the 120 million online shopping records used in the experiment, the two goods numbered 50010200 and 50008168 have the highest correlation degree, with a probability of 0.00758%. Among more than 120 million records, this is a relatively high frequency.
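
To make the idea of pair counting concrete, the following is a minimal single-machine sketch of the map and reduce steps that a distributed Apriori pass over item pairs might use. It is not the paper's optimized algorithm, whose details the abstract does not give. The function names (`map_pairs`, `reduce_counts`, `top_pair`), the toy baskets, and the items "A" and "B" are hypothetical; only the two goods numbers come from the abstract. The last lines assume the reported 0.00758% is a support measured over roughly 120 million records, which the abstract does not state explicitly.

```python
from collections import Counter
from itertools import combinations


def map_pairs(basket):
    """Map step: emit (pair, 1) once for each unordered item pair in a basket."""
    items = sorted(set(basket))  # dedupe and sort so each pair has one canonical form
    return [(pair, 1) for pair in combinations(items, 2)]


def reduce_counts(emitted):
    """Reduce step: sum the counts emitted for each pair."""
    totals = Counter()
    for pair, count in emitted:
        totals[pair] += count
    return totals


def top_pair(baskets):
    """Return the most frequent pair and its support (share of baskets containing it)."""
    emitted = [kv for basket in baskets for kv in map_pairs(basket)]
    totals = reduce_counts(emitted)
    pair, count = totals.most_common(1)[0]
    return pair, count / len(baskets)


# Toy data: "A" and "B" are made-up items, not from the paper.
baskets = [
    ["50010200", "50008168", "A"],
    ["50010200", "50008168"],
    ["A", "B"],
    ["50010200", "B"],
]
pair, support = top_pair(baskets)
print(pair, support)

# Assumption: reading 0.00758% as support over about 120 million records.
approx_count = round(120_000_000 * 0.00758 / 100)
print(approx_count)
```

On a Hadoop cluster the map step would run in parallel over splits of the records and the reduce step would aggregate counts per pair, which is one plausible reason the running time falls as nodes are added.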
