Massive Online Shopping Data Mining Based on Hadoop


Abstract

The huge volume of data generated by human interaction with the internet contains useful hidden information; for example, online shopping data on e-commerce platforms reflect people's interests and requirements. To exploit the potential value of massive online shopping data and improve the service of online retailers, this paper proposes an optimized distributed Apriori algorithm on the Hadoop platform. The experimental data come from the Tianchi data laboratory. In the experiments, we examined the performance of the proposed approach with different numbers of nodes in the Hadoop cluster. We also investigated which goods have the highest correlation degree in the massive online shopping data. The results show that the optimized Apriori algorithm is capable of handling the online shopping data mining task. In particular, appropriately increasing the number of cluster nodes decreases the running time. In addition, in the experiment on 120 million online shopping records, the two goods numbered 50010200 and 50008168 have the highest correlation degree, with a probability of 0.00758%. Across a total of more than 120 million records, this is a relatively high frequency.
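
The abstract does not describe the optimized algorithm in detail, so the following is only a minimal single-machine sketch of the general idea behind a MapReduce-style Apriori pass for item pairs. It is not the authors' implementation. The function names (`map_pairs`, `reduce_counts`, `frequent_pairs`), the toy baskets, and the item IDs 1 to 4 are hypothetical; only the two goods IDs 50010200 and 50008168 are taken from the abstract.

```python
from collections import Counter
from itertools import combinations


def map_pairs(transaction, frequent_items):
    """Map phase (hypothetical): emit (pair, 1) for each unordered pair of
    frequent items that occurs in one shopping record."""
    items = sorted(set(transaction) & frequent_items)
    return [(pair, 1) for pair in combinations(items, 2)]


def reduce_counts(emitted):
    """Reduce phase (hypothetical): sum the emitted counts per key."""
    totals = Counter()
    for key, count in emitted:
        totals[key] += count
    return totals


def frequent_pairs(transactions, min_support):
    """Two-pass Apriori for 2-itemsets; support = fraction of records."""
    n = len(transactions)
    # Pass 1: count single items and keep only the frequent ones.
    item_counts = reduce_counts(
        (item, 1) for t in transactions for item in set(t)
    )
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count only pairs built from frequent items (Apriori pruning).
    pair_counts = reduce_counts(
        kv for t in transactions for kv in map_pairs(t, frequent_items)
    )
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


# Toy data, invented for illustration only.
baskets = [
    [50010200, 50008168, 1],
    [50010200, 50008168],
    [50010200, 2],
    [3, 4],
]
result = frequent_pairs(baskets, min_support=0.5)
print(result)
```

In a real Hadoop job the map and reduce steps would run on separate nodes over partitions of the records, which is where adding nodes can shorten the running time. For scale: if the reported 0.00758% is read as pair support over about 120 million records, it corresponds to roughly 120,000,000 × 0.0000758 ≈ 9,096 records containing both goods.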


Bibliographic details

Source: Chinese Automation Congress | 2018 | pp. 3277-3282 | 6 pages
Authors: Hong Sun; Cunjin Li; Zhong Yin
Original format: PDF
Keywords: Task analysis; Clustering algorithms; Itemsets; Distributed databases; Data mining; Computers; Correlation

