Massive Online Shopping Data Mining based on Hadoop



Abstract

The huge volume of data generated by human interaction with the internet contains useful hidden information; for example, online shopping data on e-commerce platforms reflect people's interests and requirements. To exploit the potential value of massive online shopping data and improve the service of online retailers, this paper proposes an optimized distributed Apriori algorithm on the Hadoop platform. The experimental data come from the Tianchi data laboratory. In the experiments, we examined the performance of the proposed approach for different numbers of nodes in the Hadoop cluster. We also investigated which goods have the highest correlation degree in the massive online shopping data. The results show that the optimized Apriori algorithm is capable of handling the online shopping data mining task. In particular, appropriately increasing the number of nodes in the cluster decreases the running time. In addition, when 120 million online shopping records are used, the two goods numbered 50010200 and 50008168 have the highest correlation degree, with a probability of 0.00758%. Among more than 120 million records, this is a relatively high frequency.
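The abstract does not describe how the optimized algorithm works internally. As a rough illustration only, the sketch below shows the general idea of Apriori-style pair mining arranged as a map step and a reduce step, run in plain Python on a single machine. The function names, the toy baskets, and the reading of "probability" as support (co-occurrence count divided by the number of records) are all assumptions and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def map_pairs(basket):
    """Map step (hypothetical): emit every unordered pair of distinct items in one basket."""
    return combinations(sorted(set(basket)), 2)


def reduce_counts(mapped):
    """Reduce step (hypothetical): sum how often each pair was emitted."""
    counts = Counter()
    for pairs in mapped:
        counts.update(pairs)
    return counts


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for pairs whose support is at least min_support."""
    n = len(baskets)
    # Apriori pruning: a pair can only be frequent if both of its items are frequent.
    item_counts = Counter(item for b in baskets for item in set(b))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    pruned = [[i for i in b if i in frequent_items] for b in baskets]
    pair_counts = reduce_counts(map_pairs(b) for b in pruned)
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


# Toy data with made-up item labels, not the Tianchi records.
baskets = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["C", "D"]]
print(frequent_pairs(baskets, min_support=0.5))
```

On a real Hadoop cluster the map and reduce steps would run as distributed tasks over partitions of the records; here they are ordinary functions so the logic can be checked locally. For a sense of scale, if the reported 0.00758% is read as support over roughly 120 million records, it would correspond to about 0.0000758 × 120,000,000 ≈ 9,096 co-occurrences of the two goods. That reading is an assumption, since the abstract does not define the term.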


Bibliographic record

Source
Chinese Automation Congress | 2018 | pp. 2869-3586 | 6 pages
Authors
Hong SUN; Cunjin LI; Zhong YIN
Original format: PDF
Chinese Library Classification: TP2-53
Keywords
Online shopping data; Distributed Apriori algorithm; Hadoop; Correlation degree


Similar literature


1. Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform [J]. Zhao Xiaoyong, Yang Chunrong. The Open Automation and Control Systems Journal. 2016, No. 1
2. The commodity recommendation method for online shopping based on data mining [J]. Ju Chunhua, Wang Jie, Zhou Guanglan. Multimedia Tools and Applications. 2019, No. 21
3. Mining theory-based patterns from Big data: Identifying self-regulated learning strategies in Massive Open Online Courses [J]. Maldonado-Mahauad Jorge, Perez-Sanagustin Mar, Kizilcec Rene F., Computers in Human Behavior. 2018, No. MARa
4. Massive Online Shopping Data Mining Based on Hadoop [C]. Hong Sun, Cunjin Li, Zhong Yin. Chinese Automation Congress. 2018
5. Mining massive moving object datasets from RFID flow analysis to traffic mining [D]. Gonzalez, Hector. 2008
6. Erratum to: A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data [O]. Alexey Siretskiy, Tore Sundqvist, Mikhail Voznesenskiy, 2015
7. Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform [O]. Zhao Xiaoyong, Yang Chunrong. 2014
8. Cluster Analysis-Based Approaches for Geospatiotemporal Data Mining of Massive Data Sets for Identification of Forest Threats [R]. Mills, R. T., Hoffman, F. M., Kumar, J., 2011
