Journal: 《机床与液压》 (Machine Tool & Hydraulics)

Research on Key Technologies of Distributed Hadoop Data Mining on a Cloud Computing Platform

Abstract

Feature mining of big data in a cloud computing environment is the basis for big data statistics and analysis. To improve the accuracy and speed of clustering, this paper designs a data mining scheme based on a distributed Hadoop platform and entropy-weighted feature selection. The scheme first uses a directed acyclic graph (DAG) to analyze the MapReduce job-flow scheduling problem on the Hadoop platform. It then uses parallel MapReduce execution to carry out the distributed computation. Finally, it applies an entropy-weighted clustering algorithm to mine the massive data set. Simulation results show that the proposed scheme achieves good clustering quality and good running efficiency.
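The abstract does not give the exact entropy-weighting formula, so the following is only a minimal sketch of the classic entropy weight method, which is one common way to derive feature weights for clustering. The function names `entropy_weights` and `weighted_distance` are hypothetical, the input is assumed to hold non-negative values with at least two samples, and the single-process code stands in for what the paper runs as parallel MapReduce jobs.

```python
import math


def entropy_weights(rows):
    """Return one weight per feature column (entropy weight method).

    Assumptions (not from the paper): values are non-negative and
    there are at least two rows, so log(n) is non-zero.
    """
    n = len(rows)
    m = len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        # Share of each sample within the column; fall back to uniform
        # shares when the whole column is zero.
        probs = [v / total for v in column] if total > 0 else [1.0 / n] * n
        # Normalised Shannon entropy in [0, 1]; 0 * log(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Low entropy means the feature separates samples well.
        divergences.append(max(0.0, 1.0 - entropy))
    total_div = sum(divergences)
    if total_div == 0:
        # Every feature is uninformative: use equal weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


if __name__ == "__main__":
    data = [[1, 1], [1, 2], [1, 9], [1, 4]]
    w = entropy_weights(data)
    print(w)
    print(weighted_distance(data[0], data[2], w))
```

In this sketch a constant column receives a weight close to zero, so it contributes almost nothing to the distance a clustering step such as k-means would use. In a Hadoop setting the per-column sums and entropies would typically be computed in a map and reduce pass rather than in one loop.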
