Journal of Computational and Theoretical Nanoscience

Parallel K-Means Implementation for Data Clustering Using Hadoop Map-Reduce


Abstract

Electronic information from online newspapers, journals, conference proceedings, web pages, and emails is growing rapidly and generating huge amounts of data. Controlling, indexing, and searching this volume of electronic data is not feasible, especially for humans and also for search engines. Automatic document organization is therefore an important problem for data at this scale. Document clustering methods can give insight into the data distribution or pre-process data for other applications. In this paper, a parallel clustering algorithm based on K-means clustering is proposed, which iterates over and optimizes document upload and access. Specifically, the proposed algorithm is implemented on the Apache Hadoop architecture with a large document-access data set, and it is evaluated under different conditions with different possible input documents. The paper presents information on frequent document access and suggests a pattern for document representation in cloud storage.
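
The abstract does not give the authors' implementation, so the following is only a minimal sketch of how one K-means iteration is commonly split into a map step and a reduce step. It runs in plain Python on a single machine rather than on Hadoop, and every name in it (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) is hypothetical. It assumes documents have already been converted to fixed-length numeric vectors and that the initial centroids are supplied by the caller.

```python
from collections import defaultdict

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)

def reduce_cluster(values):
    """Reduce step: average all points that share one centroid index."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            total[d] += point[d]
        count += n
    return tuple(t / count for t in total)

def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce pass, simulated in memory."""
    groups = defaultdict(list)  # stands in for the shuffle/sort phase
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid that attracts no points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else tuple(centroids[i])
            for i in range(len(centroids))]

def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol (squared)."""
    for _ in range(max_iters):
        updated = kmeans_iteration(points, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce job each iteration would typically be a separate job, with the current centroids distributed to every mapper; the in-memory dictionary here only stands in for that shuffle phase.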
