Source: Journal of Information & Knowledge Management

PDC-Transitive: An Enhanced Heuristic for Document Clustering Based on Relational Analysis Approach and Iterative MapReduce

Abstract

Recently, MapReduce-based implementations of clustering algorithms have been developed to cope with the Big Data phenomenon, and they show promising results, particularly for the document clustering problem. In this paper, we extend PDC-Transitive, an efficient data partitioning method based on the relational analysis (RA) approach and applied to the document clustering problem. The previously introduced heuristic is parallelised iteratively using the MapReduce model and is designed with a single reducer, which becomes a bottleneck when processing large data. We improve the design of the PDC-Transitive method to avoid data dependencies and reduce the computation cost. Experimental results on benchmark datasets demonstrate that the enhanced heuristic yields better-quality results and requires less computing time than the original method.