Journal: International Journal of Pattern Recognition and Artificial Intelligence

STiMR k-Means: An Efficient Clustering Method for Big Data


Abstract

Big Data clustering has become an important challenge in data analysis, since several applications require scalable clustering methods to organize such data into groups of similar objects. Given the computational cost of most existing clustering methods, we propose in this paper a new clustering method, referred to as STiMR k-means, that provides a good tradeoff between scalability and clustering quality. The proposed method combines three acceleration techniques: sampling, the triangle inequality and MapReduce. Sampling reduces the number of data points used when building cluster prototypes, the triangle inequality reduces the number of comparisons needed when looking for the nearest cluster, and MapReduce provides a parallel framework for running the proposed method. Experiments performed on simulated and real datasets have shown the effectiveness of the proposed method, compared with existing ones, in terms of running time, scalability and internal validity measures.
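The abstract names the three techniques but does not give the algorithm itself, so the following is only a minimal sketch of how they could fit together, not the authors' implementation. It assumes Euclidean distance, a common triangle-inequality pruning rule (skip center c_j when d(c_best, c_j) >= 2 * d(x, c_best)), and an in-process imitation of the map and reduce steps rather than a real MapReduce cluster. All function names and parameters (`nearest_center`, `map_chunk`, `reduce_partials`, `sampled_kmeans`, `sample_size`, `n_chunks`) are hypothetical.

```python
import math
import random
from functools import reduce


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers, center_dists):
    """Return (index of nearest center, number of distances computed).

    Pruning rule: if d(c_best, c_j) >= 2 * d(point, c_best), the triangle
    inequality gives d(point, c_j) >= d(point, c_best), so c_j is skipped.
    """
    best, best_d, computed = 0, dist(point, centers[0]), 1
    for j in range(1, len(centers)):
        if center_dists[best][j] >= 2 * best_d:
            continue  # c_j cannot be closer than the current best
        d = dist(point, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, computed


def map_chunk(chunk, centers, center_dists):
    """Map step: per-cluster (coordinate sums, count) for one data chunk."""
    dim = len(centers[0])
    partial = {}
    for p in chunk:
        j, _ = nearest_center(p, centers, center_dists)
        s, n = partial.get(j, ([0.0] * dim, 0))
        partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return partial


def reduce_partials(left, right):
    """Reduce step: merge two dictionaries of partial sums and counts."""
    merged = dict(left)
    for j, (s, n) in right.items():
        if j in merged:
            ms, mn = merged[j]
            merged[j] = ([a + b for a, b in zip(ms, s)], mn + n)
        else:
            merged[j] = (s, n)
    return merged


def sampled_kmeans(data, k, sample_size, n_chunks=4, iters=10, seed=0):
    """Build k prototypes from a random sample (assumes sample_size >= k)."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    centers = [list(p) for p in rng.sample(sample, k)]
    for _ in range(iters):
        # Center-to-center distances are recomputed once per iteration.
        center_dists = [[dist(a, b) for b in centers] for a in centers]
        chunks = [sample[i::n_chunks] for i in range(n_chunks)]
        partials = [map_chunk(c, centers, center_dists) for c in chunks]
        totals = reduce(reduce_partials, partials, {})
        for j, (s, n) in totals.items():
            centers[j] = [v / n for v in s]  # clusters with no points keep their center
    return centers
```

In this sketch the sample alone drives the prototype updates, the pruning rule only avoids distance computations and never changes which center is chosen (up to floating-point ties), and each chunk could in principle be processed by a separate worker because the partial sums merge associatively.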
