International Journal of Engineering Research and Applications

An Efficient Method of Partitioning High Volumes of Multidimensional Data for Parallel Clustering Algorithms

Abstract

Optimal data partitioning is a necessary step in parallel or distributed implementations of clustering algorithms, because it ensures independent task completion, a fair distribution of work, fewer affected points, and better and faster merging. Although Kd-Tree partitioning is conventionally used in academia, it suffers from performance degradation and bias (unequal distribution) as the dimensionality of the data increases. It is therefore unsuitable for practical use in industry, where dimensionality can be on the order of hundreds to thousands. To address these issues, we propose two new partitioning techniques based on existing mathematical models and study their feasibility, their performance (bias and partitioning speed), and possible variants in choosing the initial seeds.
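For readers unfamiliar with the baseline, the following is a minimal sketch of conventional Kd-Tree partitioning as the abstract refers to it. It is not either of the two new techniques the authors propose; the abstract does not describe those. The sketch assumes median splits, a split axis that cycles through dimensions by tree level, and a simple largest-to-smallest partition size ratio as the balance measure. The names `kd_partition` and `imbalance` are hypothetical.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def kd_partition(points: List[Point], depth: int) -> List[List[Point]]:
    """Split `points` into up to 2**depth partitions with Kd-Tree-style
    median splits. The split axis cycles through dimensions by tree level
    (an assumption; choosing the highest-variance axis is another option)."""

    def split(pts: List[Point], level: int) -> List[List[Point]]:
        # Stop at the target depth, or when a node is too small to split.
        if level == depth or len(pts) < 2:
            return [pts]
        axis = level % len(pts[0])
        # Sort on the chosen axis and cut at the median index, so the two
        # children differ in size by at most one point.
        ordered = sorted(pts, key=lambda p: p[axis])
        mid = len(ordered) // 2
        return split(ordered[:mid], level + 1) + split(ordered[mid:], level + 1)

    return split(list(points), 0)


def imbalance(partitions: List[List[Point]]) -> float:
    """Largest partition size divided by the smallest (1.0 = perfectly even)."""
    sizes = [len(p) for p in partitions]
    return float("inf") if min(sizes) == 0 else max(sizes) / min(sizes)


if __name__ == "__main__":
    # Hypothetical demo data: 1000 uniformly random 8-dimensional points.
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(1000)]
    parts = kd_partition(data, depth=3)
    print(len(parts), [len(p) for p in parts], imbalance(parts))
```

Median splits keep partition counts even by construction. Note also that with level-cycled axes, a tree of depth k splits on only the first k dimensions. For 1000-dimensional data cut into 16 partitions (depth 4), 996 dimensions are never examined. The abstract does not say whether this is the source of the bias the authors report.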
