Parallel hybrid clustering using genetic programming and multi-objective fitness with density (PYRAMID).

Abstract

Clustering is the art of locating patterns in large data sets. It is an active research area that provides value to scientific as well as business applications. Practical clustering faces several challenges, including: identifying clusters of arbitrary shapes, sensitivity to the order of input, dynamic determination of the number of clusters, outlier handling, high dependency on user-defined parameters, processing speed on massive data sets, and the potential to fall into sub-optimal solutions.

Many studies in the realm of clustering have addressed some of these challenges. This study proposes a new approach, called parallel hybrid clustering using genetic programming and multi-objective fitness with density (PYRAMID), that tackles several of these challenges from a different perspective.

PYRAMID employs genetic programming to represent arbitrary cluster shapes and to avoid falling into local optima. It accommodates large data sets and avoids dependency on the order of input by quantizing the data space, i.e., the space on which the data set resides, thus abstracting it into hyper-rectangular cells and creating genetic programming individuals as concatenations of these cells. The cells, rather than the data points themselves, thus become the subject of clustering. PYRAMID also utilizes a density-based multi-objective fitness function to handle outliers. It gathers statistics in a pre-processing step and uses them so as not to rely on user-defined parameters. Finally, PYRAMID employs data parallelism in a master-slave model in an attempt to cure the inherently slow performance of evolutionary algorithms and provide speedup. A master processor distributes the clustering data evenly onto multiple slave processors. The slave processors conduct the clustering on their local data sets and report their clustering results back to the master, which consolidates them by merging the partial results into a final clustering solution. This last step also involves determining the number of clusters dynamically and labeling them accordingly.

Experiments have demonstrated that, using these features, PYRAMID offers an advantage over some of the existing approaches by tackling the clustering challenges from a different angle.
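
The quantization and master-slave consolidation described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it omits the genetic programming search and the multi-objective fitness entirely, and it only counts points per cell where the abstract describes slaves performing a full clustering. The function names (`quantize`, `split_evenly`, `merge_counts`, `dense_cells`), the equal-width grid, and the fixed `threshold` are all assumptions made for illustration; the abstract states that PYRAMID derives such values from pre-processing statistics rather than from user input.

```python
from collections import Counter
from math import floor


def quantize(points, lows, highs, bins):
    """Count points per hyper-rectangular cell of an equal-width grid (assumed layout)."""
    cells = Counter()
    for p in points:
        idx = []
        for x, lo, hi in zip(p, lows, highs):
            # Scale the coordinate to [0, bins) and clamp so that x == hi
            # falls into the last cell instead of out of range.
            i = floor((x - lo) / (hi - lo) * bins)
            idx.append(min(max(i, 0), bins - 1))
        cells[tuple(idx)] += 1
    return cells


def split_evenly(points, n_workers):
    """Round-robin partition, standing in for the master's even distribution."""
    return [points[i::n_workers] for i in range(n_workers)]


def merge_counts(partials):
    """Master-side consolidation: sum the per-cell counts reported by each worker."""
    total = Counter()
    for counts in partials:
        total.update(counts)
    return total


def dense_cells(counts, threshold):
    """Keep cells holding at least `threshold` points (hypothetical density rule)."""
    return {cell for cell, n in counts.items() if n >= threshold}


points = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.2),
          (0.9, 0.9), (0.85, 0.95), (0.5, 0.05)]
lows, highs, bins = (0.0, 0.0), (1.0, 1.0), 4

# Serial reference result over the whole data set.
serial = quantize(points, lows, highs, bins)

# Each "worker" quantizes its own chunk; the "master" merges the partial counts.
partials = [quantize(chunk, lows, highs, bins)
            for chunk in split_evenly(points, 3)]
merged = merge_counts(partials)

print(sorted(dense_cells(merged, threshold=2)))
```

Because the per-cell counts depend only on which cell each point falls into, the merged result matches the serial one and does not change when the input is reordered. This is the property the abstract relies on when it makes cells, rather than points, the subject of clustering. The lone point in cell `(2, 0)` falls below the assumed threshold and is left out, a simple stand-in for density-based outlier handling.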

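Bibliographic record

  • Author

    Tout, Samir R.

  • Author affiliation

    Nova Southeastern University

  • Degree-granting institution: Nova Southeastern University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 293 p.
  • Total pages: 293
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology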