首页> 外文OA文献 >Evaluating Heuristics and Crowding on Center Selection in K-Means Genetic Algorithms
【2h】

Evaluating Heuristics and Crowding on Center Selection in K-Means Genetic Algorithms

机译:K-Means遗传算法中的启发式评估和拥挤中心选择

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

Data clustering involves partitioning data points into clusters where data points within the same cluster have high similarity, but are dissimilar to the data points in other clusters. The k-means algorithm is among the most extensively used clustering techniques. Genetic algorithms (GA) have been successfully used to evolve successive generations of cluster centers. The primary goal of this research was to develop improved GA-based methods for center selection in k-means by using heuristic methods to improve the overall fitness of the initial population of chromosomes along with crowding techniques to avoid premature convergence. Prior to this research, no rigorous systematic examination of the use of heuristics and crowding methods in this domain had been performed.The evaluation included computational experiments involving repeated runs of the genetic algorithm in which values that affect heuristics or crowding were systematically varied and the results analyzed. Genetic algorithm performance under the various configurations was analyzed based upon (1) the fitness of the partitions produced, and by (2) the overall time it took the GA to converge to good solutions. Two heuristic methods for initial center seeding were tested: Density and Separation. Two crowding techniques were evaluated on their ability to prevent premature convergence: Deterministic and Parent Favored Hybrid local tournament selection.Based on the experiment results, the Density method provides no significant advantage over random seeding either in discovering quality partitions or in more quickly evolving better partitions. The Separation method appears to result in an increased probability of the genetic algorithm finding slightly better partitions in slightly fewer generations, and to more quickly converge to quality partitions. Both local tournament selection techniques consistently allowed the genetic algorithm to find better quality partitions than roulette-wheel sampling. Deterministic selection consistently found better quality partitions in fewer generations than Parent Favored Hybrid. The combination of Separation center seeding and Deterministic selection performed better than any other combination, achieving the lowest mean best SSE value more than twice as often as any other combination. On all 28 benchmark problem instances, the combination identified solutions that were at least as good as any identified by extant methods.
机译:数据群集涉及将数据点划分为群集,其中同一群集内的数据点具有高度相似性,但与其他群集中的数据点不相似。 k均值算法是使用最广泛的聚类技术之一。遗传算法(GA)已成功用于演化集群中心的连续世代。这项研究的主要目的是开发改进的基于遗传算法的k均值中心选择方法,方法是使用启发式方法提高初始染色体总体的总体适应度,并采用拥挤技术来避免过早收敛。在进行这项研究之前,尚未对该领域中的启发式方法和拥挤方法进行严格的系统检查。评估包括涉及遗传算法重复运行的计算实验,其中影响启发式或拥挤的值被系统地改变,结果分析。基于(1)生成的分区的适合度,以及(2)GA收敛到良好解所需的总时间,分析了遗传算法在各种配置下的性能。测试了用于初始中心播种的两种启发式方法:密度和分离。评估了两种拥挤技术防止过早收敛的能力:确定性和父母偏好的混合本地锦标赛选择。基于实验结果,密度方法在发现质量分区或更快地发展更好的分区方面没有提供比随机播种明显的优势。 。分离方法似乎导致遗传算法在较少的世代中找到更好的分区的可能性增加,并且更快地收敛到高质量的分区。两种本地锦标赛选择技术始终允许遗传算法找到比轮盘抽样更好的质量分区。确定性选择始终能够在比“父代喜欢的混合对象”更少的世代中找到更好的质量分区。 “分离中心播种”和“确定性选择”的组合比其他任何组合的性能要好,达到最低平均最佳SSE值的频率是任何其他组合的两倍以上。在所有28个基准问题实例上,组合确定的解决方案至少与现有方法确定的解决方案一样好。

著录项

  • 作者

    McGarvey William;

  • 作者单位
  • 年度 2014
  • 总页数
  • 原文格式 PDF
  • 正文语种
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号