A comparison of several cluster algorithms on artificial binary data Part 1. Scenarios from travel market segmentation Part 2: Working Paper 19.

Abstract

Social scientists who want to segment individuals into plausible subgroups usually encounter two main problems. First, there is very little indication of how many clusters to search for. Second, different cluster algorithms, and even multiple replications of the same algorithm, yield different solutions because of random initializations and stochastic learning methods. In the worst case, numerous solutions are found that all seem plausible as far as interpretation is concerned. As a consequence, the clusters finally postulated are in fact "chosen" by the researcher, who decides both on the number of clusters and on which solution is taken as the "final" one. In this paper we concentrate on the power and stability of several popular clustering algorithms under the condition that the correct number of clusters is known. We construct artificial data sets modeled to mimic typical situations in tourism marketing. The structure of these data sets is described in several scenarios, and artificial binary data are generated accordingly. These data range from very simple to more complex, real-data-like structures, and they enable us to analyze the "behavior" of the cluster methods systematically. Section 3 gives an overview of all cluster methods under investigation. Section 4 describes our experimental results, comparing first all scenarios and then all cluster methods; to accomplish this task, several evaluation criteria for cluster methods are proposed. Finally, Sections 5 and 6 draw some conclusions and give an outlook on future research. (author's abstract)
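The abstract describes the experimental design only in words. The Python sketch below illustrates the general idea under our own assumptions and is not the authors' code. Segments are defined by hypothetical per-item "yes" probabilities (`profiles`). Plain k-means stands in for the several algorithms the paper compares. The number of clusters is fixed at the true value, as in the paper's setting. Agreement between partitions is measured with the adjusted Rand index, a standard partition-similarity measure chosen here for illustration; the paper proposes its own evaluation criteria. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def generate_binary_data(profiles, n_per_segment, seed=0):
    """Draw yes/no answers: item j of segment s equals 1 with probability profiles[s][j]."""
    rng = random.Random(seed)
    data, labels = [], []
    for segment, probs in enumerate(profiles):
        for _ in range(n_per_segment):
            data.append([1 if rng.random() < p else 0 for p in probs])
            labels.append(segment)
    return data, labels


def kmeans(data, k, seed, max_iter=100):
    """Plain Lloyd k-means; centres start at k data points drawn at random (seeded)."""
    rng = random.Random(seed)
    centres = [[float(x) for x in row] for row in rng.sample(data, k)]
    assign = None
    for _ in range(max_iter):
        # Assign every respondent to the nearest centre (squared Euclidean distance).
        new_assign = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centres[c])))
            for row in data
        ]
        if new_assign == assign:
            break  # converged: no respondent changed cluster
        assign = new_assign
        # Move each centre to the mean of its members; an empty cluster keeps its old centre.
        for c in range(k):
            members = [row for row, a in zip(data, assign) if a == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def adjusted_rand_index(a, b):
    """Adjusted Rand index of two partitions: 1.0 means identical up to relabelling."""
    pair_counts = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    sum_a = sum(comb(n, 2) for n in Counter(a).values())
    sum_b = sum(comb(n, 2) for n in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pair_counts - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical scenario: three travel segments answering eight yes/no items.
    profiles = [
        [0.9, 0.9, 0.8, 0.1, 0.1, 0.2, 0.1, 0.1],
        [0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.1, 0.2],
        [0.2, 0.1, 0.1, 0.1, 0.2, 0.1, 0.9, 0.8],
    ]
    data, truth = generate_binary_data(profiles, n_per_segment=100, seed=1)
    k = len(profiles)  # the correct number of clusters is assumed known
    runs = [kmeans(data, k, seed=r) for r in range(10)]
    recovery = [adjusted_rand_index(truth, run) for run in runs]
    stability = [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
    print(f"mean ARI vs. true segments: {sum(recovery) / len(recovery):.3f}")
    print(f"mean pairwise ARI between runs: {sum(stability) / len(stability):.3f}")
```

"Recovery" compares each run with the known segments, and "stability" compares the random-start runs with one another. Each run starts from a different random draw of centres, so runs can disagree even when k is correct. This is the kind of instability the abstract describes.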
