A new data clustering algorithm based on critical distance methodology

Abstract

A variety of algorithms have recently emerged in the field of cluster analysis. In principle, an appropriate algorithm can be chosen according to how the data are distributed, but in practice it is difficult for a user to decide a priori which algorithm is most appropriate for a given dataset. Graph-based algorithms provide good results for this task. However, they are vulnerable to outliers, and the edges contained in the tree offer only limited information for splitting a dataset. Consequently, several fields have a growing and urgent need for robust, dynamic algorithms that improve and simplify the whole process of data clustering. In this paper, we propose a novel distance-based clustering algorithm called the critical distance clustering algorithm. It relies on the Euclidean distance between data points and on some basic statistical operations. The algorithm is simple, robust, and flexible; it works with quantitative, real-valued data of different dimensions, not with qualitative or categorical data. In this work, 26 experiments are conducted on real and synthetic datasets of different types taken from different fields. The results show that the new algorithm outperforms some popular clustering algorithms, such as MST-based clustering, K-means, and DBSCAN. Moreover, the algorithm can produce more reasonable clusters even when the dataset contains outliers, and without any parameters being specified in advance. It also provides a number of indicators to evaluate the resulting clusters and demonstrate the validity of the clustering. © 2019 Published by Elsevier Ltd.
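
The abstract does not spell out how the critical distance is computed, so the following is only a rough sketch of the general idea of distance-based clustering with a data-derived threshold. It is not the authors' algorithm. The function name `critical_distance_clusters`, the threshold rule (mean plus `spread` times the standard deviation of nearest-neighbor distances), and the default `spread=2.0` are all assumptions made for illustration.

```python
import math
import statistics


def critical_distance_clusters(points, spread=2.0):
    """Hypothetical sketch, not the published method.

    Links any two points whose Euclidean distance is at most a threshold
    derived from the data, then returns connected components as clusters.
    Requires at least two points.
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Distance from each point to its nearest neighbor.
    nearest = [
        min(d for j, d in enumerate(row) if j != i)
        for i, row in enumerate(dist)
    ]

    # Assumed threshold rule: mean + spread * population standard deviation.
    threshold = statistics.mean(nearest) + spread * statistics.pstdev(nearest)

    # Label connected components with a depth-first traversal.
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels, threshold


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0),
              (20.0, 0.0), (21.0, 0.0), (22.5, 0.0)]
    print(critical_distance_clusters(sample))
```

A mean-and-deviation threshold like this one is itself sensitive to extreme outliers, which is the kind of weakness the paper says its method addresses; the sketch only illustrates how a cut-off distance can come from the data rather than from a user-supplied parameter.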

Bibliographic details

  • Source
    Expert Systems with Applications | 2019, Issue 9 | pp. 296-310 | 15 pages
  • Author affiliations

    Ankara Yildirim Beyazit Univ Dept Comp Engn Fac Engn & Nat Sci TR-06220 Ankara Turkey;

    Ankara Yildirim Beyazit Univ Dept Comp Engn Fac Engn & Nat Sci TR-06220 Ankara Turkey;

    Ankara Yildirim Beyazit Univ Dept Comp Engn Fac Engn & Nat Sci TR-06220 Ankara Turkey|Amer Univ Middle East Coll Engn & Technol Eqaila 15453 Kuwait;

    Huddersfield Univ Sch Comp & Engn Huddersfield HD1 3DH W Yorkshire England;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Algorithm; Cluster analysis; Euclidean distance; MST;
