Journal: Expert Systems with Applications

An improved algorithm for partial clustering



Abstract

Expert and intelligent systems use a variety of machine learning techniques to obtain and understand the information inherent in data. Clustering is one of these techniques; it has become important and popular because it groups an unlabeled dataset into clusters of similar objects. Many clustering algorithms have been proposed in the literature. Among them, the Cross-Clustering algorithm is one of the most recent algorithms for partial clustering (clustering in which not all objects are necessarily grouped into clusters). It has provided good results, estimating a suitable set of clusters while also eliminating outliers. However, this algorithm tends to eliminate too many objects as outliers, which leads it to discard many non-outlier objects. Additionally, the Cross-Clustering algorithm spends a lot of time evaluating several combinations of clusterings while trying to determine a suitable number of clusters. To overcome these problems, this paper proposes an improved version of the Cross-Clustering algorithm (ICC). ICC changes the clustering algorithm used for detecting outliers and modifies the way outliers are detected. Moreover, a stopping criterion that allows a fast decision on a suitable number of clusters is also introduced. The performance of ICC is compared with that of the original algorithm on artificial and real datasets. Our results show that ICC outperforms the original algorithm and other state-of-the-art clustering algorithms in both runtime and clustering quality. (C) 2018 Elsevier Ltd. All rights reserved.
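To make the notion of partial clustering concrete, here is a minimal Python sketch. It is not the Cross-Clustering or ICC algorithm, whose details the abstract does not give. It is an illustrative stand-in that links points lying within a distance `radius` of each other and leaves any group smaller than `min_size` unclustered (label `-1`). The function name `partial_clustering` and both parameters are assumptions introduced for this example only.

```python
from math import dist


def partial_clustering(points, radius, min_size):
    """Illustrative partial clustering (NOT the ICC algorithm).

    Points closer than `radius` are linked into the same group.
    Groups with fewer than `min_size` members are treated as
    outliers and receive the label -1.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that lies within `radius`.
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    # Collect the members of each linked group.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Only sufficiently large groups become clusters.
    labels = [-1] * n
    next_id = 0
    for members in groups.values():
        if len(members) >= min_size:
            for i in members:
                labels[i] = next_id
            next_id += 1
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
          (10.0, 0.0)]
labels = partial_clustering(points, radius=0.5, min_size=2)
print(labels)
```

In this toy run the isolated point at (10.0, 0.0) is the only object left ungrouped. The abstract's concern is the opposite failure mode, where a partial clustering method discards too many objects that are not really outliers.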
