Journal: Expert Systems with Applications

Uncertainty mode selection in categorical clustering using the rough set theory

Abstract

Clustering is an unsupervised machine learning technique widely used to arrange a set of observations into distinct groups called clusters. Categorical clustering has attracted much attention because many real-world applications produce categorical data. The k-modes algorithm was among the first developed in this context. It uses modes to represent the cluster centroids. However, its major drawback is that the modes are selected at random in each iteration of the clustering process. In this paper, we tackle this random selection issue and propose a new method that identifies the most adequate modes among a list of candidates. The proposed algorithm, called Density Rough k-modes (DRk-M), computes the density of each candidate mode to characterize the distribution of the observations around it, and then uses rough set theory to handle the uncertainty involved in this process. DRk-M was evaluated on real-world datasets drawn from the UCI (University of California, Irvine) Machine Learning Repository, the Global Terrorism Database (GTD), and a set of scraped tweets. It was compared with many state-of-the-art methods, including k-modes (1998), Ng's method (2007), Cao's method (2012), and Bai's technique (2014), and showed great efficiency. (C) 2020 Elsevier Ltd. All rights reserved.
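
The mode-selection problem described in the abstract can be illustrated with a minimal Python sketch. It is not the DRk-M algorithm: the abstract does not give the paper's density formula or its rough-set step, so both are left out or replaced by assumptions. Here a candidate mode is any combination of per-attribute values tied for highest frequency in a cluster, and the hypothetical `density` function scores a candidate by its mean simple-matching similarity to the whole data set. The function names `candidate_modes`, `density`, and `select_mode` are illustrative only.

```python
from collections import Counter
from itertools import product


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_modes(cluster):
    """Every combination of per-attribute values tied for highest frequency."""
    tied = []
    for column in zip(*cluster):
        counts = Counter(column)
        top = max(counts.values())
        tied.append(sorted(v for v, c in counts.items() if c == top))
    return [tuple(combo) for combo in product(*tied)]


def density(candidate, data):
    """Assumed score: mean per-attribute similarity to all records, in [0, 1]."""
    m = len(candidate)
    return sum(1 - matching_dissimilarity(candidate, r) / m for r in data) / len(data)


def select_mode(cluster, data):
    """Pick the densest candidate instead of breaking ties at random."""
    return max(candidate_modes(cluster), key=lambda c: density(c, data))


# Two colours tie inside the cluster; the wider data set breaks the tie.
cluster = [("red", "small"), ("blue", "small")]
data = cluster + [("red", "large"), ("red", "small")]
print(select_mode(cluster, data))  # ('red', 'small')
```

The density is computed over the whole data set rather than over the cluster alone because, within the cluster, all tied candidates match the same number of attribute values and would score identically. This is a design choice of the sketch, not a claim about the paper.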
