An Empirical Analysis of Rough Set Categorical Clustering Techniques

Abstract

Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA), which clusters categorical data using rough set indiscernibility relations, is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with the number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.

Bibliographic details

  • Journal name: other
  • Year (volume), issue: -1 (12), 1
  • Year: -1
  • Pages: e0164803
  • Total pages: 22
  • Original format: PDF
  • Date indexed: 2022-08-21 11:10:46
