Journal: Expert Systems with Applications

A weighted multivariate Fuzzy C-Means method in interval-valued scientific production data


Abstract

Clustering is the process of organizing objects into groups whose members are similar in some way. Most clustering methods handle only numeric data. However, this representation may not be adequate to model complex information such as histograms, distributions, or intervals. Symbolic Data Analysis (SDA) was developed to deal with these types of data. In multivariate data analysis, it is common for some variables to be more relevant than others, and less relevant variables can mask the cluster structure. This work proposes a clustering method based on a fuzzy approach that produces weighted multivariate memberships for interval-valued data. These memberships can change at each iteration of the algorithm, and they differ from one variable to another and from one cluster to another. Furthermore, each variable has its own relevance weight, which may also differ from one cluster to another. The advantage of this method is that it is robust to ambiguous cluster membership assignments, since the weights represent how important each variable is to each cluster. Experiments with synthetic data sets compare the performance of the proposed method against methods already established in the clustering literature. An application to interval-valued scientific production data is also presented. Clustering quality results have shown that the proposed method offers higher accuracy when variables have different variabilities.
