Journal: Bioinformatics

A simple and fast method to determine the parameters for fuzzy c-means cluster analysis.


Abstract

MOTIVATION: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to set optimal values of the algorithm parameters. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or cause potentially important data to be ignored. The optimal parameter values are those for which the clustering yields no results for a purely random dataset, yet still detects cluster formation with maximum resolution at the edge of randomness.

RESULTS: The optimal parameter values are estimated by evaluating the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the dataset. Taking the dimension of the dataset and the number of objects as input values, instead of evaluating the entire dataset, allows us to propose a functional relationship that determines the fuzzifier directly. This result speaks strongly against using a predefined fuzzifier, as was typically done in many previous studies. Validation indices are generally used to estimate the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are equivalent to or better than those obtained with other, computationally more expensive indices.
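To make the two quantities named in the abstract concrete, the following is a minimal sketch of textbook fuzzy c-means with an explicit fuzzifier m, plus a minimum-centroid-distance index. It is an illustration under stated assumptions, not the authors' implementation: the function names, the random initialisation, the convergence tolerance and the use of plain Euclidean distance are all choices made here, and the abstract's functional relationship for the fuzzifier is not reproduced because the abstract does not state it.

```python
import numpy as np


def fuzzy_c_means(X, c, m, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (hypothetical helper, not from the paper).

    X: (n_objects, n_dims) array, c: number of clusters,
    m: fuzzifier (> 1). Returns (centroids, memberships).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the objects.
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance of every object to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centroids, U


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between centroids
    (assumed form of the index; the abstract gives no formula)."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    upper = np.triu_indices(len(centroids), k=1)
    return dist[upper].min()


# Usage on synthetic data: two well-separated groups in 2 dimensions.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
])
centers, memberships = fuzzy_c_means(data, c=2, m=2.0)
print(min_centroid_distance(centers))
```

In this sketch the index would be computed for several candidate cluster numbers c and compared; how the paper selects the optimum from those values is not described in the abstract, so that step is left out.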
