Fundamenta Informaticae

Robust Rough-Fuzzy C-Means Algorithm: Design and Applications in Coding and Non-coding RNA Expression Data Clustering

Abstract

Cluster analysis is a technique that divides a given data set into a set of clusters such that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible. In this context, several rough-fuzzy clustering algorithms have been shown to be successful at finding overlapping and vaguely defined clusters. However, existing rough-fuzzy clustering algorithms usually assume that the crisp lower approximation of a cluster is spherical in shape, which prevents them from finding clusters of arbitrary shape. To address this, this paper presents a new rough-fuzzy clustering algorithm, termed robust rough-fuzzy c-means. Each cluster in the proposed algorithm is represented by three parameters: a cluster prototype, a possibilistic fuzzy lower approximation, and a probabilistic fuzzy boundary. The possibilistic lower approximation helps discover clusters of various shapes. The cluster prototype depends on a weighted average of the possibilistic lower approximation and the probabilistic boundary. The proposed algorithm is robust in the sense that it can find overlapping and vaguely defined clusters with arbitrary shapes in noisy environments. An efficient method based on Pearson's correlation coefficient is presented to select the initial prototypes of the clusters. A method based on a cluster validity index is also introduced to identify optimum values of the parameters of both the initialization method and the proposed clustering algorithm. The effectiveness of the proposed algorithm, along with a comparison with other clustering algorithms, is demonstrated on synthetic as well as coding and non-coding RNA expression data sets using several cluster validity indices.
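To make the three-part cluster representation concrete, here is a minimal Python sketch of one prototype-update iteration. It is not the authors' implementation, and several details are assumptions: probabilistic memberships use the standard fuzzy c-means formula with fuzzifier `m1`; possibilistic typicalities use the possibilistic c-means form 1 / (1 + (d²/η_i)^(1/(m2-1))) with a user-supplied scale `eta`; an object enters the lower approximation of its best cluster only when its two largest memberships differ by more than a threshold `delta`, and otherwise enters the boundaries of both clusters; and the prototype is `w` times the typicality-weighted mean of the lower approximation plus `(1 - w)` times the membership-weighted mean of the boundary. All function and parameter names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(X, idx, weights):
    """Mean of the rows X[j], j in idx, weighted by weights[j]."""
    total = sum(weights[j] for j in idx)
    if total == 0.0:  # degenerate case: fall back to an unweighted mean
        weights, total = [1.0] * len(X), float(len(idx))
    return [sum(weights[j] * X[j][t] for j in idx) / total for t in range(len(X[0]))]


def fcm_memberships(X, V, m1):
    """Probabilistic memberships mu[i][j] (standard fuzzy c-means formula)."""
    c, n = len(V), len(X)
    mu = [[0.0] * n for _ in range(c)]
    for j, x in enumerate(X):
        d = [sq_dist(x, v) for v in V]
        hits = [i for i in range(c) if d[i] == 0.0]
        if hits:  # object sits exactly on one or more prototypes
            for i in hits:
                mu[i][j] = 1.0 / len(hits)
            continue
        for i in range(c):
            mu[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m1 - 1)) for k in range(c))
    return mu


def rrfcm_step(X, V, m1=2.0, m2=2.0, eta=None, delta=0.1, w=0.7):
    """One hypothetical robust rough-fuzzy c-means update; needs len(V) >= 2."""
    c, n = len(V), len(X)
    eta = eta or [1.0] * c  # assumed per-cluster possibilistic scale
    mu = fcm_memberships(X, V, m1)
    # Possibilistic typicality (PCM form): decays with distance, independent of other clusters.
    nu = [[1.0 / (1.0 + (sq_dist(x, V[i]) / eta[i]) ** (1.0 / (m2 - 1))) for x in X]
          for i in range(c)]
    lower = [[] for _ in range(c)]
    boundary = [[] for _ in range(c)]
    for j in range(n):
        # Assumed rule: compare the two largest probabilistic memberships.
        first, second = sorted(range(c), key=lambda r: mu[r][j], reverse=True)[:2]
        if mu[first][j] - mu[second][j] > delta:
            lower[first].append(j)
        else:
            boundary[first].append(j)
            boundary[second].append(j)
    new_V = []
    for i in range(c):
        A, B = lower[i], boundary[i]
        w_low = [nu[i][j] ** m2 for j in range(n)]  # typicality weights (lower approx.)
        w_bnd = [mu[i][j] ** m1 for j in range(n)]  # membership weights (boundary)
        if A and B:
            c1, d1 = weighted_mean(X, A, w_low), weighted_mean(X, B, w_bnd)
            new_V.append([w * a + (1 - w) * b for a, b in zip(c1, d1)])
        elif A:
            new_V.append(weighted_mean(X, A, w_low))
        elif B:
            new_V.append(weighted_mean(X, B, w_bnd))
        else:  # empty cluster: keep the old prototype
            new_V.append(list(V[i]))
    return new_V, lower, boundary
```

Calling `rrfcm_step` repeatedly until the prototypes stop moving gives the full iterative loop. Because typicality shrinks with distance, a far-away noisy object that lands in a lower approximation contributes little to that cluster's prototype in this sketch.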
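The abstract does not spell out the correlation-based initialization, so the following is a hypothetical greedy procedure of that general kind, not the paper's method: two objects (for example, expression profiles) count as neighbours when their Pearson correlation is at least a threshold `tau`; the object with the most remaining neighbours becomes a prototype; and it and its neighbours are removed before the next pick. The names `init_prototypes` and `tau` are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def init_prototypes(X, c, tau=0.9):
    """Hypothetical greedy seeding: pick up to c objects with the most correlated neighbours."""
    n = len(X)
    r = [[pearson(X[a], X[b]) for b in range(n)] for a in range(n)]
    remaining = set(range(n))
    prototypes = []
    while len(prototypes) < c and remaining:
        # Neighbours of j: other remaining objects correlated with j at >= tau.
        neigh = {j: {k for k in remaining if k != j and r[j][k] >= tau} for j in remaining}
        # Densest object wins; ties go to the lowest index.
        best = max(sorted(remaining), key=lambda j: len(neigh[j]))
        prototypes.append(list(X[best]))
        remaining -= neigh[best] | {best}
    return prototypes
```

Removing each chosen object's neighbourhood keeps later seeds from falling inside an already-covered group of co-expressed profiles.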

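The abstract also does not name the validity index used for parameter tuning. As a stand-in, this sketch runs an exhaustive grid search and keeps the setting with the lowest Davies-Bouldin index (a standard index where lower is better), computed on crisp labels. `cluster_fn` is a hypothetical callable that runs any clustering method with the given keyword parameters (for example `delta` and `w`) and returns labels and prototypes.

```python
import itertools
import math


def davies_bouldin(X, labels, centers):
    """Davies-Bouldin index for crisp labels; assumes >= 2 distinct centers."""
    c = len(centers)
    scatter = []
    for i in range(c):
        members = [x for x, lab in zip(X, labels) if lab == i]
        # Average distance of a cluster's members to its center (0 if empty).
        scatter.append(sum(math.dist(x, centers[i]) for x in members) / len(members)
                       if members else 0.0)
    total = 0.0
    for i in range(c):
        total += max((scatter[i] + scatter[k]) / math.dist(centers[i], centers[k])
                     for k in range(c) if k != i)
    return total / c


def select_parameters(X, cluster_fn, grid):
    """Try every combination in grid; return (params, score) with the lowest DB index."""
    best_params, best_score = None, math.inf
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        labels, centers = cluster_fn(X, **params)
        score = davies_bouldin(X, labels, centers)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Each grid point reruns the clustering, so the cost grows with the product of the grid sizes.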