Journal: Artificial Intelligence in Medicine

Fuzzy ensemble clustering based on random projections for DNA microarray data analysis

Abstract

Objective: Two major problems in the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work is to explore new strategies and develop new clustering methods that improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in gene expression data analysis.

Methodology: We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and bio-medical gene expression data. We apply random projections that obey the Johnson-Lindenstrauss lemma to obtain several lower-dimensional instances of the original high-dimensional gene expression data, approximately preserving the information and the metric structure of the original data. We then adopt a double fuzzy approach to obtain a consensus ensemble clustering: first we apply a fuzzy k-means algorithm to the different instances of the projected low-dimensional data, and then we use a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithm are proposed, which differ in the technique used to combine the base clusterings and to obtain the final consensus clustering.

Results and conclusion: We applied the proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and compared the results with other state-of-the-art ensemble methods. Results show that in some cases, taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at the bio-molecular level. The dimensionality reduction achieved through random projection techniques is well suited to the characteristics of high-dimensional gene expression data, resulting in improved performance with respect to single fuzzy k-means and with respect to ensemble methods based on resampling techniques. Moreover, we show that the analysis of the accuracy and diversity of the base fuzzy clusterings can be useful to explain the advantages and the limitations of the proposed fuzzy ensemble approach.
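
The pipeline described under Methodology (random projection, fuzzy k-means on each projected instance, t-norm combination, consensus) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation. The abstract does not specify these details, so the following are all assumptions: the Gaussian projection matrix, the algebraic-product t-norm, the averaging of similarity matrices, the final re-clustering step, and every function and parameter name.

```python
import numpy as np

def random_projection(X, d, rng):
    """Project the rows of X (n x D) to d dimensions.

    Assumption: a Gaussian matrix scaled by 1/sqrt(d), one common
    construction that satisfies the Johnson-Lindenstrauss lemma.
    """
    D = X.shape[1]
    R = rng.normal(size=(D, d)) / np.sqrt(d)
    return X @ R

def fuzzy_kmeans(X, k, rng, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy k-means (fuzzy c-means); returns an n x k membership matrix."""
    n = X.shape[0]
    U = rng.dirichlet(np.ones(k), size=n)  # random memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U

def fuzzy_similarity(U):
    """Pairwise similarity S[i, j] = sum_c U[i, c] * U[j, c].

    Assumption: the algebraic product is used as the fuzzy t-norm.
    """
    return U @ U.T

def fuzzy_ensemble(X, k, d, n_proj, seed=0):
    """Hypothetical consensus: average the similarity matrices of the base
    clusterings, then run fuzzy k-means on the rows of the averaged matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = np.zeros((n, n))
    for _ in range(n_proj):
        U = fuzzy_kmeans(random_projection(X, d, rng), k, rng)
        S += fuzzy_similarity(U)
    S /= n_proj
    return fuzzy_kmeans(S, k, rng)
```

Calling `fuzzy_ensemble(X, k=2, d=20, n_proj=5)` on an `n x D` expression matrix returns an `n x 2` membership matrix whose rows sum to 1; taking the `argmax` of each row gives a hard assignment if one is needed. Other t-norms (for example the minimum) or other consensus rules could be substituted in `fuzzy_similarity` and `fuzzy_ensemble`, which is presumably where the variants mentioned in the abstract differ.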