Fuzzy ensemble clustering based on random projections for DNA microarray data analysis

Roberto Avogadri; Giorgio Valentini

Abstract

Objective: Two major problems related to the unsupervised analysis of gene expression data are represented by the accuracy and reliability of the discovered clusters, and by the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work consists in the exploration of new strategies and in the development of new clustering methods to improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in the context of gene expression data analysis.

Methodology: We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and bio-medical gene expression data. We apply random projections that obey the Johnson-Lindenstrauss lemma to obtain several instances of lower-dimensional gene expression data from the original high-dimensional ones, approximately preserving the information and the metric structure of the original data. Then we adopt a double fuzzy approach to obtain a consensus ensemble clustering, by first applying a fuzzy k-means algorithm to the different instances of the projected low-dimensional data and then by using a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithms are proposed, according to different techniques to combine the base clusterings and to obtain the final consensus clustering.

Results and conclusion: We applied our proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and we compared the results with other state-of-the-art ensemble methods. Results show that in some cases, taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at bio-molecular level. The reduction of the dimension of the data, achieved through random projection techniques, is well-suited to the characteristics of high-dimensional gene expression data, thus resulting in improved performance with respect to single fuzzy k-means and with respect to ensemble methods based on resampling techniques. Moreover, we show that the analysis of the accuracy and diversity of the base fuzzy clusterings can be useful to explain the advantages and the limitations of the proposed fuzzy ensemble approach.
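
The pipeline outlined in the abstract (random projections, fuzzy k-means on each projection, t-norm combination, consensus clustering) can be illustrated with a minimal Python sketch. This is one plausible reading of the abstract, not the authors' algorithm. Several details are assumptions made here: the Gaussian projection matrix, the algebraic product as the t-norm, averaging the per-projection similarity matrices, running fuzzy k-means on the rows of the averaged matrix to obtain the consensus, and the farthest-point initialisation. All function names are hypothetical.

```python
import numpy as np


def random_projection(X, d, rng):
    """Project the rows of X (n x D) to d dimensions.

    Assumption: a Gaussian matrix with N(0, 1/d) entries, one common
    construction that satisfies the Johnson-Lindenstrauss lemma.
    """
    D = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))
    return X @ R


def memberships(X, C, m):
    """Fuzzy k-means membership matrix (n x k) for centers C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_kmeans(X, k, rng, m=2.0, iters=50):
    """Standard fuzzy k-means (fuzzy c-means) with fuzzifier m."""
    n = X.shape[0]
    # Farthest-point initialisation (a choice made for this sketch).
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        W = memberships(X, C, m) ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]
    return memberships(X, C, m)


def fuzzy_ensemble(X, k, d, n_proj, seed=0):
    """Consensus fuzzy clustering over n_proj random projections."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    M = np.zeros((n, n))
    for _ in range(n_proj):
        U = fuzzy_kmeans(random_projection(X, d, rng), k, rng)
        # Product t-norm summed over clusters: M_ij += sum_c U_ic * U_jc
        M += U @ U.T
    M /= n_proj
    # Assumed consensus step: cluster the rows of the averaged similarity.
    return fuzzy_kmeans(M, k, rng)
```

Calling `fuzzy_ensemble(X, k=2, d=20, n_proj=5)` on an n x D expression matrix returns an n x k membership matrix whose rows sum to 1; taking the arg-max of each row gives a crisp assignment if one is needed.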

Bibliographic details

Source
Artificial Intelligence in Medicine, 2009, issue 3, pp. 173-183 (11 pages)
Authors
Roberto Avogadri; Giorgio Valentini
Author affiliations

DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy (both authors)
Original format: PDF
Language: English
Keywords
gene expression data clustering; ensemble clustering; fuzzy clustering; random subspace; random projections; DNA microarrays
