Conference: 2016 Future Technologies Conference

A feature grouping method for ensemble clustering of high-dimensional genomic big data


Abstract

High-dimensional genomic big data with hundreds of features present a major challenge for cluster analysis. Genomic data are usually noisy, their features are correlated, and different subspaces exist within the high-dimensional data. This paper presents a feature selection and grouping method for ensemble clustering of high-dimensional genomic data. Two of the most popular clustering methods, k-means and similarity-based clustering, are used for ensemble clustering. Ensemble clustering is more effective at clustering high-dimensional complex data than traditional clustering algorithms. In this paper, we cluster unlabeled genomic data (148 exome data sets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel, using the SimpleKMeans, XMeans, DBScan, and MakeDensityBasedCluster algorithms, and compare the clustering results with those of the proposed ensemble clustering method. Furthermore, we apply a biclustering algorithm (δ-Biclustering) to each cluster to find sub-matrices in the genomic data, clustering instances and features simultaneously.
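
The abstract does not spell out how the feature groups are formed or how the base clusterings are combined, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes the feature groups are already given as lists of column indices, runs a plain k-means on each group, and merges the results with a co-association (evidence accumulation) matrix, which is a common consensus technique swapped in here for illustration. All function names (`kmeans`, `co_association`, `consensus`, `ensemble_cluster`), the farthest-first initialization, the 0.5 threshold, and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(labelings, n):
    """Fraction of base clusterings that place samples i and j together."""
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus(co, threshold=0.5):
    """Connected components of the graph linking pairs with co > threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def ensemble_cluster(data, feature_groups, k, threshold=0.5):
    """Cluster each feature group separately, then merge by co-association."""
    labelings = []
    for cols in feature_groups:
        sub = [[row[c] for c in cols] for row in data]
        labelings.append(kmeans(sub, k))
    return consensus(co_association(labelings, len(data)), threshold)


# Toy data (made up): columns 0-3 separate two groups of samples,
# column 4 is a noisy feature that splits the samples differently.
data = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.1, 0.0, 3.0],
    [0.2, 0.1, 0.0, 0.1, 0.1],
    [5.0, 5.0, 5.0, 5.0, 3.1],
    [5.1, 5.2, 5.0, 5.1, 0.2],
    [5.2, 5.1, 5.1, 5.0, 2.9],
]
groups = [[0, 1], [2, 3], [4]]
print(ensemble_cluster(data, groups, k=2))
```

In this toy run the base clustering on the noisy group disagrees with the other two, but a pair of samples is linked only when more than half of the base clusterings agree, so the consensus still recovers the two sample groups. Real exome data would need a principled grouping step and a more careful choice of `k` and threshold.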
