Venue: Future Technologies Conference

A feature grouping method for ensemble clustering of high-dimensional genomic big data

Abstract

High-dimensional genomic big data with hundreds of features present a major challenge for cluster analysis. Genomic data are usually noisy, their features are correlated, and different subspaces exist within the same high-dimensional data set. This paper presents a feature selection and grouping method for ensemble clustering of high-dimensional genomic data. Two popular clustering methods, k-means and similarity-based clustering, are used to build the ensemble. Ensemble clustering is more effective at clustering high-dimensional complex data than traditional clustering algorithms. In this paper, we cluster unlabeled genomic data (148 exome data sets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel, using the SimpleKMeans, XMeans, DBScan, and MakeDensityBasedCluster algorithms, and compare their clustering results with those of the proposed ensemble clustering method. Furthermore, we apply a biclustering algorithm (δ-Biclustering) to each cluster to find sub-matrices in the genomic data; biclustering clusters instances and features simultaneously.
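
The abstract does not say how features are grouped or how the individual clusterings are combined, so the following is only a minimal sketch of one common way to build a cluster ensemble over feature groups, not the authors' method. It assumes a round-robin split of the features (the hypothetical `feature_groups` helper), runs a plain k-means on each group, accumulates a co-association matrix, and links rows that share a cluster in more than half of the groups. All function names, the 0.5 threshold, and the toy data are illustrative assumptions.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_groups(n_features, n_groups):
    """ASSUMPTION: round-robin split; the abstract does not give the grouping rule."""
    return [list(range(g, n_features, n_groups)) for g in range(n_groups)]


def co_association(data, groups, k):
    """Fraction of feature groups in which each pair of rows shares a cluster."""
    n = len(data)
    co = [[0.0] * n for _ in range(n)]
    for cols in groups:
        labels = kmeans([[row[c] for c in cols] for row in data], k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(groups)
    return co


def consensus(co, threshold=0.5):
    """Link rows whose co-association exceeds the threshold; label the components."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (made up for illustration): 6 samples, 4 features, two obvious groups.
data = [
    [0.0, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.1, 0.2],
    [0.2, 0.1, 0.0, 0.1],
    [5.0, 5.1, 4.9, 5.2],
    [5.1, 4.9, 5.0, 5.0],
    [4.9, 5.0, 5.2, 5.1],
]
groups = feature_groups(n_features=4, n_groups=2)
co = co_association(data, groups, k=2)
labels = consensus(co)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Combining through a co-association matrix keeps the consensus step independent of how each base clustering numbers its clusters, which is why it is a frequent choice when the base clusterers see different feature subsets.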