Conference: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops

Multi-source kernel k-means for clustering heterogeneous biomedical data

Abstract

In recent years, many large biological databases have come to be stored in different locations. Analyzing distinct data sets from multiple sources together yields more reliable results. However, it is difficult to combine heterogeneous data on a single server. The most obvious obstacles are data privacy, large data sizes, cost, and the different geographical locations of the data sources. In this paper, we present two new algorithms for clustering data from multiple remote data sources using kernel k-means. The first is a center-based algorithm built on the k-means algorithm. The second runs distributed kernel k-means over multiple data sources. In the distributed scheme, clustering is executed only locally at each data source, and partial clustering results are synchronized between data sources. To evaluate the performance of the proposed algorithms, we merged all data from the different sources into one large data set and ran kernel k-means on it as a baseline. The results show that the center-based algorithm greatly reduces the amount of data transmitted between data sources while still yielding acceptable clustering results. The distributed kernel k-means algorithm performs even better: its clustering results are very close to those produced by kernel k-means on the merged data set.
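
For readers unfamiliar with the baseline, the following is a minimal sketch of ordinary single-machine kernel k-means, the method applied to the merged data set above. It is not the paper's center-based or distributed algorithm, whose details the abstract does not give. The Gaussian (RBF) kernel, the function names `rbf_kernel` and `kernel_kmeans`, and the parameters `gamma`, `init_labels`, `n_iter`, and `seed` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    The choice of kernel is an assumption; the abstract does not name one."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, k, init_labels=None, n_iter=100, seed=0):
    """Standard kernel k-means on a precomputed n x n kernel matrix K.
    Returns an array of n cluster labels in the range 0..k-1."""
    n = K.shape[0]
    if init_labels is None:
        # Random initial assignment (hypothetical default for this sketch).
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = int(members.sum())
            if m == 0:
                continue  # an empty cluster keeps infinite distance
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/m) * sum_j K_ij + (1/m^2) * sum_{j,l} K_jl
            cross = K[:, members].sum(axis=1) / m
            within = K[members][:, members].sum() / (m * m)
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
    return labels
```

A typical call is `kernel_kmeans(rbf_kernel(X, gamma=1.0), k=2)`. Because the algorithm only touches the data through `K`, any positive semidefinite kernel can be substituted for the RBF kernel used here.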
