Source: Computer and Computational Sciences (IMSCCS), 2007 Second International Multisymposium on

A Genetic Weighted K-means Algorithm for Clustering Gene Expression Data


Abstract

The traditional (unweighted) k-means algorithm is one of the most popular clustering methods for analyzing gene expression data. However, it suffers from three major shortcomings: it is sensitive to the initial partition, its result is prone to local minima, and it is applicable only to data with spherical clusters. The last shortcoming means that we must assume that gene expression data under different conditions are independently distributed with equal variances. This assumption does not hold in practice. In this paper, we propose a genetic weighted K-means algorithm (denoted GWKMA), which solves the first two problems and partially remedies the third. GWKMA is a hybrid of a genetic algorithm (GA) and a weighted K-means algorithm (WKMA). In GWKMA, each individual is encoded by a partitioning table that uniquely determines a clustering, and three genetic operators (selection, crossover, and mutation) together with a WKM operator derived from WKMA are employed. The superiority of GWKMA over k-means is illustrated on one synthetic and two real-life gene expression datasets.

Keywords: weighted k-means, clustering, partitional string, genetic algorithm, gene expression data
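
The abstract names the ingredients of GWKMA but not their details, so the following Python sketch only illustrates the general idea of combining a genetic search over partition strings with a weighted k-means reassignment step. Everything specific in it is an assumption rather than the paper's method: the per-condition weights `w` are supplied by the caller, selection is a binary tournament, crossover is one-point, mutation resets a label at random, and the names `gwkm_sketch` and `wkm_step` are hypothetical.

```python
import random

def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance: sum_j w[j] * (x[j] - c[j])**2."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def centroids(data, labels, k):
    """Mean of each cluster; an empty cluster is represented by None."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, g in zip(data, labels):
        counts[g] += 1
        for j in range(dim):
            sums[g][j] += x[j]
    return [[s / counts[g] for s in sums[g]] if counts[g] else None
            for g in range(k)]

def cost(data, labels, k, w):
    """Total weighted within-cluster squared distance (lower is better)."""
    cents = centroids(data, labels, k)
    return sum(weighted_sq_dist(x, cents[g], w) for x, g in zip(data, labels))

def wkm_step(data, labels, k, w):
    """One weighted k-means step: move each point to its nearest non-empty centroid."""
    cents = centroids(data, labels, k)
    live = [g for g in range(k) if cents[g] is not None]
    return [min(live, key=lambda g: weighted_sq_dist(x, cents[g], w))
            for x in data]

def gwkm_sketch(data, k, w, pop_size=10, generations=20, p_mut=0.05, seed=0):
    """Illustrative GA over partition strings with a weighted k-means operator."""
    rng = random.Random(seed)
    n = len(data)
    fit = lambda s: cost(data, s, k, w)
    # Each individual is a partition string: labels[i] is the cluster of point i.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=fit)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Selection (assumed scheme): binary tournament on cost.
            a = min(rng.sample(pop, 2), key=fit)
            b = min(rng.sample(pop, 2), key=fit)
            # Crossover (assumed scheme): naive one-point on the label strings.
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            # Mutation (assumed scheme): reset a label with probability p_mut.
            child = [rng.randrange(k) if rng.random() < p_mut else g
                     for g in child]
            # WKM operator: one weighted k-means reassignment step.
            children.append(wkm_step(data, child, k, w))
        pop = children
        cand = min(pop, key=fit)
        if fit(cand) < fit(best):
            best = cand
    return best, fit(best)
```

A call such as `gwkm_sketch(data, k=2, w=[1.0, 1.0])` returns a label list and its cost. Setting all weights to 1 reduces the distance to the ordinary squared Euclidean one; unequal weights let conditions with different variances contribute differently, which is the motivation the abstract gives for weighting. One-point crossover on raw labels ignores the fact that cluster labels can be permuted, so this is a simplification.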
