Journal: International Journal of Computer Science and Security

Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Expression Data

Abstract

Biclustering is needed to analyze the expression patterns of genes by comparing rows of a gene expression matrix, and to analyze the expression profiles of samples by comparing its columns. Biclustering therefore requires clustering both genes and samples. The algorithm presented in this paper is based on the two-way clustering approach, in which genes and samples are each clustered with parallel fuzzy C-means clustering over a message passing interface; we call it MFCM. MFCM is applied to cluster genes and samples so as to maximize the membership function values of the data set. It is a parallelized rework of a parallel fuzzy two-way clustering algorithm for microarray gene expression data [9], undertaken to study the efficiency and parallelization improvement of the algorithm. The algorithm uses a gene entropy measure to filter the clustered data and find biclusters, and the method is able to obtain highly correlated biclusters of the gene expression dataset. We have implemented the fuzzy c-means algorithm on the MATLAB parallel computing platform using MATLABMPI (a message passing version of MATLAB), and this approach is used to find biclusters of gene expression matrices. The biclustering method is also parallelized: a lower-entropy filter function reduces the set of gene cluster centers by choosing the centers with minimum entropy. The algorithm is tested on the well-known cell cycle data sets of the budding yeast S. cerevisiae from Cho et al. and Tavazoie et al., on the breast cancer subtypes Basal A and Basal B, and on the leukemia data set from Golub et al.
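As a rough illustration of the two-way clustering idea in the abstract, the sketch below runs a serial, textbook fuzzy c-means once on the rows (genes) and once on the columns (samples) of a toy expression matrix. It is not the authors' MFCM: there is no message passing, and the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the stopping tolerance, and the toy matrix `expr` are all assumptions made for this example.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard (serial) fuzzy c-means on a list of equal-length float lists.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Membership update from Euclidean distances to the new centers.
        max_change = 0.0
        for i, p in enumerate(points):
            dist = [math.dist(p, c) for c in centers]
            if min(dist) == 0.0:
                # Point sits exactly on one or more centers: share it among them.
                zeros = [k for k, d in enumerate(dist) if d == 0.0]
                new_row = [1.0 / len(zeros) if k in zeros else 0.0
                           for k in range(n_clusters)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                new_row = [v / total for v in inv]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if max_change < tol:
            break

    return centers, u


# Toy expression matrix (hypothetical values): rows are genes, columns are samples.
expr = [
    [1.0, 1.1, 0.9, 5.0],
    [1.2, 0.9, 1.0, 5.2],
    [0.9, 1.0, 1.1, 4.9],
    [5.0, 5.1, 4.9, 1.0],
    [5.2, 4.8, 5.0, 1.1],
    [4.9, 5.0, 5.1, 0.9],
]

# Two-way clustering: cluster the rows (genes), then the columns (samples).
gene_centers, gene_u = fuzzy_c_means(expr, n_clusters=2)
samples = [list(col) for col in zip(*expr)]
sample_centers, sample_u = fuzzy_c_means(samples, n_clusters=2)
```

In a message-passing version, each process would typically hold a slice of `points`, compute its partial weighted sums for the center update, and combine them across processes before the membership update; how the paper partitions the work is not stated in the abstract.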
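The abstract says that gene cluster centers with minimum entropy are kept, but it does not define the entropy measure. The sketch below therefore assumes one plausible reading: each center is a non-negative profile that is normalized to sum to 1, and its Shannon entropy is computed in bits. The function names `shannon_entropy` and `lowest_entropy_centers`, the `keep` parameter, and the example centers are hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(values):
    """Shannon entropy (bits) of non-negative values normalized to sum to 1."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    probs = [v / total for v in values]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def lowest_entropy_centers(centers, keep):
    """Return the sorted indices of the `keep` centers with the smallest entropy."""
    ranked = sorted(range(len(centers)),
                    key=lambda k: shannon_entropy(centers[k]))
    return sorted(ranked[:keep])


# Hypothetical cluster centers (one row per center, one column per sample).
centers = [
    [1.0, 1.0, 1.0, 1.0],  # flat profile: highest entropy for four samples
    [4.0, 0.0, 0.0, 0.0],  # single peak: zero entropy
    [2.0, 2.0, 0.0, 0.0],  # two equal peaks: intermediate entropy
]
selected = lowest_entropy_centers(centers, keep=2)
```

Under this assumed definition a flat center scores highest and a sharply peaked center scores lowest, so the filter favors centers whose expression is concentrated in few samples.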
