Journal: 中国人民解放军军医大学学报(英文版)

A data structure and function classification based method to evaluate clustering models for gene expression data


Abstract

Objective: To establish a systematic framework for selecting the best clustering algorithm and to provide an evaluation method for clustering analyses of gene expression data. Methods: Based on data structure (internal information) and function classification (external information), clustering analyses of gene expression data were evaluated using 2 approaches. First, to assess the predictive power of clustering algorithms, entropy was introduced to measure the consistency between the clustering results from different algorithms and known, validated functional classifications. Second, a modified figure of merit (adjust_FOM) was used as an internal assessment method: a clustering algorithm is applied to the data from all experimental conditions but one, and the remaining condition is used to assess the predictive power of the resulting clusters. The method was applied to 3 gene expression data sets (2 from Iyer's Serum Data Sets and 1 from Ferea's Saccharomyces cerevisiae Data Set). Results: A method based on entropy and figure of merit (FOM) was proposed and used to examine the results obtained by 6 different algorithms on the 3 data sets; SOM and fuzzy clustering methods showed the highest clustering ability. Conclusion: An entropy-based method for evaluating clustering analyses is put forward for the first time. Evaluating the same data set can give different results depending on the function classification used. According to the adjust_FOM and Entropy_FOM curves, SOM and fuzzy clustering methods show the highest clustering ability on the 3 data sets.
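The external (entropy-based) assessment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the common size-weighted cluster entropy: for each cluster, compute the Shannon entropy of the functional classes of its genes, then average over clusters weighted by cluster size. The function name `clustering_entropy` and its arguments are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(cluster_labels, class_labels):
    """Size-weighted mean entropy (in bits) of functional classes per cluster.

    Assumed definition, not confirmed by the source. Lower values mean the
    clusters agree better with the known functional classification; 0 means
    every cluster contains genes from a single functional class.
    """
    if len(cluster_labels) != len(class_labels):
        raise ValueError("one cluster label and one class label per gene")

    # Collect the functional classes of the genes assigned to each cluster.
    members = defaultdict(list)
    for cluster, func_class in zip(cluster_labels, class_labels):
        members[cluster].append(func_class)

    n = len(cluster_labels)
    total = 0.0
    for classes in members.values():
        size = len(classes)
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size)
                 for c in Counter(classes).values())
        total += (size / n) * h
    return total
```

Under this definition, comparing the values returned for several algorithms on the same data set and the same functional classification gives a ranking by agreement with the external information; changing the functional classification can change the ranking, as the conclusion notes.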
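The internal (leave-one-condition-out) assessment can be sketched in the same spirit. The abstract does not define adjust_FOM, so this assumes the usual figure of merit (root mean squared deviation of the held-out condition from its cluster means, summed over all conditions) divided by the commonly used correction factor sqrt((n - k) / n) for n genes and k clusters. The names `adjusted_fom` and `cluster_fn` are hypothetical; `cluster_fn` stands in for any clustering algorithm and must return one cluster label per gene.

```python
import math


def adjusted_fom(data, cluster_fn, k):
    """Aggregate adjusted figure of merit (assumed definition).

    data:       list of genes, each a list of expression values,
                one value per experimental condition.
    cluster_fn: callable (rows, k) -> list of cluster labels, one per row.
    k:          number of clusters requested.
    """
    n = len(data)
    m = len(data[0])
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of genes")

    total = 0.0
    for e in range(m):
        # Cluster the genes using every condition except condition e.
        reduced = [list(row[:e]) + list(row[e + 1:]) for row in data]
        labels = cluster_fn(reduced, k)

        # Group the held-out expression values by assigned cluster.
        held_out = {}
        for row, label in zip(data, labels):
            held_out.setdefault(label, []).append(row[e])

        # Squared deviation of held-out values from their cluster means.
        squared = 0.0
        for values in held_out.values():
            mean = sum(values) / len(values)
            squared += sum((v - mean) ** 2 for v in values)
        total += math.sqrt(squared / n)

    # Correct for the bias that favours larger numbers of clusters.
    return total / math.sqrt((n - k) / n)
```

A lower value suggests that the clusters predict the held-out condition better. Plotting the value against k for each algorithm would give curves of the kind the abstract refers to.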

Bibliographic details

  • Source
  • Author affiliations

    Department of Medical Statistics, Third Military Medical University, Chongqing, 400038, China;

    Applied Research Centre for Genomics Technology, Department of Biology & Chemistry, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China;

    Applied Research Centre for Genomics Technology, Department of Biology & Chemistry, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China;

    Department of Electronic Technology, Southwest University of Politics and Law Science, Chongqing, 400031, China;

    Department of Medical Statistics, Third Military Medical University, Chongqing, 400038, China;

  • Indexing information
  • Original format: PDF
  • Text language: chi
  • Chinese Library Classification: Basic medicine;
  • Keywords

    gene expression; evaluation of clustering; adjust_FOM; entropy;

  • Date indexed: 2022-08-19 03:48:27