A method to identify significant clusters in gene expression data

Abstract

Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem, and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g., the over- and under-expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. In this paper, we define a new criterion, Mean Split Silhouette (MSS), which is a measure of cluster heterogeneity. We propose to choose the number of clusters as the minimizer of MSS. In this way, the number of significant clusters is defined as that which produces the most homogeneous clusters. The power of this method compared to existing methods is demonstrated on simulated microarray data. The minimum MSS method is an example of a general approach that can be applied to any clustering routine with any global criterion.
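The abstract does not spell out how MSS is computed, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that each cluster is itself re-clustered into a few sub-clusters, that the best average silhouette over those splits is taken as the cluster's "split silhouette", and that MSS is the mean of these values over all clusters. The function names (`kmeans_labels`, `mean_silhouette`, `split_silhouette`, `mss`, `choose_k`), the use of k-means with farthest-first seeding, the Euclidean distance, the `max_split` parameter, and the candidate range starting at 2 are all illustrative choices rather than anything stated in the source.

```python
import math

def kmeans_labels(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumed clustering routine)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0 by convention."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # within-cluster distance
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)     # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def split_silhouette(members, max_split=3):
    """Best average silhouette over splits of one cluster into 2..max_split parts."""
    top = min(max_split, len(members) - 1)
    return max(mean_silhouette(members, kmeans_labels(members, j))
               for j in range(2, top + 1))

def mss(points, k, max_split=3):
    """Mean split silhouette for a k-cluster solution (assumed definition)."""
    labels = kmeans_labels(points, k)
    scores = []
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        if len(members) >= 3:  # assumption: skip clusters too small to split
            scores.append(split_silhouette(members, max_split))
    # Assumption: if no cluster can be split, treat the solution as unselectable.
    return sum(scores) / len(scores) if scores else math.inf

def choose_k(points, k_max, max_split=3):
    """Return the k in 2..k_max that minimizes MSS, plus all scores."""
    scores = {k: mss(points, k, max_split) for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores
```

Here `points` is a list of equal-length tuples, for example one expression profile per gene, and `choose_k(points, k_max=6)` returns the selected number of clusters together with the MSS value for each candidate. A low MSS means that no cluster splits cleanly into sub-clusters, which is the sense in which the selected solution has the most homogeneous clusters.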