Journal: Theoretical Computer Science

Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis



Abstract

The advent of high-throughput technologies for biological research, microarrays in particular, has revived interest in clustering and produced a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although it is central to statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and have gained prominence in microarray data analysis. Among those, the stability-based methods are the most robust and the best performing in terms of prediction, but the slowest in terms of time. It is very unfortunate that as fascinating and classic an area of statistics as model selection, with important practical applications, has received very little attention in terms of algorithmic design and engineering. In this paper, in order to partially fill this gap, we make the following contributions: (A) the first general algorithmic paradigm for stability-based methods for model selection; (B) reductions showing that all of the known methods in this class are instances of the proposed paradigm; (C) a novel algorithmic paradigm for the class of stability-based methods for cluster validity, i.e., methods assessing how statistically significant a given clustering solution is; (D) a general algorithmic paradigm that describes heuristic and very effective speed-ups known in the literature for stability-based model selection methods. Since the performance evaluation of model selection algorithms is mainly experimental, we offer, for completeness and without attempting to be exhaustive, a representative synopsis of known experimental benchmarking results that highlight the ability of stability-based methods for model selection and the computational resources that they require for the task. As a whole, the contributions of this paper generalize in several respects reference methodologies in statistics and show that algorithmic approaches can yield deep methodological insights into this area, in addition to practical computational procedures.
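
To make the idea of stability-based model selection concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paradigm proposed in the paper: the abstract does not specify the resampling scheme, the clustering algorithm, or the agreement measure, so the sketch assumes random subsampling, a plain k-means with farthest-first seeding, and pairwise co-membership agreement (the Rand index) on the points shared by two subsamples. All function names (`kmeans`, `stability`, `select_k`, `make_blobs`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=10):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def stability(points, k, rng, rounds=20, frac=0.8):
    """Mean Rand-index agreement between clusterings of two random subsamples."""
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(rounds):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        # Compare the two clusterings only on points present in both subsamples.
        shared = sorted(set(idx_a) & set(idx_b))
        pairs = list(combinations(shared, 2))
        agree = sum((lab_a[i] == lab_a[j]) == (lab_b[i] == lab_b[j]) for i, j in pairs)
        scores.append(agree / len(pairs))
    return sum(scores) / len(scores)


def select_k(points, ks, seed=0):
    """Return the candidate k with the highest stability score, plus all scores."""
    rng = random.Random(seed)
    scores = {k: stability(points, k, rng) for k in ks}
    return max(scores, key=scores.get), scores


def make_blobs(rng, per_blob=30):
    """Toy data (an assumption): three well-separated square blobs in the plane."""
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in blob_centers for _ in range(per_blob)]


points = make_blobs(random.Random(1))
best_k, scores = select_k(points, ks=range(2, 6))
print(best_k, scores)
```

The nested loops (candidate values of k, resampling rounds, two clusterings per round, and a pairwise comparison) show why such methods are described above as the slowest in terms of time. The candidate range starts at k = 2 because a single cluster is trivially stable under this agreement measure.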
