The Annals of Statistics: An Official Journal of the Institute of Mathematical Statistics

Breakdown points for maximum likelihood estimators of location-scale mixtures


Abstract

ML-estimation based on mixtures of Normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of t-distributions by McLachlan and Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a further mixture component accounting for "noise" by Fraley and Raftery [The Computer J. 41 (1998) 578-588] were suggested as more robust alternatives. In this paper, the definition of an adequate robustness measure for cluster analysis is discussed and bounds for the breakdown points of the mentioned methods are given. It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on Normal mixtures. If the number of clusters s is treated as fixed, r additional points suffice for all three methods to let the parameters of r clusters explode. Only in the case of r = s is this not possible for t-mixtures. The ability to estimate the number of mixture components, for example, by use of the Bayesian information criterion of Schwarz [Ann. Statist. 6 (1978) 461-464], and to isolate gross outliers as clusters of one point, is crucial for an improved breakdown behavior of all three techniques. Furthermore, a mixture of Normals with an improper uniform distribution is proposed to achieve more robustness in the case of a fixed number of components.
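
The intuition behind the last proposal can be illustrated with a small numerical sketch. It is not the paper's estimator or proof: the component parameters, the noise proportion 0.05, and the constant density 0.01 standing in for the improper uniform distribution are all assumed values chosen for illustration, and `log_mixture_density` is a hypothetical helper name. With the parameters held fixed, the log-density that a plain Normal mixture assigns to a point falls without bound as the point moves away, whereas a constant noise term keeps it above log(0.05 × 0.01).

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z

def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def log_mixture_density(x, weights, means, sds,
                        noise_weight=0.0, noise_density=None):
    """Log density at x of a Normal mixture, optionally with a constant noise term.

    noise_density is an assumed constant c > 0 standing in for an improper
    uniform distribution; weights plus noise_weight should sum to 1.
    """
    terms = [math.log(w) + log_normal_pdf(x, m, s)
             for w, m, s in zip(weights, means, sds)]
    if noise_weight > 0.0:
        terms.append(math.log(noise_weight) + math.log(noise_density))
    return log_sum_exp(terms)

# Two fixed Normal components (illustrative values only).
means, sds = [0.0, 5.0], [1.0, 1.0]

for x in (2.5, 50.0, 1000.0):
    plain = log_mixture_density(x, [0.5, 0.5], means, sds)
    noisy = log_mixture_density(x, [0.475, 0.475], means, sds,
                                noise_weight=0.05, noise_density=0.01)
    print(f"x={x:8.1f}  plain={plain:14.2f}  with_noise={noisy:8.2f}")
```

Because the penalty for leaving a gross outlier unexplained is unbounded in the plain mixture, maximizing the likelihood tends to pull a component toward the outlier; with the constant term that penalty is capped, which is the informal reason such a component can add robustness when the number of components is fixed.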
