Statistics and Its Interface

Penalized unsupervised learning with outliers


Abstract

We consider the problem of performing unsupervised learning in the presence of outliers, that is, observations that do not come from the same distribution as the rest of the data. In this setting, standard approaches to unsupervised learning are known to yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or may instead yield distorted clusters that accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach extends a recent proposal for outlier detection in the regression setting. We allow each observation to take on an "error" term, and we penalize the errors using a group lasso penalty to encourage most of the observations' errors to equal exactly zero. We show that this approach can be used to develop extensions of K-means clustering and principal components analysis that yield accurate outlier detection as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored.
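The abstract describes the error-term idea only in words. Below is a minimal sketch for the K-means case, assuming an objective of the form sum_i ||x_i - mu_{c_i} - o_i||^2 + lam * sum_i ||o_i||_2 (this exact scaling, the initialization and the update order are our assumptions, not necessarily the paper's algorithm). Given the cluster centers, each error term o_i has a closed form: the residual x_i - mu_{c_i} is group soft-thresholded, so it becomes exactly zero whenever its norm is at most lam / 2. Observations whose o_i stays nonzero are flagged as outliers. The names `outlier_kmeans` and `group_soft_threshold`, the toy data, and the value of `lam` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def group_soft_threshold(r: Sequence[float], lam: float) -> Vector:
    """Return argmin_o ||r - o||^2 + lam * ||o||_2 (assumed scaling).

    The whole vector is set exactly to zero when ||r|| <= lam / 2;
    otherwise its Euclidean norm is shrunk by lam / 2.
    """
    norm = math.sqrt(sum(v * v for v in r))
    if norm <= lam / 2.0:
        return [0.0] * len(r)
    scale = 1.0 - lam / (2.0 * norm)
    return [scale * v for v in r]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def outlier_kmeans(
    X: Sequence[Sequence[float]],
    k: int,
    lam: float,
    init_centers: Optional[Sequence[Sequence[float]]] = None,
    max_iter: int = 200,
    tol: float = 1e-12,
    seed: int = 0,
) -> Tuple[List[int], List[Vector], List[Vector], List[int]]:
    """Block coordinate descent on the assumed penalized K-means objective.

    Returns (labels, centers, error terms O, indices with nonzero o_i).
    """
    n, p = len(X), len(X[0])
    if init_centers is None:
        rng = random.Random(seed)
        init_centers = [X[i] for i in rng.sample(range(n), k)]
    centers = [list(c) for c in init_centers]
    O: List[Vector] = [[0.0] * p for _ in range(n)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 1: assign each corrected point x_i - o_i to its nearest center.
        cleaned = [[x - o for x, o in zip(X[i], O[i])] for i in range(n)]
        new_labels = [
            min(range(k), key=lambda j: sq_dist(cleaned[i], centers[j]))
            for i in range(n)
        ]
        # Step 2: each center is the mean of its members' corrected points.
        new_centers = []
        for j in range(k):
            members = [cleaned[i] for i in range(n) if new_labels[i] == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        # Step 3: group soft-threshold each residual x_i - mu_{c_i}.
        O = [
            group_soft_threshold(
                [x - m for x, m in zip(X[i], new_centers[new_labels[i]])], lam
            )
            for i in range(n)
        ]
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        converged = new_labels == labels and shift < tol
        labels, centers = new_labels, new_centers
        if converged:
            break
    outliers = [i for i in range(n) if any(v != 0.0 for v in O[i])]
    return labels, centers, O, outliers


# Hypothetical toy data: two tight clusters plus one gross outlier (index 8).
X = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [50.0, -50.0],
]
labels, centers, O, outliers = outlier_kmeans(
    X, k=2, lam=2.0, init_centers=[[0.0, 0.0], [5.0, 5.0]]
)
print(labels, outliers)
```

Because the penalty caps how far each flagged point can pull its center (by lam / 2 in norm, under this assumed scaling), the gross outlier ends up attached to a cluster without dragging it away, rather than claiming a cluster of its own. Larger `lam` flags fewer observations. As with ordinary K-means, the result depends on initialization, which is why the example fixes `init_centers`.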
