Penalized unsupervised learning with outliers

DANIELA M. WITTEN

Abstract

We consider the problem of performing unsupervised learning in the presence of outliers, that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an "error" term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations' errors to exactly equal zero. We show that this approach can be used to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored.
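
The idea in the abstract, one "error" vector per observation with a group lasso penalty on those vectors, can be illustrated with a small sketch. This is not the paper's algorithm or code; it is a minimal illustration under stated assumptions. It assumes the objective sum_i 0.5 * ||x_i - e_i - mu_c(i)||^2 + lam * sum_i ||e_i||_2 and a simple alternating scheme. The function names `group_soft_threshold` and `penalized_kmeans`, the 0.5 scaling, and the parameter `lam` are all hypothetical choices made for this example.

```python
import numpy as np


def group_soft_threshold(residuals, lam):
    """Shrink each row toward zero; rows with norm <= lam become exactly zero."""
    norms = np.linalg.norm(residuals, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already zero.
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return residuals * scale


def penalized_kmeans(X, k, lam, n_iter=50, seed=0):
    """Alternate K-means steps on X - E with a group-lasso update of E.

    Assumed objective (not taken from the paper):
        sum_i 0.5 * ||x_i - e_i - mu_c(i)||^2 + lam * sum_i ||e_i||_2
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    E = np.zeros_like(X)
    for _ in range(n_iter):
        cleaned = X - E
        # Assign each cleaned observation to its nearest center.
        dists = ((cleaned[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each non-empty cluster's center from cleaned observations.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = cleaned[labels == j].mean(axis=0)
        # Update the error terms from the raw residuals.
        E = group_soft_threshold(X - centers[labels], lam)
    # Observations with a nonzero error vector are flagged as outliers.
    outliers = np.linalg.norm(E, axis=1) > 0
    return labels, centers, E, outliers
```

In this sketch, an observation is flagged only when its distance to its assigned center exceeds `lam`, so a larger `lam` flags fewer observations. Like ordinary K-means, the result depends on the random initialization.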


Bibliographic details

Source: Statistics and Its Interface, 2013, Issue 2, 11 pages
Author: DANIELA M. WITTEN
Original format: PDF
Language: English
Chinese Library Classification: Metrology
Keywords: Robust; Group lasso; Clustering; Principal components analysis; M-estimation

