
Random Subspace Learning on Outlier Detection and Classification with Minimum Covariance Determinant Estimator.



Abstract

The questions raised by high-dimensional data are interesting and challenging. Our study targets a particular type of data in this setting, namely "large p, small n". Because the dimensionality is far larger than the number of observations, any estimate of the covariance matrix and its inverse is severely affected. The definition of high dimension in statistics has changed over the decades. Modern datasets with thousands of dimensions demand deeper understanding, but that understanding is hindered by the curse of dimensionality. We review and extend previous studies to cope with the curse and to pave a new way for robust estimation, and then apply it to outlier detection and classification.

We explore random subspace learning and extend other classification and outlier detection algorithms to fit its framework. Our proposed methods can handle both high-dimension low-sample-size and traditional low-dimension high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimension low-sample-size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
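The idea of computing MCD-style determinants and distances in low-dimensional random subspaces can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names, the simplified concentration-step approximation of MCD, the default subset size `h`, and the choice to aggregate by averaging squared Mahalanobis distances across subspaces are all assumptions made for illustration.

```python
import numpy as np


def _loc_cov(S):
    """Mean and (slightly regularized) covariance of the rows of S."""
    d = S.shape[1]
    cov = np.atleast_2d(np.cov(S, rowvar=False)) + 1e-8 * np.eye(d)
    return S.mean(axis=0), cov


def _sq_mahalanobis(Z, loc, cov):
    """Squared Mahalanobis distance of every row of Z to (loc, cov)."""
    diff = Z - loc
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def approx_mcd(Z, h, n_starts=5, n_iter=20, rng=None):
    """Approximate MCD of Z (n x d) with concentration steps.

    Assumption: a simplified stand-in for FAST-MCD (random h-subset
    starts, no reweighting, no consistency correction).
    """
    rng = np.random.default_rng(rng)
    n = Z.shape[0]
    best_det, best_loc, best_cov = np.inf, None, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(n_iter):
            loc, cov = _loc_cov(Z[idx])
            # Concentration step: keep the h points closest to the current fit.
            new_idx = np.sort(np.argsort(_sq_mahalanobis(Z, loc, cov))[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        loc, cov = _loc_cov(Z[idx])
        det = np.linalg.det(cov)
        if det < best_det:  # keep the subset with the smallest determinant
            best_det, best_loc, best_cov = det, loc, cov
    return best_loc, best_cov


def random_subspace_scores(X, n_subspaces=50, subspace_dim=3, h=None, seed=0):
    """Average robust squared distance of each row of X over random subspaces."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + subspace_dim + 1) // 2  # common MCD default subset size
    scores = np.zeros(n)
    for _ in range(n_subspaces):
        # Each subspace is a random subset of the original features.
        feats = rng.choice(p, size=subspace_dim, replace=False)
        Z = X[:, feats]
        loc, cov = approx_mcd(Z, h, rng=rng)
        scores += _sq_mahalanobis(Z, loc, cov)
    return scores / n_subspaces
```

With `subspace_dim` much smaller than both `n` and `p`, every covariance matrix here is small and invertible even when the full `p x p` sample covariance would be singular. Observations with the largest averaged scores would be the outlier candidates; how to threshold them is left open in this sketch.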

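Record details

  • Author

    Liu, Bohan.

  • Author affiliation

    Rochester Institute of Technology.

  • Degree-granting institution Rochester Institute of Technology.
  • Subject Statistics.
  • Degree M.S.
  • Year 2016
  • Pages 94 p.
  • Total pages 94
  • Original format PDF
  • Language of text English
  • Chinese Library Classification Public buildings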
