
On Feature Selection Stability: A Data Perspective.



Abstract

The rapid growth of high-throughput technologies over the last few decades has made manual processing of the generated data impracticable. Worse, machine learning and data mining techniques can be paralyzed by these massive datasets. High dimensionality is one of the most common challenges for machine learning and data mining tasks. Feature selection aims to reduce dimensionality by selecting a small subset of features that performs at least as well as the full feature set. Traditionally, learning performance (e.g., classification accuracy) and algorithm complexity are used to measure the quality of a feature selection algorithm. Recently, the stability of feature selection algorithms has gained increasing attention as a further indicator, because an algorithm run repeatedly on the same dataset should select similar subsets of features even in the presence of a small amount of perturbation.

To cure the instability problem, we must first understand its causes. In this dissertation, we investigate the causes of instability in high-dimensional datasets using well-known feature selection algorithms, and we find that stability is mostly data-dependent. Based on these findings, we propose a framework that improves selection stability by addressing these main causes. In particular, we find that data noise greatly impacts both stability and learning performance, so we propose to reduce noise in order to improve both. However, current noise reduction approaches cannot distinguish data noise from legitimate variation between samples of different classes. We overcome this limitation with supervised noise reduction via low-rank matrix approximation (SLRMA for short). The proposed framework proves successful on different types of high-dimensional datasets, such as microarray and image datasets. However, this framework cannot handle unlabeled data, so we propose Local SVD to overcome this limitation.
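Selection stability, as described in the abstract, is typically quantified by comparing the feature subsets an algorithm selects across perturbed runs of the same dataset. A minimal sketch using average pairwise Jaccard similarity (an illustrative measure only; the dissertation may use a different stability index):

```python
def selection_stability(subsets):
    """Average pairwise Jaccard similarity between selected feature subsets.
    Values near 1 indicate a stable selector; values near 0, an unstable one.
    `subsets` is a list of sets of selected feature indices, one per run."""
    sims = []
    for i in range(len(subsets)):
        for j in range(i + 1, len(subsets)):
            a, b = set(subsets[i]), set(subsets[j])
            sims.append(len(a & b) / len(a | b))
    return sum(sims) / len(sims)

# Hypothetical subsets selected on three perturbed versions of one dataset
runs = [{0, 3, 5, 9}, {0, 3, 5, 8}, {0, 2, 5, 9}]
print(selection_stability(runs))
```

The same function rewards a selector that keeps returning the same features (similarity 1.0 for identical runs), which is exactly the behavior the abstract argues should hold under small data perturbations.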
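The noise-reduction idea behind SLRMA builds on low-rank matrix approximation: keep the dominant singular components of the data matrix and discard the rest as presumed noise. The sketch below shows only the generic, unsupervised truncated-SVD step; SLRMA as proposed in the dissertation additionally uses class labels so that between-class variation is not smoothed away along with the noise:

```python
import numpy as np

def low_rank_denoise(X, rank):
    """Truncated-SVD low-rank approximation of the samples-by-features
    matrix X: retain the top `rank` singular components, drop the rest."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank] @ Vt[:rank, :]

rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 200))  # rank-5 signal
noisy = clean + 0.1 * rng.standard_normal((50, 200))                  # add noise
denoised = low_rank_denoise(noisy, rank=5)

# Reconstruction error against the clean signal shrinks after denoising
print(np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean))
```

Because the truncation is unsupervised, two classes whose difference lies in low-variance directions can be merged by this step, which is precisely the limitation the abstract says motivates the supervised variant.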

Bibliographic details

  • Author: Alelyani, Salem
  • Author affiliation: Arizona State University
  • Degree-granting institution: Arizona State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 137 p.
  • Format: PDF
  • Language: English
