
On Feature Selection Stability: A Data Perspective.



Abstract

The rapid growth of high-throughput technologies over the last few decades has made manual processing of the generated data impracticable. Worse, machine learning and data mining techniques can be paralyzed by these massive datasets. High dimensionality is one of the most common challenges for machine learning and data mining tasks. Feature selection aims to reduce dimensionality by selecting a small subset of features that performs at least as well as the full feature set. Traditionally, learning performance (e.g., classification accuracy) and algorithm complexity are used to measure the quality of a feature selection algorithm. Recently, the stability of feature selection algorithms has gained increasing attention as a further indicator, because an algorithm run repeatedly on the same dataset should select similar subsets of features even in the presence of a small amount of perturbation.

To cure the instability problem, we must first understand its causes. In this dissertation, we investigate the causes of instability in high-dimensional datasets using well-known feature selection algorithms, and we find that stability is mostly data-dependent. Based on these findings, we propose a framework that improves selection stability by addressing these main causes. In particular, we find that data noise greatly impacts both stability and learning performance, so we propose to reduce noise in order to improve both. However, current noise reduction approaches cannot distinguish data noise from legitimate variation between samples of different classes. We overcome this limitation with supervised noise reduction via low-rank matrix approximation (SLRMA for short). The proposed framework proves successful on different types of high-dimensional datasets, such as microarray and image datasets. However, this framework cannot handle unlabeled data, so we propose Local SVD to overcome this limitation.
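Selection stability, as described in the abstract, is typically quantified by comparing the feature subsets an algorithm selects across perturbed runs of the same dataset. A minimal sketch using average pairwise Jaccard similarity (an illustrative measure only; the dissertation may use a different stability index):

```python
def selection_stability(subsets):
    """Average pairwise Jaccard similarity between selected feature subsets.
    Values near 1 indicate a stable selector; values near 0, an unstable one.
    `subsets` is a list of sets of selected feature indices, one per run."""
    sims = []
    for i in range(len(subsets)):
        for j in range(i + 1, len(subsets)):
            a, b = set(subsets[i]), set(subsets[j])
            sims.append(len(a & b) / len(a | b))
    return sum(sims) / len(sims)

# Hypothetical subsets selected on three perturbed versions of one dataset
runs = [{0, 3, 5, 9}, {0, 3, 5, 8}, {0, 2, 5, 9}]
print(selection_stability(runs))
```

The same function rewards a selector that keeps returning the same features (similarity 1.0 for identical runs), which is exactly the behavior the abstract argues should hold under small data perturbations.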
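The noise-reduction idea behind SLRMA builds on low-rank matrix approximation: keep the dominant singular components of the data matrix and discard the rest as presumed noise. The sketch below shows only the generic, unsupervised truncated-SVD step; SLRMA as proposed in the dissertation additionally uses class labels so that between-class variation is not smoothed away along with the noise:

```python
import numpy as np

def low_rank_denoise(X, rank):
    """Truncated-SVD low-rank approximation of the samples-by-features
    matrix X: retain the top `rank` singular components, drop the rest."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank] @ Vt[:rank, :]

rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 200))  # rank-5 signal
noisy = clean + 0.1 * rng.standard_normal((50, 200))                  # add noise
denoised = low_rank_denoise(noisy, rank=5)

# Reconstruction error against the clean signal shrinks after denoising
print(np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean))
```

Because the truncation is unsupervised, two classes whose difference lies in low-variance directions can be merged by this step, which is precisely the limitation the abstract says motivates the supervised variant.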

Bibliographic details

  • Author: Alelyani, Salem
  • Author affiliation: Arizona State University
  • Degree-granting institution: Arizona State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 137 p.
  • Format: PDF
  • Language: English
