Computer Science and Information Technology

Feature Selection in Sparse Matrices


Abstract

Feature selection, as a pre-processing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. There are two main approaches to feature selection: wrapper methods, in which features are selected using the supervised learning algorithm itself, and filter methods, in which the selection of features is independent of any learning algorithm. However, most of these techniques rely on feature-scoring algorithms that make basic assumptions about the distribution of the data, such as normality, a balanced distribution of classes, or non-sparsity (a dense data set). Data generated in the real world rarely meet such strict criteria. In some cases, such as digital advertising, the generated data matrix is very sparse and follows no distinct distribution. For this reason, we propose a new approach to feature selection for data sets that do not satisfy the above assumptions. Our methodology also addresses the problem of skewness in the data. The efficiency and effectiveness of our methods are then demonstrated by comparison with other well-known statistical techniques such as ANOVA, mutual information, KL divergence, the Fisher score, Bayes' error, and chi-square. The data set used for validation is a real-world user browsing-history data set used for ad-campaign targeting; it is both very high-dimensional and highly sparse. Our approach reduces the number of features significantly without compromising the accuracy of the final predictions.
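To make the filter-method baseline concrete: the chi-square scoring that the abstract compares against can be computed on sparse binary data without ever densifying the matrix, by counting feature occurrences per class from the non-zero entries alone. The sketch below is an illustration of that general technique, not the paper's own method; all function names and the set-of-indices sparse representation are hypothetical choices for this example.

```python
from collections import defaultdict

def chi2_scores(rows, labels, n_features):
    """Chi-square score per binary feature, computed from a sparse matrix.

    rows: one set of active (non-zero) feature indices per sample,
          i.e. a sparse binary matrix stored row-wise.
    labels: binary class label (0 or 1) per sample.

    Only non-zero entries are touched, so cost scales with the number
    of non-zeros rather than with n_samples * n_features.
    """
    n = len(rows)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_count = defaultdict(int)  # feature present AND class 1
    neg_count = defaultdict(int)  # feature present AND class 0
    for feats, y in zip(rows, labels):
        counts = pos_count if y == 1 else neg_count
        for j in feats:
            counts[j] += 1
    scores = {}
    for j in range(n_features):
        # 2x2 contingency table for feature j vs. the class label
        a = pos_count[j]   # present, class 1
        b = neg_count[j]   # present, class 0
        c = n_pos - a      # absent,  class 1
        d = n_neg - b      # absent,  class 0
        num = n * (a * d - b * c) ** 2
        den = (a + b) * (c + d) * (a + c) * (b + d)
        scores[j] = num / den if den else 0.0
    return scores

def select_top_k(scores, k):
    """Filter-style selection: keep the k highest-scoring features."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example: feature 0 and 1 track the class, feature 2 is uninformative.
rows = [{0, 2}, {0}, {1}, {1, 2}]
labels = [1, 1, 0, 0]
scores = chi2_scores(rows, labels, n_features=3)
kept = select_top_k(scores, k=2)  # keeps features 0 and 1
```

Note how the scoring step never materializes the zero entries, which is the property that matters on a high-dimensional, highly sparse matrix like the browsing-history data described above.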
