Sparse Proximal Support Vector Machines for feature selection in high dimensional datasets

Abstract

Classification of High Dimension Low Sample Size (HDLSS) datasets is a challenging task in supervised learning. Such datasets are prevalent in various areas including biomedical applications and business analytics. In this paper, a new embedded feature selection method for HDLSS datasets is introduced by incorporating sparsity in Proximal Support Vector Machines (PSVMs). Our method, called Sparse Proximal Support Vector Machines (sPSVMs), learns a sparse representation of PSVMs by first casting it as an equivalent least squares problem and then introducing the ℓ1-norm for sparsity. An efficient algorithm based on alternating optimization techniques is proposed. sPSVMs remove more than 98% of features in many high dimensional datasets without compromising on generalization performance. Stability in the feature selection process of sPSVMs is also studied and compared with other univariate filter techniques. Additionally, sPSVMs offer the advantage of interpreting the selected features in the context of the classes by inducing class-specific local sparsity instead of global sparsity like other embedded methods. sPSVMs appear to be robust with respect to data dimensionality. Moreover, sPSVMs are able to perform feature selection and classification in one step, eliminating the need for dimensionality reduction on the data. To that end, sPSVMs can be used for preprocessing-free classification tasks. (C) 2015 Elsevier Ltd. All rights reserved.
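
The abstract's core idea, adding an ℓ1 penalty to a least squares problem so that most feature weights become exactly zero, can be illustrated with a generic sketch. This is not the sPSVM algorithm from the paper: it is plain ℓ1-regularized least squares solved with ISTA (iterative soft thresholding), and the synthetic data, the function names `soft_threshold` and `l1_least_squares`, and the value of `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, w, lam):
    """Value of 0.5 * ||A w - b||^2 + lam * ||w||_1."""
    return 0.5 * np.sum((A @ w - b) ** 2) + lam * np.sum(np.abs(w))


def l1_least_squares(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A w - b||^2 + lam * ||w||_1 with ISTA (hypothetical helper)."""
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    L = np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)          # gradient of the smooth least squares term
        w = soft_threshold(w - grad / L, lam / L)
    return w


# Synthetic HDLSS-style data (an assumption): 40 samples, 500 features, 5 informative.
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 40, 500, 5
A = rng.standard_normal((n_samples, n_features))
w_true = np.zeros(n_features)
w_true[:n_informative] = 3.0
b = A @ w_true + 0.1 * rng.standard_normal(n_samples)

lam = 20.0
w_hat = l1_least_squares(A, b, lam)
selected = np.flatnonzero(w_hat)          # indices of features with nonzero weight
print(f"kept {selected.size} of {n_features} features")
```

The soft-thresholding step is what produces exact zeros, which is why an ℓ1 penalty can act as an embedded feature selector. How many features survive depends on `lam` and on the data, so the printed count should be read as illustrative only.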
