A novel robust kernel for classifying high-dimensional data using Support Vector Machines

Hussain Syed Fawad

Abstract

This paper presents a new semantic kernel for classification of high-dimensional data in the framework of Support Vector Machines (SVM). SVMs have gained widespread application due to their relatively high accuracy. The efficacy of SVMs, however, depends upon the separability of the data itself as well as on the kernel function. Text data, for instance, is difficult to classify because of synonymy and polysemy in its contents, multi-topical instances that can result in mislabeling, and high sparsity in the bag-of-words representation. While the soft margin parameter and kernel tricks are used in SVM to deal with outliers and non-linearly separable data, using data statistics and correlation has not been fully explored in the literature. This paper explores the use of co-similarity (i.e., soft co-clustering) to find latent relationships between documents, motivated by the success of co-clustering and subspace clustering methods. It has been shown that weighted higher-order paths between instances in the data can be a good measure of similarity, which can then be used both for classification and to correct mislabeled (or outlier) data in the training set. The proposed kernel is generic in nature and suitable for sparse, dyadic data where direct co-occurrences are not necessarily common, as in the case of textual data, link analysis in social media networks, co-authorship, etc. The paper also studies the impact of noise in the training data and provides a technique to re-label such instances. It is also observed that re-labelling of selected training data reduces the adverse effect of outliers or label noise and can greatly improve the classification of the test data. To the best of our knowledge, we are the first to introduce a supervised co-similarity based kernel function and also provide a mathematical formulation to show that it is a valid Mercer kernel. Our experiments show that the proposed framework outperforms current and state-of-the-art methods in terms of classification accuracy and is more resilient to label noise. (C) 2019 Elsevier Ltd. All rights reserved.
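The idea of weighted higher-order paths in the abstract can be illustrated with a small sketch. This is not the paper's supervised co-similarity kernel, whose formulation is not given in the abstract; it is a simplified stand-in that sums decayed powers of the first-order document similarity. The function name `higher_order_kernel` and the parameters `n_orders` and `decay` are hypothetical, chosen only for illustration. Because every power of a positive semi-definite Gram matrix is itself positive semi-definite, and the weights are non-negative, the resulting matrix satisfies Mercer's condition.

```python
import numpy as np

def higher_order_kernel(X, n_orders=3, decay=0.5):
    """Weighted sum of powers of the first-order similarity X @ X.T.

    Illustrative only: n_orders and decay are hypothetical parameters,
    not names or values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # L2-normalise rows so that first-order similarities lie in [0, 1].
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    S = X @ X.T                      # order-1 paths: doc -> word -> doc
    K = np.zeros_like(S)
    P = np.eye(S.shape[0])
    for t in range(1, n_orders + 1):
        P = P @ S                    # S**t aggregates weighted paths of order t
        K += decay ** (t - 1) * P    # longer paths contribute less
    return K

# Toy corpus: rows are documents, columns are words.
# Documents 0 and 2 share no word, but each shares a word with document 1.
X = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])

K1 = higher_order_kernel(X, n_orders=1)   # direct co-occurrence only
K3 = higher_order_kernel(X, n_orders=3)   # includes indirect paths

print("first-order similarity(doc0, doc2): ", K1[0, 2])
print("higher-order similarity(doc0, doc2):", K3[0, 2])
print("smallest eigenvalue of K3:", np.linalg.eigvalsh(K3).min())
```

In the toy corpus the first-order similarity between documents 0 and 2 is zero, while the higher-order version is positive because both are linked through document 1. A matrix built this way could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.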

Bibliographic record

Source
Expert Systems with Applications | 2019, No. 10 | pp. 116-131 | 16 pages
Author
Hussain Syed Fawad
Author affiliation

Ghulam Ishaq Khan Inst Engn Sci & Technol, Machine Learning & Data Sci MDS Lab, Fac Comp Sci & Engn, Topi, Pakistan;

Original format: PDF
Language: English
Keywords
Semantic kernels; Support Vector Machines; Co-clustering; Label noise;

Similar literature

1. A novel robust kernel for classifying high-dimensional data using Support Vector Machines [J]. Hussain Syed Fawad. Expert Systems with Applications. 2019, No. Octa

2. New Variable Selection Method Using Interval Segmentation Purity with Application to Blockwise Kernel Transform Support Vector Machine Classification of High-Dimensional Microarray Data [J]. Tang Li-Juan, Du Wen, Fu Hai-Yan, Journal of Chemical Information and Modeling. 2009, No. 8

3. Robust support vector machine for high-dimensional imbalanced data [J]. Nakayama Yugo. Communications in Statistics. 2021, No. 5a6

4. Kernel Independent Component Analysis-Based Prediction on the Protein O-Glycosylation Sites Using Support Vectors Machine and Ensemble Classifiers [C]. Zehao Chen. International Conference on Advanced Intelligent Computing Theories and Applications. 2015

5. Machine Learning in Neuroimaging Based Modalities Using Support Vector Machines with Wavelet Kernels [D]. Dalwani, Manish Shivkumar. 2017

6. FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier [O]. Victor Tkachev, Maxim Sorokin, Artem Mescheryakov, 2018

7. Comparison of fuzzy robust Kernel C-Means and support vector machines for intrusion detection systems using modified kernel nearest neighbor feature selection [O]. Z. Rustam, N. Olivera. 2018
