A novel robust kernel for classifying high-dimensional data using Support Vector Machines

Hussain Syed Fawad

Abstract

This paper presents a new semantic kernel for classifying high-dimensional data in the framework of Support Vector Machines (SVMs). SVMs have gained widespread application due to their relatively high accuracy. The efficacy of an SVM, however, depends on the separability of the data itself as well as on the kernel function. Text data, for instance, is difficult to classify because its contents exhibit synonymy and polysemy, because multi-topical instances can result in mislabeling, and because the bag-of-words representation is highly sparse. While the soft margin parameter and the kernel trick are used in SVMs to deal with outliers and non-linearly separable data, the use of data statistics and correlation has not been fully explored in the literature. Motivated by the success of co-clustering and subspace clustering methods, this paper explores the use of co-similarity (i.e., soft co-clustering) to find latent relationships between documents. It is shown that weighted higher-order paths between instances in the data can provide a good measure of similarity, which can then be used both for classification and for correcting mislabeled (or outlier) data in the training set. The proposed kernel is generic in nature and suitable for sparse, dyadic data in which direct co-occurrences are not necessarily common, as in the case of textual data, link analysis in social media networks, co-authorship, etc. The paper also studies the impact of noise in the training data and provides a technique to re-label such instances. It is observed that re-labelling selected training data reduces the adverse effect of outliers or label noise and can greatly improve the classification of the test data. To the best of our knowledge, we are the first to introduce a supervised co-similarity based kernel function, and we also provide a mathematical formulation to show that it is a valid Mercer kernel. Our experiments show that the proposed framework outperforms current state-of-the-art methods in terms of classification accuracy and is more resilient to label noise. © 2019 Elsevier Ltd. All rights reserved.
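The idea of measuring similarity through weighted higher-order paths can be illustrated with a small sketch. The code below is not the paper's co-similarity kernel, whose exact formulation is not given in the abstract. It is a generic stand-in that sums decayed powers of the document-by-document Gram matrix, and the function names, the `orders` and `decay` parameters, and the toy data are all assumptions made for illustration. Because powers of a positive semi-definite Gram matrix are themselves positive semi-definite, and a non-negative weighted sum of such matrices stays positive semi-definite, this stand-in also satisfies Mercer's condition.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for toy-sized inputs."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def higher_order_kernel(A: Matrix, orders: int = 2, decay: float = 0.5) -> Matrix:
    """Illustrative kernel (an assumption, not the paper's formula).

    A is a document-by-term matrix. G = A A^T counts direct term
    co-occurrences between documents; G^t counts paths that pass through
    t - 1 intermediate documents. The result is
    K = sum_{t=1..orders} decay^t * G^t.
    """
    G = matmul(A, [list(col) for col in zip(*A)])  # first-order similarity
    n = len(G)
    K = [[0.0] * n for _ in range(n)]
    power, weight = G, decay
    for _ in range(orders):
        for i in range(n):
            for j in range(n):
                K[i][j] += weight * power[i][j]
        power = matmul(power, G)  # move to paths one step longer
        weight *= decay           # longer paths count for less
    return K


# Toy corpus over the terms ["car", "automobile"] (hypothetical data).
docs = [
    [1, 0],  # doc 0 mentions only "car"
    [1, 1],  # doc 1 mentions both terms and bridges the other two
    [0, 1],  # doc 2 mentions only "automobile"
]

first_order = higher_order_kernel(docs, orders=1)
second_order = higher_order_kernel(docs, orders=2)

print(first_order[0][2])   # 0.0: docs 0 and 2 share no term directly
print(second_order[0][2])  # 0.25: they become similar via doc 1
```

Documents 0 and 2 have no term in common, so a first-order (linear) kernel treats them as unrelated, while the second-order term links them through document 1. A Gram matrix built this way could be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.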

Bibliographic details

Source
Expert Systems with Applications | 2019, No. 10 | pp. 116-131 | 16 pages
Author
Hussain Syed Fawad
Author affiliation

Ghulam Ishaq Khan Inst Engn Sci & Technol, Machine Learning & Data Sci (MDS) Lab, Fac Comp Sci & Engn, Topi, Pakistan

Original format: PDF
Language: English
Keywords
Semantic kernels; Support Vector Machines; Co-clustering; Label noise;


