...
Published in: Expert Systems with Applications

A novel robust kernel for classifying high-dimensional data using Support Vector Machines

Abstract

This paper presents a new semantic kernel for classifying high-dimensional data in the framework of Support Vector Machines (SVMs). SVMs have gained widespread application because of their relatively high accuracy. Their efficacy, however, depends on the separability of the data itself as well as on the kernel function. Text data, for instance, is difficult to classify for three reasons: its contents exhibit synonymy and polysemy, multi-topic instances can result in mislabeling, and it is highly sparse in the bag-of-words representation. SVMs use the soft-margin parameter and the kernel trick to deal with outliers and non-linearly separable data, but the use of data statistics and correlations has not been fully explored in the literature. Motivated by the success of co-clustering and subspace clustering methods, this paper explores the use of co-similarity (i.e., soft co-clustering) to find latent relationships between documents. It has been shown that weighted higher-order paths between instances in the data provide a good measure of similarity. These similarity values can then be used both for classification and for correcting mislabeled (or outlier) data in the training set. The proposed kernel is generic and suitable for sparse, dyadic data in which direct co-occurrences are not necessarily common, as in textual data, link analysis in social media networks, co-authorship, etc. The paper also studies the impact of noise in the training data and provides a technique for re-labeling such instances. Re-labeling selected training data is observed to reduce the adverse effect of outliers and label noise, and it can greatly improve the classification of the test data. To the best of our knowledge, we are the first to introduce a supervised co-similarity-based kernel function. We also provide a mathematical formulation showing that it is a valid Mercer kernel.
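The abstract does not state the kernel's formula. The sketch below gives a rough illustration of how weighted higher-order paths can link documents that share no words. It alternates document-document and term-term similarity updates over a document-term matrix, in the spirit of soft co-clustering. The function names, the unit-diagonal normalization and `n_iter` are assumptions for illustration, not the authors' formulation. Each update is a congruence of a positive semidefinite matrix, so the resulting Gram matrix stays positive semidefinite. `is_psd` only checks this numerically on one sample, which is a necessary condition for a Mercer kernel, not a proof.

```python
import numpy as np


def _unit_diagonal(S, eps=1e-12):
    """Rescale S to D S D with D = diag(1/sqrt(S_ii)).

    This is a congruence with a positive diagonal matrix, so a positive
    semidefinite (PSD) input stays PSD. Nonzero rows end up with a unit
    diagonal; all-zero rows (e.g. empty documents) stay zero.
    """
    S = (S + S.T) / 2.0  # remove floating-point asymmetry
    d = np.sqrt(np.maximum(np.diag(S), eps))
    return S / np.outer(d, d)


def co_similarity_kernel(X, n_iter=2):
    """Hypothetical co-similarity kernel over a document-term matrix X.

    Alternates two updates (an illustration, not the paper's formula):
      R <- X C X^T   documents are similar if they use similar terms
      C <- X^T R X   terms are similar if they occur in similar documents
    Each round lengthens the document-term paths that R accounts for.
    """
    X = np.asarray(X, dtype=float)
    n_docs, n_terms = X.shape
    R = np.eye(n_docs)   # document-document similarity
    C = np.eye(n_terms)  # term-term similarity; identity = first-order only
    for _ in range(n_iter):
        R = _unit_diagonal(X @ C @ X.T)
        C = _unit_diagonal(X.T @ R @ X)
    return R


def is_psd(K, tol=1e-10):
    """Empirical Mercer sanity check on one Gram matrix (necessary, not a proof)."""
    K = np.asarray(K, dtype=float)
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)


# Toy corpus: d0 and d2 share no term, but d1 links them (via terms w1 and w2).
X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
K1 = co_similarity_kernel(X, n_iter=1)  # plain cosine similarity
K2 = co_similarity_kernel(X, n_iter=2)  # adds higher-order paths
print(K1[0, 2], K2[0, 2], is_psd(K2))
```

On this toy corpus, plain cosine similarity (`n_iter=1`) gives documents d0 and d2 a similarity of zero. The second round credits them through the path d0 → w1 → d1 → w2 → d2.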
Our experiments show that the proposed framework outperforms current state-of-the-art methods in classification accuracy and is more resilient to label noise. © 2019 Elsevier Ltd. All rights reserved.
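The abstract does not describe the re-labeling technique itself. The sketch below shows one simple way a kernel's similarity values could flag and correct suspect training labels before training an SVM: a similarity-weighted k-nearest-neighbor vote. `relabel_by_similarity`, `cosine_kernel` (a stand-in for the co-similarity kernel), `k`, `agreement` and the toy data are all hypothetical. Only scikit-learn's `SVC` with `kernel="precomputed"` is a real library API.

```python
import numpy as np
from sklearn.svm import SVC


def cosine_kernel(A, B, eps=1e-12):
    """Stand-in kernel: cosine similarity between rows of A and rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return A @ B.T


def relabel_by_similarity(K, y, k=3, agreement=0.75):
    """Hypothetical re-labeling rule (not the paper's technique).

    For each training instance, the k most similar other training instances
    under K vote, weighted by their (non-negative) similarity. If a different
    label wins with at least `agreement` of the total weight, the instance is
    re-labeled. Votes always use the original labels, so the result does not
    depend on the order in which instances are processed.
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y)
    y_new = y.copy()
    labels = np.unique(y)
    k = min(k, len(y) - 1)
    for i in range(len(y)):
        sims = K[i].copy()
        sims[i] = -np.inf                      # exclude the instance itself
        nbrs = np.argsort(sims)[::-1][:k]      # indices of the k most similar
        w = np.maximum(K[i, nbrs], 0.0)
        if w.sum() == 0:
            continue                           # no informative neighbors
        votes = np.array([w[y[nbrs] == c].sum() for c in labels])
        best = labels[np.argmax(votes)]
        if best != y[i] and votes.max() / w.sum() >= agreement:
            y_new[i] = best
    return y_new


# Toy data: two clusters; instance 3 belongs to class 0 but is labeled 1.
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.85, 0.0],
                    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95], [0.0, 0.85]])
y_noisy = np.array([0, 0, 0, 1, 1, 1, 1, 1])

K_train = cosine_kernel(X_train, X_train)
y_clean = relabel_by_similarity(K_train, y_noisy)

# SVC accepts a precomputed Gram matrix: fit on (n_train, n_train) and
# predict on (n_test, n_train) similarities to the training instances.
clf = SVC(kernel="precomputed", C=10.0).fit(K_train, y_clean)
X_test = np.array([[0.8, 0.2], [0.2, 0.8]])
print(y_clean, clf.predict(cosine_kernel(X_test, X_train)))
```

The rule re-labels only instances whose neighbors agree strongly. This is meant to avoid flipping genuinely ambiguous, multi-topic documents; on real data, `k` and `agreement` would need tuning.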