Journal: Expert Systems with Applications

A novel robust kernel for classifying high-dimensional data using Support Vector Machines


Abstract

This paper presents a new semantic kernel for classification of high-dimensional data in the framework of Support Vector Machines (SVM). SVMs have gained widespread application due to their relatively high accuracy. The efficacy of SVMs, however, depends upon the separability of the data itself as well as the kernel function. Text data, for instance, is difficult to classify because of synonymy and polysemy in its contents, multi-topical instances that can result in mislabeling, and high sparsity in the bag-of-words representation. While the soft margin parameter and kernel tricks are used in SVM to deal with outliers and non-linearly separable data, the use of data statistics and correlation has not been fully explored in the literature. This paper explores the use of co-similarity (i.e., soft co-clustering) to find latent relationships between documents, motivated by the success of co-clustering and subspace clustering methods. It has been shown that weighted higher-order paths between instances in the data can provide a good measure of similarity, which can then be used both for classification and for correcting mislabeled (or outlier) data in the training set. The proposed kernel is generic in nature and suitable for sparse, dyadic data where direct co-occurrences are not necessarily common, as in the case of textual data, link analysis in social media networks, co-authorship, etc. This paper also studies the impact of noise in the training data and provides a technique to re-label such instances. It is also observed that re-labelling of selected training data reduces the adverse effect of outliers or label noise and can greatly improve the classification of the test data. To the best of our knowledge, we are the first to introduce a supervised co-similarity based kernel function and also provide a mathematical formulation showing that it is a valid Mercer kernel. Our experiments show that the proposed framework outperforms current and state-of-the-art methods in terms of classification accuracy and is more resilient to label noise. (C) 2019 Elsevier Ltd. All rights reserved.
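
The abstract does not give the kernel's formula, so the following is only an illustrative sketch of the general idea of weighted higher-order paths, not the authors' method. It assumes a small bag-of-words matrix and sums decayed powers of the document Gram matrix G = A A^T; the function names and the `order` and `decay` parameters are hypothetical. Because G is positive semi-definite, every power of G is too, and so is any sum of them with non-negative weights, which is the property a Mercer kernel requires.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two row-major matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gram_matrix(docs: Matrix) -> Matrix:
    """First-order similarity: G = A A^T, i.e. direct term co-occurrence."""
    transposed = [list(col) for col in zip(*docs)]
    return matmul(docs, transposed)


def higher_order_kernel(docs: Matrix, order: int = 2, decay: float = 0.5) -> Matrix:
    """Sum of decay**k * G**k for k = 1..order (hypothetical weighting scheme).

    G**k counts paths of length k between documents through shared terms,
    so documents with no term in common can still receive a positive score.
    """
    if order < 1 or decay <= 0:
        raise ValueError("order must be >= 1 and decay must be > 0")
    gram = gram_matrix(docs)
    n = len(gram)
    kernel = [[0.0] * n for _ in range(n)]
    power = gram      # holds G**k for the current k
    weight = decay    # holds decay**k for the current k
    for _ in range(order):
        for i in range(n):
            for j in range(n):
                kernel[i][j] += weight * power[i][j]
        power = matmul(power, gram)
        weight *= decay
    return kernel


# Three toy documents over four terms; documents 0 and 2 share no term,
# but both share a term with document 1.
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
direct = gram_matrix(docs)
kernel = higher_order_kernel(docs, order=2, decay=0.5)
print("direct similarity of docs 0 and 2:", direct[0][2])
print("higher-order similarity of docs 0 and 2:", kernel[0][2])
```

In this toy example the direct similarity between documents 0 and 2 is zero, while the second-order path through document 1 gives them a positive kernel value. A matrix built this way could be passed to an SVM implementation that accepts a precomputed kernel.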
