A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine

Chin Heng Wan; Lam Hong Lee; Rajprasad Rajkumar; Dino Isa


Abstract

This work implements a new text document classifier by integrating the K-nearest neighbor (KNN) classification approach with the support vector machine (SVM) training algorithm. The proposed nearest neighbor-support vector machine hybrid classification approach is named SVM-NN. KNN has been reported as a widely used text classification approach because of its simplicity and its efficiency in handling various types of text classification tasks. However, KNN has a major problem: an appropriate value of the parameter K must be determined to guarantee high classification effectiveness, because the choice of K has a strong impact on the accuracy of the KNN classifier. In addition, KNN is a lazy learning method that keeps the entire set of training samples until classification time, so its computation becomes intensive as the value of K increases. In this paper, we propose the SVM-NN hybrid classification approach with the objective of minimizing the impact of the parameter on classification accuracy. In the training stage, the SVM is used to reduce the training samples of each available category to their support vectors (SVs). The SVs from the different categories are then used as the training data of a nearest neighbor classification algorithm, in which the Euclidean distance function is used to calculate the average distance between the testing data point and each category's set of SVs. The classification decision is made in favor of the category with the shortest average distance between its SVs and the testing data point. Experiments on several benchmark text datasets show that the classification accuracy of the SVM-NN approach depends less on the value of the parameter than that of the conventional KNN classification model.
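The decision rule described in the abstract can be illustrated with a minimal sketch. It covers only the classification stage and assumes the support vectors of each category have already been extracted by a trained SVM (for instance, scikit-learn's `SVC` exposes them through its `support_vectors_` attribute). The function names, category labels, and toy vectors below are hypothetical and are not taken from the paper.

```python
import math


def average_distance(point, vectors):
    """Mean Euclidean distance from `point` to every vector in `vectors`."""
    return sum(math.dist(point, v) for v in vectors) / len(vectors)


def classify(point, svs_by_category):
    """Return the category whose support vectors are closest on average."""
    return min(
        svs_by_category,
        key=lambda category: average_distance(point, svs_by_category[category]),
    )


# Made-up 2-D support vectors for two hypothetical categories.
# In practice these would be high-dimensional document vectors kept by the SVM.
svs_by_category = {
    "sports": [(0.0, 0.0), (0.0, 2.0)],
    "finance": [(4.0, 0.0), (4.0, 2.0)],
}

label = classify((1.0, 1.0), svs_by_category)
print(label)
```

Note that this rule has no K to tune: every support vector of a category contributes to that category's average distance, which is one way to read the abstract's claim of low dependency on the parameter.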


Bibliographic details

Source
Expert Systems with Applications | 2012, Issue 15 | pp. 11880-11888 | 9 pages
Authors
Chin Heng Wan; Lam Hong Lee; Rajprasad Rajkumar; Dino Isa
Author affiliations

Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, 31900 Kampar, Perak, Malaysia;

Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;

Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;

Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;

Original format: PDF
Text language: English
Keywords
text document classification; K-nearest neighbor; support vector machine; Euclidean distance function


