A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine

Abstract

This work implements a new text document classifier by integrating the K-nearest neighbor (KNN) classification approach with the support vector machine (SVM) training algorithm. The proposed hybrid of nearest neighbor and support vector machine classification is called SVM-NN. KNN has been reported as one of the widely used text classification approaches because of its simplicity and its efficiency in handling various types of text classification tasks. However, a major problem with KNN is determining an appropriate value for the parameter K in order to guarantee high classification effectiveness, because the choice of K has a strong impact on the accuracy of the KNN classifier. In addition, KNN is a lazy learning method that keeps all training samples until classification time, so its computation becomes intensive as the value of K increases. In this paper, we propose the SVM-NN hybrid classification approach with the objective of minimizing the impact of the parameter on classification accuracy. In the training stage, the SVM is used to reduce the training samples of each available category to their support vectors (SVs). The SVs from the different categories are then used as the training data for a nearest neighbor classification algorithm, in which the Euclidean distance function is used to calculate the average distance between the testing data point and each category's set of SVs. The classification decision is the category with the shortest average distance between its SVs and the testing data point. Experiments on several benchmark text datasets show that the classification accuracy of the SVM-NN approach is less affected by the value of the parameter than that of the conventional KNN classification model.
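
The decision rule described in the abstract can be illustrated with a minimal sketch. It covers only the nearest neighbor stage and assumes the support vectors of each category have already been extracted by some SVM trainer. The function names `average_distance` and `classify`, the category labels, and the two-dimensional toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from statistics import fmean


def average_distance(point, support_vectors):
    """Mean Euclidean distance from `point` to each vector in `support_vectors`."""
    return fmean(math.dist(point, sv) for sv in support_vectors)


def classify(point, svs_by_category):
    """Return the category whose support vectors are closest to `point` on average."""
    return min(
        svs_by_category,
        key=lambda category: average_distance(point, svs_by_category[category]),
    )


# Made-up support vectors for two hypothetical categories (assumption:
# in practice these would come from SVMs trained on document features).
svs_by_category = {
    "sports": [(0.0, 0.0), (0.0, 2.0)],
    "finance": [(4.0, 0.0), (4.0, 2.0)],
}

print(classify((1.0, 1.0), svs_by_category))  # prints "sports"
print(classify((3.0, 1.0), svs_by_category))  # prints "finance"
```

Note that this rule compares one average distance per category, so unlike conventional KNN it does not need a neighbor count K at classification time.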

Bibliographic details

  • Source
    Expert Systems with Applications | 2012, Issue 15 | pp. 11880-11888 | 9 pages
  • Author affiliations

    Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, 31900 Kampar, Perak, Malaysia;

    Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;

    Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;

    Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;

  • Original format: PDF
  • Language: English
  • Keywords

    text document classification; K-nearest neighbor; support vector machine; Euclidean distance function;
