A Centroid Based Text Categorization Method Using Mean Shift

Abstract

Text categorization is an important research topic in information retrieval and one of the key techniques for handling and organizing the huge amount of text data available on the Internet and in other digital formats. In this paper, we propose a method for text categorization based on Mean Shift. The Mean Shift algorithm is a well-developed technique in computer vision research. We extend the application of Mean Shift to text categorization by reducing the dimensionality of the text vector space to a suitable scale. First, a low-dimensional feature space is constructed using feature selection based on information gain. Second, an adaptive Mean Shift algorithm is applied to detect the center (centroid) of each category in this feature space. Finally, each document is assigned to its most similar category by computing the similarity between the document and the center of every category. Experimental results on the 20 Newsgroups and Reuters-21578 corpora show that this method can achieve higher performance than some classic text categorization methods such as KNN, Naive Bayes and SVM.
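The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the adaptive bandwidth rule, the similarity measure, or the term weighting, so this sketch assumes a fixed-bandwidth Gaussian kernel, cosine similarity, and L2-normalised term frequencies. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Information gain of a term, based on its presence or absence in each document."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def select_features(docs, labels, k):
    """Step 1: keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: -information_gain(docs, labels, t))[:k]

def vectorize(doc, features):
    """L2-normalised term-frequency vector over the selected features (an assumption)."""
    counts = Counter(doc)
    vec = [float(counts[t]) for t in features]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def mean_shift_mode(points, bandwidth=1.0, max_iter=100, tol=1e-6):
    """Step 2: fixed-bandwidth Gaussian mean shift, started from the arithmetic mean.
    The paper uses an adaptive variant whose details are not given in the abstract."""
    dim = len(points[0])
    x = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(max_iter):
        weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(p, x)) / (2 * bandwidth ** 2))
                   for p in points]
        total = sum(weights)
        new_x = [sum(w * p[i] for w, p in zip(weights, points)) / total for i in range(dim)]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def train(docs, labels, k, bandwidth=1.0):
    """Select features, then locate one centroid per category with mean shift."""
    features = select_features(docs, labels, k)
    centroids = {}
    for c in sorted(set(labels)):
        pts = [vectorize(d, features) for d, y in zip(docs, labels) if y == c]
        centroids[c] = mean_shift_mode(pts, bandwidth)
    return features, centroids

def classify(doc, features, centroids):
    """Step 3: assign the document to the category with the most similar centroid."""
    vec = vectorize(doc, features)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["goal", "match", "team"], ["team", "coach", "goal"], ["match", "goal", "score"],
    ["stock", "market", "price"], ["price", "bank", "stock"], ["market", "stock", "trade"],
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]
features, centroids = train(docs, labels, k=4)
prediction = classify(["goal", "team", "win"], features, centroids)
```

Compared with a plain class mean, the mean-shift iteration pulls each centroid toward the densest region of its category, so outlying documents have less influence. How much this helps in practice depends on the bandwidth, which is the part the adaptive variant in the paper is meant to handle.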

Bibliographic details

  • Source
    Journal of Information and Computational Science | 2013, Issue 14 | pp. 4703-4711 | 9 pages
  • Author affiliations

    School of Computer Science and Technology, Beihang University, Beijing 100191, China; Research Institute of Beihang in Shenzhen, Shenzhen 518000, China

    School of Computer Science and Technology, Beihang University, Beijing 100191, China; Research Institute of Beihang in Shenzhen, Shenzhen 518000, China

    School of Computer Science and Technology, Beihang University, Beijing 100191, China; Research Institute of Beihang in Shenzhen, Shenzhen 518000, China

  • Original format: PDF
  • Language: English
  • Keywords

    Text Categorization; Mean Shift; Information Gain;
