Ranking and Selecting Terms for Text Categorization via SVM Discriminate Boundary

Abstract

Natural language document categorization is the problem of classifying documents into predetermined categories based on their contents. Each distinct term, or word, in the documents is a feature for representing a document. In general, the number of terms can be extremely large and dozens of redundant terms may be included, which can reduce classification performance. This paper proposes a support vector machine (SVM)-based feature ranking and selection method for text categorization. The contribution of each term to classification is calculated from the nonlinear discriminant boundary generated by the SVM. Experiments on several real-world data sets show that the proposed method extracts a smaller number of important terms and achieves higher classification performance than existing feature selection methods based on latent semantic indexing and χ² statistics.
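
The abstract does not state the formula used to score a term, so the following is only a rough sketch of one way a nonlinear SVM boundary could be turned into a term ranking. It assumes an RBF kernel and scores each term by the mean absolute partial derivative of the decision function with respect to that term's feature value; both choices are assumptions made for illustration, not the paper's definition. The names `rank_terms`, `decision_gradient`, and the toy model values are hypothetical.

```python
import math


def rbf(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_gradient(x, support_vectors, dual_coefs, gamma):
    """Gradient of f(x) = sum_i c_i * K(sv_i, x) + b with respect to x.

    dual_coefs[i] is c_i = alpha_i * y_i; the bias b does not affect the gradient.
    """
    grad = [0.0] * len(x)
    for sv, coef in zip(support_vectors, dual_coefs):
        k = rbf(x, sv, gamma)
        for j in range(len(x)):
            grad[j] += coef * k * (-2.0 * gamma) * (x[j] - sv[j])
    return grad


def rank_terms(points, support_vectors, dual_coefs, gamma):
    """Score term j by the mean |df/dx_j| over `points` (assumed criterion)."""
    n_terms = len(support_vectors[0])
    scores = [0.0] * n_terms
    for x in points:
        g = decision_gradient(x, support_vectors, dual_coefs, gamma)
        for j in range(n_terms):
            scores[j] += abs(g[j]) / len(points)
    # Highest-scoring (most influential) terms first.
    order = sorted(range(n_terms), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy "trained" model over three terms; only term 0 separates the two classes.
support_vectors = [[1.0, 0.0, 0.5], [0.0, 0.0, 0.5]]
dual_coefs = [1.0, -1.0]  # hypothetical alpha_i * y_i values
gamma = 0.5

order, scores = rank_terms(support_vectors, support_vectors, dual_coefs, gamma)
print(order, scores)
```

In this toy model only the first term differs between the two support vectors, so it receives the only nonzero score and would be the term kept by a selection step. With a model trained in scikit-learn, the corresponding inputs would come from the `support_vectors_` and `dual_coef_` attributes of a fitted binary `SVC`.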

Bibliographic details

  • Source
    International Journal of Intelligent Systems | 2010, Issue 2 | pp. 137-154 | 18 pages
  • Author affiliations

    Department of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo 145-0062, Japan;

    Department of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo 145-0062, Japan;

  • Original format: PDF
  • Language: English
