Ranking and Selecting Terms for Text Categorization via SVM Discriminate Boundary

Tien-Fang Kuo; Yasutoshi Yajima

Abstract

The problem of natural language document categorization consists of classifying documents into predetermined categories based on their contents. Each distinct term, or word, in the documents is a feature for representing a document. In general, the number of terms may be extremely large, and dozens of redundant terms may be included, which can reduce classification performance. In this paper, a support vector machine (SVM)-based feature ranking and selection method for text categorization is proposed. The contribution of each term to classification is calculated from the nonlinear discriminant boundary generated by the SVM. Experiments on several real-world data sets show that the proposed method extracts a smaller number of important terms and achieves higher classification performance than existing feature selection methods based on latent semantic indexing and χ² statistics.
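
The abstract does not give the exact formula for a term's "contribution" to the nonlinear boundary. The following is a minimal sketch of one plausible reading, and it is not the authors' procedure. It assumes an RBF kernel K(s, x) = exp(-gamma * ||s - x||^2) and a trained SVM decision function f(x) = Σ_i c_i K(s_i, x) + b, where c_i = alpha_i * y_i. Each term j is scored by the mean magnitude of the partial derivative ∂f/∂x_j over a set of document vectors. The function names rbf_kernel, term_scores, and select_terms are hypothetical, and documents are assumed to be dense numeric vectors such as TF-IDF weights.

```python
import numpy as np


def rbf_kernel(S, X, gamma):
    """K[i, n] = exp(-gamma * ||S[i] - X[n]||^2) for support vectors S and documents X."""
    sq = (
        np.sum(S ** 2, axis=1)[:, None]
        + np.sum(X ** 2, axis=1)[None, :]
        - 2.0 * S @ X.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def term_scores(S, coef, X, gamma):
    """Hypothetical score: mean |df/dx_j| over the documents in X.

    f(x) = sum_i coef[i] * K(S[i], x) + b, with coef[i] = alpha_i * y_i.
    The bias b is constant, so it does not affect the gradient.
    """
    W = coef[:, None] * rbf_kernel(S, X, gamma)  # shape (n_sv, n_docs)
    # dK(s, x)/dx_j = 2 * gamma * (s_j - x_j) * K(s, x); summed over support vectors:
    grad = 2.0 * gamma * (W.T @ S - W.sum(axis=0)[:, None] * X)  # (n_docs, n_terms)
    return np.mean(np.abs(grad), axis=0)


def select_terms(S, coef, X, gamma, k):
    """Return indices of the k highest-scoring terms, plus all scores."""
    scores = term_scores(S, coef, X, gamma)
    order = np.argsort(-scores, kind="stable")
    return order[:k], scores


# Toy example: the boundary depends only on term 0, so term 0 ranks first.
S = np.array([[1.0, 0.0], [-1.0, 0.0]])
coef = np.array([1.0, -1.0])
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, -2.0]])
top, scores = select_terms(S, coef, X, gamma=0.5, k=1)
print(top)  # [0]
```

For a trained binary RBF-kernel SVM, S and coef would be its support vectors and signed dual coefficients. Averaging absolute gradients is only one possible aggregate; a term that barely moves the decision function near the documents gets a low score and is a candidate for removal.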

Bibliographic Information

Source: International Journal of Intelligent Systems, 2010, No. 2, pp. 137-154 (18 pages)
Authors: Tien-Fang Kuo; Yasutoshi Yajima
Affiliation (both authors): Department of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo 145-0062, Japan
Original format: PDF
Language: English

Similar Literature

1. Two Step POS Selection for SVM Based Text Categorization [J]. Takeshi Masuyama, Hiroshi Nakagawa. IEICE Transactions on Information and Systems, 2004, No. 2.

2. Feature Selection in SVM Text Categorization [J]. Hirotoshi Taira, Masahiko Haruno. 情報処理学会論文誌 (IPSJ Journal), 2000.

3. A discriminative and semantic feature selection method for text categorization [J]. Zong Wei, Wu Feng, Chu Lap-Keung. International Journal of Production Economics, 2015, July issue.

4. Ranking and selecting terms for text categorization via SVM discriminate boundary [C]. Tien-Fang Kuo, Y. Yajima. 2005 IEEE International Conference on Granular Computing, 2005.

5. An SVM ranking approach to stress assignment [D]. Qing Dou, 2009.

6. Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles [O]. Seunghee Kim, Jinwook Choi, 2012.

7. Meaningful term extraction and discriminative term selection in text categorization via unknown-word methodology [O]. Yu-sheng Lai, Chung-hsien Wu, 2002.