Projected-prototype based classifier for text categorization

Jianfei Zhang; Lifei Chen; Gongde Guo


Abstract

The explosive growth of data is driving greater demand for text categorization. Existing prototype-based classifiers, including k-NN, kNNModel and the Centroid classifier, have attracted wide interest from the text mining community because of their simplicity and efficiency. However, they usually perform less effectively on document data sets because of the high dimensionality and complex class structures these sets involve. In most cases a single document category actually contains multiple subtopics, which means that the documents in the same class may comprise multiple subclasses, each associated with its own term subspace. In this paper, a novel projected-prototype based classifier is proposed for text categorization, in which a document category is represented by a set of prototypes, each combining a representative for the documents in a subclass with its corresponding term subspace. In the classifier's training process, the number of prototypes and the prototypes themselves are learned using a newly developed feature-weighting algorithm, so that documents belonging to different subclasses are separated as much as possible when projected onto their own subspaces. Then, in the testing process, each test document is classified according to its weighted distances from the different prototypes. Experimental results on the Reuters-21578 and 20-Newsgroups corpora show that the proposed classifier, based on the multi-representative-dependent projection method, can achieve higher classification accuracy at a lower computational cost than conventional prototype-based classifiers, especially for data sets that include overlapping document categories.
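
The testing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance formula or the weight-learning procedure, so everything concrete here is an assumption made for illustration: the weighted squared Euclidean distance, the hypothetical names `weighted_sq_distance` and `classify`, and the hand-picked toy prototypes and term weights (which stand in for whatever the training process would actually learn).

```python
from typing import List, Sequence, Tuple

# A prototype is (class label, subclass representative, per-term weights).
# The per-term weights play the role of the prototype's term subspace:
# terms with larger weights matter more for that subclass.
Prototype = Tuple[str, Sequence[float], Sequence[float]]


def weighted_sq_distance(
    doc: Sequence[float], center: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted squared Euclidean distance (an assumed form, not the paper's)."""
    return sum(w * (x - c) ** 2 for x, c, w in zip(doc, center, weights))


def classify(doc: Sequence[float], prototypes: List[Prototype]) -> str:
    """Return the class label of the prototype closest to doc,
    measuring each distance with that prototype's own term weights."""
    label, _, _ = min(
        prototypes, key=lambda p: weighted_sq_distance(doc, p[1], p[2])
    )
    return label


# Toy 4-term vocabulary. "sports" has two subclasses (two prototypes),
# "finance" has one. All values are invented for illustration only.
prototypes: List[Prototype] = [
    ("sports", [0.9, 0.1, 0.0, 0.0], [0.7, 0.1, 0.1, 0.1]),
    ("sports", [0.0, 0.8, 0.1, 0.0], [0.1, 0.7, 0.1, 0.1]),
    ("finance", [0.0, 0.0, 0.9, 0.7], [0.05, 0.05, 0.5, 0.4]),
]

doc = [0.1, 0.7, 0.0, 0.1]
print(classify(doc, prototypes))
```

Because a class may own several prototypes, a document only needs to be close to one subclass representative, in that subclass's own weighted term space, to receive the class label. This is the property that distinguishes the multi-prototype idea from a single-centroid classifier.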

Bibliographic details

Source
Knowledge-Based Systems | 2013, Issue 9 | pp. 179-189 | 11 pages
Authors
Jianfei Zhang; Lifei Chen; Gongde Guo
Author affiliations

School of Mathematics and Computer Science, Fujian Normal University, Fujian 350007, China (listed for each of the three authors)

Original format: PDF
Language: English
Keywords
Text categorization; Projection; Multi-representative; Prototype; Feature-weighting


