首页> 外文期刊>IEICE transactions on information and systems >A Framework of Centroid-Based Methods for Text Categorization
【24h】

A Framework of Centroid-Based Methods for Text Categorization

机译:A Framework of Centroid-Based Methods for Text Categorization

获取原文
获取原文并翻译 | 示例
       

摘要

Text Categorization (TC) is a task of classifying a set of documents into one or more predefined categories. Centroid-based method, a very popular TC method, aims to make classifiers simple and efficient by constructing one prototype vector for each class. It classifies a document into the class that owns the prototype vector nearest to the document. Many studies have been done on constructing prototype vectors. However, the basic philosophies of these methods are quite different from each other. It makes the comparison and selection of centroid-based TC methods very difficult. It also makes the further development of centroid-based TC methods more challenging. In this paper, based on the observation of its general procedure, the centroid-based text classification is treated as a kind of ranking task, and a unified framework for centroid-based TC methods is proposed. The goal of this unified framework is to classify a text via ranking all possible classes by document-class similarities. Prototype vectors are constructed based on various loss functions for ranking classes. Under this framework, three popular centroid-based methods: Rocchio, Hypothesis Margin Centroid and Drag Pushing are unified and their details are discussed. A novel centroid-based TC method called SLRCM that uses a smoothing ranking loss function is further proposed. Experiments conducted on several standard databases show that the proposed SLRCM method outperforms the compared centroid-based methods and reaches the same performance as the state-of-the-art TC methods.

著录项

获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号