A Novel Term_Class Relevance Measure for Text Categorization

Published in: Procedia Computer Science

Abstract

In this paper, we introduce a new measure, called Term_Class relevance, that computes how relevant a term is for classifying a document into a particular class. The proposed measure estimates how strongly a given term supports assigning an unlabeled document to a known class, as the product of two factors: the Class_Term weight and the Class_Term density. The Class_Term weight is the ratio of the number of documents in the class that contain the term to the total number of documents that contain the term. The Class_Term density is the relative density of the term's occurrences in the class with respect to its total occurrences in the entire population of documents. Unlike existing term weighting schemes such as TF-IDF and its variants, the proposed measure takes into account how much the term participates across all documents of the class relative to the entire population. To demonstrate the significance of the proposed measure, experiments were conducted on the 20 Newsgroups dataset. Further, a comparative analysis brings out the superiority of the proposed measure.
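The abstract defines the measure only in words. Read literally, for a term t and a class c, TCR(t, c) = (df_c(t) / df(t)) × (tf_c(t) / tf(t)), where df counts documents containing t and tf counts occurrences of t, either within class c or across the whole collection. The Python sketch below assumes exactly that reading; the paper itself may apply scaling or normalization not described in the abstract. The function names `term_class_relevance` and `classify`, the toy corpus, and the summation-based decision rule are hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict


def term_class_relevance(docs, labels):
    """Return a dict mapping (term, class) -> Term_Class relevance.

    docs   : list of token lists (one per training document)
    labels : class label of each document, parallel to docs

    Assumption: TCR = Class_Term weight * Class_Term density, with both
    factors taken as plain count ratios (a literal reading of the abstract).
    """
    df_class = defaultdict(Counter)  # class -> term -> docs in class containing term
    tf_class = defaultdict(Counter)  # class -> term -> occurrences within class
    df_total = Counter()             # term -> docs containing term (whole corpus)
    tf_total = Counter()             # term -> occurrences (whole corpus)

    for tokens, cls in zip(docs, labels):
        counts = Counter(tokens)
        tf_class[cls].update(counts)
        tf_total.update(counts)
        present = set(counts)        # count each term at most once per document
        df_class[cls].update(present)
        df_total.update(present)

    scores = {}
    for cls in df_class:
        for term in df_total:
            weight = df_class[cls][term] / df_total[term]   # Class_Term weight
            density = tf_class[cls][term] / tf_total[term]  # Class_Term density
            scores[(term, cls)] = weight * density
    return scores


def classify(tokens, scores, classes):
    """Hypothetical decision rule, not taken from the paper: sum the TCR of
    the document's tokens per class and return the highest-scoring class."""
    totals = {c: sum(scores.get((t, c), 0.0) for t in tokens) for c in classes}
    return max(totals, key=totals.get)


if __name__ == "__main__":
    docs = [["goal", "match", "goal"], ["match", "vote"], ["vote", "election", "vote"]]
    labels = ["sport", "sport", "politics"]
    scores = term_class_relevance(docs, labels)
    print(scores[("vote", "sport")], scores[("vote", "politics")])
    print(classify(["vote", "election"], scores, set(labels)))
```

On this toy corpus, "goal" scores 1.0 for sport and 0.0 for politics, while "vote", which appears in both classes, is split into 1/6 for sport and 1/3 for politics. A class-agnostic IDF factor would give "vote" the same weight for both classes, which is the contrast with TF-IDF that the abstract draws.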
