Source: Journal of the American Society for Information Science and Technology

A New Term-Weighting Scheme for Text Classification Using the Odds of Positive and Negative Class Probabilities

Abstract

Text classification (TC) is a core technique for text mining and information retrieval, and it has been applied in many research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain high TC performance. Although term weighting is an important module for TC, and TC has peculiarities that distinguish it from information retrieval, many term-weighting schemes from information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC unchanged. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information through the positive and negative class distributions. The proposed scheme, log tf-TRR, consistently performs better than other schemes that use class information as well as traditional schemes such as tf-idf.
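To make the contrast in the abstract concrete, the following minimal Python sketch compares a class-blind weight with a class-aware one. The abstract does not give the formula for log tf-TRR, so `class_odds_weights` is only an illustrative assumption: a log-scaled term frequency multiplied by the log of a smoothed ratio of positive to negative class term probabilities. The function names, the `smoothing` parameter, the constant 2 inside the logarithm, and the toy corpus are all hypothetical and should not be read as the article's definition.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic class-blind weight: raw term frequency times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def class_odds_weights(docs, labels, positive, smoothing=1.0):
    """Hypothetical class-aware weight (an assumption, not the paper's formula):
    log(1 + tf) * log(2 + P(term | positive) / P(term | negative)),
    with add-`smoothing` estimates of the class term probabilities."""
    vocab = {term for doc in docs for term in doc}
    pos_counts, neg_counts = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos_counts if label == positive else neg_counts).update(doc)
    pos_total = sum(pos_counts.values()) + smoothing * len(vocab)
    neg_total = sum(neg_counts.values()) + smoothing * len(vocab)
    # Ratio of smoothed class-conditional term probabilities.
    ratio = {
        term: ((pos_counts[term] + smoothing) / pos_total)
        / ((neg_counts[term] + smoothing) / neg_total)
        for term in vocab
    }
    return [
        {term: math.log(1 + count) * math.log(2 + ratio[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Toy corpus (hypothetical): two "sport" documents and two "finance" documents.
docs = [
    ["goal", "match", "team"],
    ["team", "win", "goal"],
    ["market", "stock", "team"],
    ["stock", "price", "market"],
]
labels = ["sport", "sport", "finance", "finance"]

tfidf = tf_idf(docs)
odds = class_odds_weights(docs, labels, positive="sport")
```

In this toy corpus, tf-idf scores "market" above "team" in the third document purely because "market" occurs in fewer documents, whereas the class-aware weight, computed with "sport" as the positive class, scores "team" higher because it is more probable in the positive class. That use of class labels is the kind of information the abstract says schemes borrowed from information retrieval leave unused.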
