Knowledge-Based Systems

Several alternative term weighting methods for text representation and classification

Abstract

Text representation is an active research topic that underpins text classification (TC) and has a substantial impact on TC performance. Although the well-known TF-IDF scheme was designed for information retrieval rather than for TC, it is widely used in TC as a term weighting method for representing text content. Inspired by the IDF component of TF-IDF, which is defined as a logarithmic transformation, we propose several alternative methods for generating unsupervised term weighting schemes that can offset the drawback of TF-IDF. Moreover, because TC differs from information retrieval, representing test texts as vectors in an appropriate way is also essential. This matters especially for supervised term weighting approaches (e.g., TF-RF), because these methods use category information when weighting terms. However, most existing schemes do not clearly explain how test texts should be represented. To explore this problem and seek a reasonable solution, we analyze one classic unsupervised term weighting method and three typical supervised term weighting methods in depth to illustrate how to represent test texts. To evaluate the effectiveness of our work, we design three sets of experiments that compare the performance of these methods. The comparisons show that our proposed methods can indeed improve TC performance and sometimes even outperform existing supervised term weighting methods. © 2020 Elsevier B.V. All rights reserved.
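To make the test-text problem concrete, the following minimal Python sketch contrasts an unsupervised weight with a supervised one. It is not the method proposed in the paper. It assumes the common IDF variant log(N/df) and the usual TF-RF definition rf = log2(2 + a / max(1, c)), where a and c count the positive-category and negative-category training documents containing the term. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training corpus: (tokens, category label).
train = [
    (["goal", "match", "team"], "sport"),
    (["team", "win", "goal"], "sport"),
    (["stock", "market", "team"], "finance"),
    (["market", "price", "stock"], "finance"),
]

def idf(term, docs):
    """Unsupervised weight: log(N / df), one common IDF variant (assumed)."""
    n = len(docs)
    df = sum(1 for tokens, _ in docs if term in tokens)
    return math.log(n / df) if df else 0.0

def rf(term, docs, positive):
    """Supervised weight: relevance frequency, log2(2 + a / max(1, c)).

    a = training docs of the positive category that contain the term,
    c = training docs of all other categories that contain the term.
    """
    a = sum(1 for tokens, label in docs if term in tokens and label == positive)
    c = sum(1 for tokens, label in docs if term in tokens and label != positive)
    return math.log2(2 + a / max(1, c))

def tf_idf_vector(tokens, docs):
    """Needs no label, so a test text is weighted like a training text."""
    tf = Counter(tokens)
    return {t: tf[t] * idf(t, docs) for t in tf}

def tf_rf_vector(tokens, docs, positive):
    """Needs a category to be chosen, which is unknown for a test text."""
    tf = Counter(tokens)
    return {t: tf[t] * rf(t, docs, positive) for t in tf}

test_doc = ["goal", "goal", "market"]
print("tf-idf:", tf_idf_vector(test_doc, train))
for category in ("sport", "finance"):
    print("tf-rf as", category, ":", tf_rf_vector(test_doc, train, category))
```

In this sketch the TF-IDF vector of the test text is unique, whereas TF-RF yields a different vector for each candidate category. That ambiguity is the reason a supervised scheme has to state explicitly how test texts are represented.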
