Several alternative term weighting methods for text representation and classification

Tang Zhong; Li Wenqiang; Li Yan; Zhao Wu; Li Song


Abstract

Text representation is a prominent research topic that underpins text classification (TC) tasks, and it has a substantial impact on TC performance. Although the well-known TF-IDF scheme was designed for information retrieval rather than for TC, it is highly useful in TC as a term weighting method for representing text content. Inspired by the IDF part of TF-IDF, which is defined as a logarithmic transformation, we propose several alternative methods in this study for generating unsupervised term weighting schemes that can offset the drawback of TF-IDF. Moreover, because TC tasks differ from information retrieval, representing test texts as vectors in an appropriate way is also essential for TC, especially for supervised term weighting approaches (e.g., TF-RF), mainly because these methods need category information when weighting terms. However, most current schemes do not clearly explain how test texts should be represented. To explore this problem and seek a reasonable solution for these schemes, we analyze one classic unsupervised term weighting method and three typical supervised term weighting methods in depth to illustrate how to represent test texts. To investigate the effectiveness of our work, we design three sets of experiments to compare their performance. The comparisons show that our proposed methods can indeed enhance TC performance, and sometimes even outperform existing supervised term weighting methods. (C) 2020 Elsevier B.V. All rights reserved.
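The test-text problem raised in the abstract can be made concrete with a small sketch. The Python below is not the paper's proposed method: it computes the classic IDF as log(N / df) and the relevance frequency of TF-RF in its commonly cited form log2(2 + a / max(1, c)), both estimated on training documents only. The toy corpus, the function names, and the rule of taking the maximum rf over categories for a test document whose label is unknown are all assumptions made for illustration.

```python
import math
from collections import Counter

def doc_freq(docs):
    """For each term, count the documents that contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def idf_weights(train_docs):
    """Classic IDF, log(N / df(t)), estimated on the training set only."""
    n = len(train_docs)
    return {t: math.log(n / d) for t, d in doc_freq(train_docs).items()}

def rf_weights(train_docs, labels, positive):
    """Relevance frequency for one category: log2(2 + a / max(1, c)).

    a = positive-category documents containing the term,
    c = documents of all other categories containing the term.
    """
    pos = [d for d, y in zip(train_docs, labels) if y == positive]
    neg = [d for d, y in zip(train_docs, labels) if y != positive]
    a, c = doc_freq(pos), doc_freq(neg)
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}

def represent(doc, weights):
    """Multiply raw term frequency by a weight; unseen terms are dropped."""
    tf = Counter(doc)
    return {t: f * weights[t] for t, f in tf.items() if t in weights}

# Hypothetical toy training corpus (tokenized) with category labels.
train = [["ball", "goal", "team"], ["team", "coach"],
         ["stock", "market"], ["market", "team"]]
labels = ["sport", "sport", "finance", "finance"]

idf = idf_weights(train)
rf_by_cat = {c: rf_weights(train, labels, c) for c in set(labels)}

# Assumption: the test label is unknown, so collapse the per-category rf
# values into one weight per term by taking the maximum over categories.
rf_max = {t: max(w.get(t, 0.0) for w in rf_by_cat.values()) for t in idf}

test_doc = ["team", "market", "market", "unseen"]
tfidf_vec = represent(test_doc, idf)
tfrf_vec = represent(test_doc, rf_max)
print(tfidf_vec)
print(tfrf_vec)
```

The unsupervised weights apply to a test document directly, whereas the supervised weights force a choice about which category's statistics to use. The max-over-categories rule is only one possible choice, and other rules would yield different test vectors.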


Bibliographic details

Source
Knowledge-Based Systems | 2020, Issue 5 | 106399.1-106399.14 | 14 pages
Authors
Tang Zhong; Li Wenqiang; Li Yan; Zhao Wu; Li Song
Author affiliation

Sichuan Univ Sch Mech Engn 24 South Sect 1 Chengdu 610065 Peoples R China

Original format: PDF
Language: English
Keywords
Unsupervised term weighting; Supervised term weighting; Text representation; Text classification; Nonlinear transformation;


