Semantic text classification: A survey of past and recent advances

Altinel Berna; Ganiz Murat Can

Abstract

Automatic text classification is the task of organizing documents into predetermined classes, generally using machine learning algorithms. It is one of the most important methods for organizing and making use of the vast amounts of information that exist in unstructured textual form. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words, in which the words (in other words, terms) are cut off from their finer context, i.e., their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information in the vector space. Consequently, the semantics of a word that can be inferred from the finer context of its location in a sentence and its relations with neighboring words are usually ignored. However, the meanings of words and the semantic connections between words, documents, and even classes are important, since methods that capture semantics generally achieve better classification performance. Several surveys have been published that analyze diverse approaches to traditional text classification. Most of these surveys cover the application of different semantic term relatedness methods in text classification to some degree. However, they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. To fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores past and recent advancements in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning-based approaches, word/character sequence enhanced approaches, and linguistic enriched approaches. Furthermore, this survey highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.
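
The loss of finer context described in the abstract can be illustrated with a minimal sketch. It is not taken from the survey: the function names `bag_of_words` and `to_vector`, the whitespace tokenization, and the two example sentences are all assumptions chosen for illustration. It shows that two sentences with opposite meanings can map to the same term-frequency vector once word order is discarded.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to raw term counts, discarding word order.

    Tokenization by lowercasing and splitting on whitespace is a
    simplifying assumption, not a method described in the survey.
    """
    return Counter(text.lower().split())


def to_vector(bag: Counter, vocabulary: list[str]) -> list[int]:
    """Project term counts onto a fixed vocabulary, one dimension per term."""
    return [bag[term] for term in vocabulary]


# Hypothetical example documents: same words, opposite meanings.
doc_a = "the dog bites the man"
doc_b = "the man bites the dog"

bag_a, bag_b = bag_of_words(doc_a), bag_of_words(doc_b)

# A shared, sorted vocabulary defines the axes of the vector space.
vocabulary = sorted(set(bag_a) | set(bag_b))

vec_a = to_vector(bag_a, vocabulary)
vec_b = to_vector(bag_b, vocabulary)

print(vocabulary)
print(vec_a, vec_b)
print(vec_a == vec_b)  # True: the two documents are indistinguishable
```

Because both sentences yield identical vectors, a classifier that sees only this representation cannot tell them apart, which is the kind of limitation that semantic approaches aim to address.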

Bibliographic record

Source
Information Processing & Management | 2018, Issue 6 | pp. 1129-1153 | 25 pages
Authors
Altinel Berna; Ganiz Murat Can
Original format: PDF
Language: English
Date added to database: 2022-08-18 04:10:57
