Journal: Information Processing & Management

Semantic text classification: A survey of past and recent advances



Abstract

Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. It is one of the most important methods for organizing and making use of the vast amounts of information that exist in unstructured textual format. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words, where the words (that is, terms) are cut off from their finer context, i.e., their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information in the vector space. Consequently, the semantics of words that can be inferred from the finer context of their location in a sentence and their relations with neighboring words are usually ignored. However, the meanings of words and the semantic connections between words, documents, and even classes are important, since methods that capture semantics generally reach better classification performance. Several surveys have been published that analyze diverse approaches to traditional text classification. Most of these surveys cover the application of different semantic term relatedness methods in text classification to a certain degree. However, they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. To fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores the past and recent advances in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning based approaches, word/character sequence enhanced approaches and linguistic enriched approaches. Furthermore, this survey highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.
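
The bag-of-words limitation described in the abstract can be illustrated with a minimal sketch. It is not taken from the survey: the two example sentences and the helper names `bag_of_words` and `to_vector` are hypothetical, and tokenization is assumed to be simple whitespace splitting. The sketch shows that two sentences with opposite meanings map to the same term-frequency vector once word order is discarded.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())


def to_vector(counts, vocabulary):
    """Project term counts onto a fixed, ordered vocabulary."""
    return [counts[term] for term in vocabulary]


# Hypothetical example documents: same words, different meaning.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

# Shared vocabulary, sorted so that vector positions are stable.
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))

vec_a = to_vector(bag_of_words(doc_a), vocabulary)
vec_b = to_vector(bag_of_words(doc_b), vocabulary)

print(vocabulary)      # ['bit', 'dog', 'man', 'the']
print(vec_a)           # [1, 1, 1, 2]
print(vec_b)           # [1, 1, 1, 2]
print(vec_a == vec_b)  # True
```

Because the vectors are identical, a classifier that sees only term frequencies cannot distinguish the two sentences, which is the kind of finer context that semantic approaches aim to recover.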
