Journal: Fundamenta Informaticae

Improving Short Text Classification using Information from DBpedia Ontology



Abstract

With the emergence of social networks and microblogs, a huge number of short text documents is generated daily, and effective tools are needed to organize and classify them. These documents have extremely sparse representations, which is the main cause of poor classification performance. We propose a new approach in which we identify relevant concepts in short text documents using the DBpedia Spotlight framework and enrich the text with information derived from the DBpedia ontology, which reduces sparseness. We developed six variants of the text enrichment method and tested them on four short text datasets using seven classification algorithms. The results were compared with the baseline approach, with one another, and with some state-of-the-art text classification methods. Besides classification performance, we also evaluated the influence of the concept similarity threshold and the size of the training data. The results show that the proposed text enrichment approach significantly improves the classification of short texts and is robust with respect to different input sources, domains, and sizes of available training data. The proposed methods are particularly beneficial when only a small number of documents is available for training.
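
The enrichment idea described in the abstract can be illustrated with a minimal Python sketch. It is not one of the paper's six variants, which the abstract does not specify. It only assumes that an entity linker such as DBpedia Spotlight yields, for each mention, a resource URI, a list of ontology types, and a similarity score. The `Annotation` class, the `enrich` function, the default threshold, and the sample scores are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """One concept mention, in a hypothetical shape an entity linker might return."""
    surface_form: str
    uri: str
    types: List[str]
    similarity: float


def enrich(text: str, annotations: List[Annotation], threshold: float = 0.5) -> str:
    """Append ontology-derived tokens for concepts whose similarity meets the threshold."""
    extra: List[str] = []
    for ann in annotations:
        if ann.similarity < threshold:
            continue  # skip weakly matched concepts
        # Use the last URI segment as a concept token.
        extra.append(ann.uri.rsplit("/", 1)[-1])
        extra.extend(ann.types)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    extra = list(dict.fromkeys(extra))
    return text if not extra else text + " " + " ".join(extra)


# Illustrative input only: the scores and types below are made up.
annotations = [
    Annotation("Messi", "http://dbpedia.org/resource/Lionel_Messi",
               ["SoccerPlayer", "Athlete", "Person"], 0.98),
    Annotation("goal", "http://dbpedia.org/resource/Goal_(sport)", [], 0.31),
]
print(enrich("Messi scores late goal", annotations, threshold=0.5))
# prints: Messi scores late goal Lionel_Messi SoccerPlayer Athlete Person
```

Appending type tokens gives a bag-of-words classifier more shared features between short texts that mention different entities of the same kind, which is one way enrichment can reduce sparseness. Raising the threshold keeps fewer, more reliable concepts; lowering it adds more tokens at the risk of noise.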
