IEEE International Conference on Big Data

Exploiting Knowledge Graph to Improve Text-based Prediction

Abstract

As a special kind of "big data," text data can be regarded as data reported by human sensors. Since humans are far more intelligent than physical sensors, text data contains useful information and knowledge about the real world, making it possible to make predictions about real-world phenomena based on text. As all application domains involve humans, text-based prediction has widespread applications, especially for optimizing decision making. While the problem of text-based prediction resembles text classification when formulated as a supervised learning problem, it is more challenging because the variable to be predicted may not be directly derivable from the text. There is thus a semantic gap between the target variable and the surface features that conventional approaches often use to represent text data. In this paper, we propose to bridge this gap by using a knowledge graph to construct more effective features for text representation. We propose a two-step filtering algorithm to enhance such a knowledge-aware text representation for a family of entity-centric text regression tasks where the response variable can be treated as an attribute of a group of central entities. We evaluate the proposed algorithm on two revenue prediction tasks based on reviews. The results show that the proposed algorithm can effectively leverage knowledge graphs to construct interpretable features, leading to a significant improvement in prediction accuracy over traditional features.
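
To make the idea of a knowledge-aware text representation concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not say what the two filtering steps are, so the two used here (a document-frequency filter, then a correlation filter against the target) are placeholder assumptions. The toy knowledge graph `TOY_KG`, the substring-based entity linking, the function names, the thresholds, and the review texts and revenue figures are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy knowledge graph: entity name -> attribute nodes linked to it.
TOY_KG = {
    "inception": ["director:Christopher Nolan", "genre:sci-fi"],
    "interstellar": ["director:Christopher Nolan", "genre:sci-fi"],
    "titanic": ["director:James Cameron", "genre:romance"],
}

def kg_features(text, kg=TOY_KG):
    """Count KG attributes of each entity mentioned in the text.

    Entity linking is a naive case-insensitive substring match (an assumption).
    """
    lowered = text.lower()
    counts = Counter()
    for entity, attributes in kg.items():
        if entity in lowered:
            counts.update(attributes)
    return counts

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def two_step_filter(docs, targets, min_df=2, min_abs_corr=0.5):
    """Return the names of the KG features that survive both assumed filters."""
    feats = [kg_features(d) for d in docs]
    # Step 1 (assumed): drop features that appear in fewer than min_df documents.
    doc_freq = Counter(name for f in feats for name in f)
    frequent = [name for name, c in doc_freq.items() if c >= min_df]
    # Step 2 (assumed): keep features whose counts correlate with the target.
    kept = []
    for name in sorted(frequent):
        column = [f.get(name, 0) for f in feats]
        if abs(pearson(column, targets)) >= min_abs_corr:
            kept.append(name)
    return kept

# Made-up reviews and revenue values, for illustration only.
docs = [
    "Inception was brilliant",
    "Interstellar is a great watch",
    "Titanic made me cry",
    "Loved Inception and Interstellar",
]
targets = [8.0, 7.0, 3.0, 9.0]
selected = two_step_filter(docs, targets)
print(selected)
```

The surviving features are named knowledge-graph attributes rather than raw tokens, which is one way to read the abstract's claim that such features are interpretable. A regression model could then be fit on the selected columns.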
