Journal: Information retrieval

Beyond word embeddings: learning entity and concept representations from large scale knowledge bases



Abstract

Text representations based on neural word embeddings have proven effective in many NLP applications. Recent research has adapted traditional word embedding models to learn vectors for multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique that integrates knowledge about concepts from two large-scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to learn seamlessly from both the Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies; and (2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study that evaluates our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions than tedious, error-prone methods that depend on gazetteers and regular expressions.
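The abstract does not say how the analogy questions are answered. A common approach for word and concept embeddings is the vector-offset method (often called 3CosAdd): to solve "a is to b as c is to ?", return the vocabulary item whose vector has the highest cosine similarity to b - a + c, excluding a, b and c themselves. The sketch below assumes that method. The embedding table is a hand-made toy example, multiword concepts are assumed to be single underscore-joined tokens, and the function names are hypothetical. None of this is the paper's data or code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer 'a : b :: c : ?' with the vector-offset (3CosAdd) rule.

    Returns the token d (not a, b or c) that maximizes cos(emb[d], b - a + c).
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical toy embeddings; "buenos_aires" stands in for a multiword concept token.
toy_emb = {
    "rome": [0.0, 1.0, 0.0],
    "italy": [1.0, 1.0, 0.0],
    "buenos_aires": [0.0, 0.0, 1.0],
    "argentina": [1.0, 0.0, 1.0],
    "pizza": [0.0, 1.0, 0.2],
}

print(solve_analogy(toy_emb, "rome", "italy", "buenos_aires"))  # → argentina
```

Excluding the three query tokens is standard practice, because b - a + c often lies closest to b or c itself. Real evaluations run the same rule over a full vocabulary of learned vectors rather than a toy table.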
