Beyond word embeddings: learning entity and concept representations from large scale knowledge bases

Shalaby Walid; Zadrozny Wlodek; Jin Hongxia


Abstract

Text representations using neural word embeddings have proven effective in many NLP applications. Recent research adapts traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating knowledge about concepts from two large-scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to learn seamlessly from both Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies, and (2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study evaluating our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions than tedious and error-prone methods that depend on gazetteers and regular expressions.
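The abstract does not spell out how the skip-gram model consumes both sources, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that Wikipedia text yields (target, context) pairs through an ordinary sliding window, and that each edge of the concept graph is treated as one extra symmetric pair. The function names `text_pairs` and `graph_pairs`, the `CONCEPT/` prefix, and all toy data are hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def text_pairs(tokens: List[str], window: int = 2) -> List[Pair]:
    """Build (target, context) pairs from a token sequence with a sliding window."""
    pairs: List[Pair] = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def graph_pairs(edges: Iterable[Pair]) -> List[Pair]:
    """Assumption: each graph edge contributes one pair in each direction."""
    pairs: List[Pair] = []
    for a, b in edges:
        pairs.append((a, b))
        pairs.append((b, a))
    return pairs


# Toy, made-up data: a tokenized sentence with one annotated concept mention,
# and a single concept-graph edge.
wiki_tokens = ["she", "ate", "CONCEPT/apple", "for", "lunch"]
concept_edges = [("CONCEPT/apple", "CONCEPT/fruit")]

# Both sources feed one shared training set, so a concept receives context
# from the text it appears in and from its graph neighbours.
training_pairs = text_pairs(wiki_tokens, window=1) + graph_pairs(concept_edges)
```

Pooling the pairs this way lets a single embedding table be trained with an unmodified skip-gram objective. Whether the paper weights or samples the two sources differently is not stated in the abstract.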


Bibliographic record

Source
Information Retrieval | 2019, Issue 6 | pp. 525-542 | 18 pages
Authors
Shalaby Walid; Zadrozny Wlodek; Jin Hongxia
Affiliations

Univ N Carolina Dept Comp Sci 9201 Univ City Blvd Charlotte NC 28223 USA;

Univ N Carolina Dept Comp Sci 9201 Univ City Blvd Charlotte NC 28223 USA;

Samsung Res Amer 665 Clyde Ave Mountain View CA 94043 USA;

Original format: PDF
Language: English
Keywords
Entity and concept embeddings; Entity identification; Concept categorization; Skip-gram; Probase; Knowledge graph representations;


