Graph-based approaches to resolve entity ambiguity.


Abstract

Information extraction is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. One of its challenges is resolving ambiguity between entities, either in a knowledge base or in text documents. The problem has many variations and is known under different names, such as coreference resolution, entity disambiguation, entity linking, and entity matching. Coreference resolution decides whether two expressions refer to the same entity. Entity disambiguation determines how to map an entity mention to the appropriate entity in a knowledge base (KB). Entity linking focuses on inferring that two entity mentions in one or more documents refer to the same real-world entity, even if they do not appear in a KB. Entity matching (also called record deduplication, entity resolution, or reference reconciliation) merges records from databases if they refer to the same object.

Resolving ambiguity and finding proper matches between entities is an important step for many downstream applications, such as data integration, question answering, and relation extraction. The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains, posing a scalability challenge for information extraction systems. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and to answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge.

This dissertation studies various aspects of, and different settings for, resolving ambiguity between entities. A new scalable, domain-independent, graph-based approach utilizing Personalized PageRank is developed for entity matching across large-scale knowledge bases and evaluated on datasets of 110 million and 203 million entities. A new model for entity disambiguation between a document and a knowledge base, which utilizes a document graph and effectively filters out noise, is proposed; corresponding datasets are released. A competitive result of 91.7% micro-accuracy is achieved on the AIDA benchmark dataset, outperforming the most recent state-of-the-art models. A new technique based on a paraphrase detection model is proposed to recognize name variations for an entity in a document; corresponding training and test datasets are made publicly available. A new approach integrating the graph-based entity disambiguation model with this technique is presented for the entity linking task and is evaluated on a dataset for the Text Analysis Conference Entity Discovery and Linking task.
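
The abstract names Personalized PageRank as the basis of the entity matching approach but does not describe the algorithm. As a rough illustration only, the generic technique can be sketched as a power iteration in which the random walk always restarts at a single seed node. This is not the dissertation's implementation: the function name `personalized_pagerank`, the damping value `alpha=0.85`, the iteration count, and the toy graph are all assumptions made for this example.

```python
def personalized_pagerank(graph, seed, alpha=0.85, iterations=50):
    """Generic Personalized PageRank by power iteration (illustrative sketch).

    graph: dict mapping each node to a list of its out-neighbors; every
           neighbor must also appear as a key.
    seed:  the node the random walk restarts from.
    alpha: probability of following an edge instead of restarting (assumed value).
    """
    rank = {node: 0.0 for node in graph}
    rank[seed] = 1.0  # all probability mass starts at the seed
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if neighbors:
                # Spread the walking share of this node's mass over its out-edges.
                share = alpha * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    nxt[neighbor] += share
            else:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += alpha * rank[node]
        nxt[seed] += 1.0 - alpha  # restart mass (total mass is always 1)
        rank = nxt
    return rank


# Hypothetical toy graph: two candidate entities named "Paris".
toy_graph = {
    "Paris_(France)": ["France", "Eiffel_Tower"],
    "France": ["Paris_(France)", "Eiffel_Tower"],
    "Eiffel_Tower": ["Paris_(France)", "France"],
    "Paris_(Texas)": ["Texas"],
    "Texas": ["Paris_(Texas)"],
}

scores = personalized_pagerank(toy_graph, seed="France")
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

In this toy setting, nodes reachable from the seed accumulate score while the disconnected candidate receives none, which is the intuition behind using such scores to rank candidate matches. How the dissertation builds its graphs and uses the scores at a scale of hundreds of millions of entities is not stated in the abstract.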


Bibliographic record

Author: Pershina, Maria
Affiliation: New York University
Degree-granting institution: New York University
Subject: Computer science
Degree: Ph.D.
Year: 2016
Pages: 94
Original format: PDF
Language: English

