
Probabilistic models for scalable knowledge graph construction.


Abstract

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, which stems from noise and ambiguity in the underlying data and from errors made by the extraction system, together with the complexity of reasoning about the dependencies among these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts.
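In a hinge-loss Markov random field, facts take soft truth values in [0, 1] and each ground rule contributes a weighted hinge-loss penalty, so MAP inference is a convex optimization problem. The toy sketch below illustrates the idea with projected subgradient descent; the two facts, the rules, and their weights are invented for illustration, and this is not the dissertation's actual implementation (which uses the PSL framework).

```python
# Toy MAP inference for a hinge-loss MRF (illustrative sketch only).
# y[0]: truth of Fact(A), y[1]: truth of Fact(B) -- hypothetical facts.
# Rule 1 (weight 2.0): an extractor asserts Fact(A) with confidence 0.9,
#   penalized by max(0, 0.9 - y[0]).
# Rule 2 (weight 1.0): ontological subsumption Fact(A) => Fact(B),
#   penalized via its Lukasiewicz relaxation max(0, y[0] - y[1]).

RULES = [
    (2.0, lambda y: max(0.0, 0.9 - y[0])),
    (1.0, lambda y: max(0.0, y[0] - y[1])),
]

def objective(y):
    """Weighted sum of hinge losses -- convex in y."""
    return sum(w * loss(y) for w, loss in RULES)

def subgradient(y):
    """A subgradient of the objective at y."""
    g = [0.0, 0.0]
    w1, w2 = RULES[0][0], RULES[1][0]
    if 0.9 - y[0] > 0:       # rule 1 active
        g[0] -= w1
    if y[0] - y[1] > 0:      # rule 2 active
        g[0] += w2
        g[1] -= w2
    return g

def map_inference(steps=2000, lr=0.01):
    y = [0.5, 0.5]
    for _ in range(steps):
        g = subgradient(y)
        # Subgradient step followed by projection onto [0, 1].
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, g)]
    return y

y = map_inference()
# Both truth values settle near the extractor confidence of 0.9:
# Fact(B) is pulled up by the subsumption rule even with no direct evidence.
```

Because every potential is a hinge loss, the objective stays convex no matter how many ground rules are added, which is what makes inference tractable at the 20M-constraint scale the abstract describes.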
I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
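The inference-regret notion above can be sketched as follows; this is one plausible formalization for illustration, not necessarily the dissertation's exact definition. If \(y^{*}_{t}\) is the MAP solution obtained by full inference over all evidence available at time \(t\), and \(\hat{y}_{t}\) is the approximate solution maintained by the online update, a natural measure of regret is the normalized distance between them:

```latex
\mathrm{Regret}_{t}(\hat{y}_{t}) \;=\; \frac{1}{|\mathcal{Y}|}\,\bigl\lVert \hat{y}_{t} - y^{*}_{t} \bigr\rVert_{1}
```

Bounding this quantity certifies how far the cheap incremental update can drift from the answer that full (expensive) re-inference would produce.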

Bibliographic details

  • Author: Pujara, Jay.
  • Affiliation: University of Maryland, College Park.
  • Degree grantor: University of Maryland, College Park.
  • Subjects: Artificial intelligence; Epistemology; Computer science.
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 178 p.
  • Total pages: 178
  • Original format: PDF
  • Language: English

