DeepDive: A Data Management System for Automatic Knowledge Base Construction.

Abstract

Many pressing questions in science are macroscopic: they require scientists to consult information expressed in a wide range of resources, many of which are not organized in a structured relational form. Knowledge base construction (KBC) is the process of populating a knowledge base, i.e., a relational database storing factual information, from unstructured inputs. KBC holds the promise of facilitating a range of macroscopic sciences by making information accessible to scientists.

One key challenge in building a high-quality KBC system is that developers must often deal with data that are both diverse in type and large in size. Further complicating the scenario is that these data need to be manipulated by both relational operations and state-of-the-art machine-learning techniques. This dissertation focuses on supporting this complex process of building KBC systems. DeepDive is a data management system that we built to study this problem; its ultimate goal is to allow scientists to build a KBC system by declaratively specifying domain knowledge without worrying about any algorithmic, performance, or scalability issues.

DeepDive was built by generalizing from our experience in building more than ten high-quality KBC systems, many of which exceed human quality or are top-performing systems in KBC competitions, and many of which were built completely by scientists or industry users using DeepDive. From these examples, we designed a declarative language to specify a KBC system and a concrete protocol that iteratively improves the quality of KBC systems. This flexible framework introduces challenges of scalability and performance: many KBC systems built with DeepDive contain statistical inference and learning tasks over terabytes of data, and the iterative protocol also requires executing similar inference problems multiple times. Motivated by these challenges, we designed techniques that make both the batch execution and incremental execution of a KBC program up to two orders of magnitude more efficient and/or scalable. This dissertation describes the DeepDive framework, its applications, and these techniques, to demonstrate the thesis that it is feasible to build an efficient and scalable data management system for the end-to-end workflow of building KBC systems.

Bibliographic record

  • Author: Zhang, Ce
  • Author affiliation: The University of Wisconsin - Madison
  • Degree-granting institution: The University of Wisconsin - Madison
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 205 p.
  • Total pages: 205
  • Original format: PDF
  • Language of text: English
  • Date indexed: 2022-08-17 11:52:59
