Journal: The VLDB Journal

In-database batch and query-time inference over probabilistic graphical models using UDA-GIST



Abstract

To meet customers' pressing demands, enterprise database vendors have been pushing advanced analytical techniques into databases. Most major DBMSes use user-defined aggregates (UDAs), a data-driven operator, to implement analytical techniques in parallel. However, UDAs alone are not sufficient to implement statistical algorithms in which most of the work is performed by iterative transitions over a large state that cannot be naively partitioned because of data dependencies. Typically, this type of statistical algorithm requires pre-processing to set up the large state in the first place and post-processing after the statistical inference. This paper presents general iterative state transition (GIST), a new database operator for parallel iterative state transitions over large states. GIST receives a state constructed by a UDA and then performs rounds of transitions on the state until it converges. A final UDA performs post-processing and result extraction. We argue that the combination of UDA and GIST (UDA-GIST) unifies data-parallel and state-parallel processing in a single system, thus significantly extending the analytical capabilities of DBMSes. We exemplify the framework through two high-profile batch applications, cross-document coreference and image denoising, and one query-time inference application, marginal inference queries over probabilistic knowledge graphs. The three applications use probabilistic graphical models, which encode complex relationships among different variables and are powerful for a wide range of problems. We show that the in-database framework allows us to tackle a problem 27 times larger than a scalable distributed solution can handle for the first application, and achieves a 43-fold speedup over the state of the art for the second application. For the third application, we implement query-time inference using the UDA-GIST framework and apply it to a probabilistic knowledge graph, achieving a 10-fold speedup over sequential inference. To the best of our knowledge, this is the first in-database query-time inference engine over a large probabilistic knowledge base. We show that the UDA-GIST framework for data- and graph-parallel computations can support both batch and query-time inference efficiently in databases.
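The three-stage flow described in the abstract (a UDA builds the state, GIST iterates transitions until convergence, a final UDA extracts results) can be illustrated with a minimal Python sketch. This is not the paper's API: the names `BuildStateUDA`, `gist`, `smooth`, and `extract_uda`, the convergence tolerance, and the toy averaging transition are all assumptions chosen only to show the shape of the pipeline, and the sketch runs sequentially rather than in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class BuildStateUDA:
    """Hypothetical UDA: accumulates (index, value) rows into one state."""
    cells: dict = field(default_factory=dict)

    def accumulate(self, row):
        idx, value = row
        self.cells[idx] = float(value)

    def merge(self, other):
        # Combine partial aggregates built over different data partitions.
        self.cells.update(other.cells)

    def terminate(self):
        # Emit the state as a dense list ordered by cell index.
        return [self.cells[i] for i in sorted(self.cells)]


def gist(state, transition, max_rounds=1000, tol=1e-6):
    """Hypothetical GIST loop: rounds of transitions until the state settles."""
    rounds = 0
    while rounds < max_rounds:
        new_state = [transition(state, i) for i in range(len(state))]
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        rounds += 1
        if delta < tol:
            break
    return state, rounds


def smooth(state, i):
    """Toy transition (assumption): pull cell i toward its neighbours."""
    left = state[i - 1] if i > 0 else state[i]
    right = state[i + 1] if i < len(state) - 1 else state[i]
    return 0.5 * state[i] + 0.25 * (left + right)


def extract_uda(state):
    """Hypothetical final UDA: post-process the converged state."""
    return [round(v, 3) for v in state]


# Two data partitions, aggregated independently and then merged.
part_a, part_b = BuildStateUDA(), BuildStateUDA()
for row in [(0, 0.0), (1, 10.0)]:
    part_a.accumulate(row)
for row in [(2, 0.0), (3, 10.0)]:
    part_b.accumulate(row)
part_a.merge(part_b)

initial = part_a.terminate()
final_state, rounds = gist(initial, smooth)
result = extract_uda(final_state)
```

The toy transition preserves the sum of the cells, so every cell drifts toward the mean of the initial values. In a real UDA-GIST deployment, the state would be a graphical model and the transitions would be inference steps, which this sketch does not attempt to reproduce.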
