In-database batch and query-time inference over probabilistic graphical models using UDA-GIST

Li Kun; Zhou Xiaofeng; Wang Daisy Zhe; Grant Christan; Dobra Alin; Dudley Christopher

Abstract

To meet customers' pressing demands, enterprise database vendors have been pushing advanced analytical techniques into databases. Most major DBMSes use user-defined aggregates (UDAs), a data-driven operator, to implement analytical techniques in parallel. However, UDAs alone are not sufficient to implement statistical algorithms in which most of the work consists of iterative transitions over a large state that cannot be naively partitioned because of data dependencies. Such algorithms typically also require pre-processing to build the large state in the first place and post-processing after statistical inference. This paper presents general iterative state transition (GIST), a new database operator for parallel iterative state transitions over large states. GIST receives a state constructed by a UDA and then performs rounds of transitions on the state until it converges. A final UDA performs post-processing and result extraction. We argue that the combination of UDA and GIST (UDA-GIST) unifies data-parallel and state-parallel processing in a single system, thus significantly extending the analytical capabilities of DBMSes. We exemplify the framework with two high-profile batch applications, cross-document coreference and image denoising, and one query-time inference application, marginal inference queries over probabilistic knowledge graphs. All three applications use probabilistic graphical models, which encode complex relationships among variables and are powerful for a wide range of problems. For the first application, the in-database framework tackles a problem 27 times larger than a scalable distributed solution can handle. For the second, it achieves a 43 times speedup over the state of the art. For the third application, we implement query-time inference using the UDA-GIST framework and apply it to a probabilistic knowledge graph, achieving a 10 times speedup over sequential inference.
To the best of our knowledge, this is the first in-database query-time inference engine over a large probabilistic knowledge base. We show that the UDA-GIST framework for data- and graph-parallel computations can support both batch and query-time inference efficiently in databases.
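The abstract describes a three-stage flow. A UDA constructs the state, GIST performs rounds of transitions on it until convergence, and a final UDA extracts the result. The following is a minimal Python sketch of that flow for the image-denoising application, under stated assumptions. All of the following are hypothetical illustrations: the denoising model (an Ising prior solved with iterated conditional modes, ICM), the checkerboard split into work units, and every name used (`BuildStateUDA`, `IsingICMGist`, `ExtractUDA`, `denoise`, `eta`, `beta`). They are not the paper's operator API or its inference algorithm.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


class BuildStateUDA:
    """Hypothetical UDA: gathers (row, col, value) tuples into a noisy +1/-1 grid."""

    def __init__(self) -> None:
        self.obs: Dict[Pixel, int] = {}

    def accumulate(self, tup: Tuple[int, int, int]) -> None:
        row, col, value = tup
        self.obs[(row, col)] = value

    def merge(self, other: "BuildStateUDA") -> None:
        # Partitions hold disjoint pixels, so a dict update combines them.
        self.obs.update(other.obs)

    def terminate(self) -> Dict[Pixel, int]:
        return dict(self.obs)


class IsingICMGist:
    """Hypothetical GIST state: ICM denoising under an Ising-model prior.

    Pixels are split into two checkerboard work units. No two pixels in the
    same unit are neighbors, so a unit's transitions could run in parallel.
    """

    def __init__(self, obs: Dict[Pixel, int], eta: float = 1.0, beta: float = 1.0) -> None:
        self.obs = obs
        self.labels = dict(obs)  # start from the noisy observation
        self.eta, self.beta = eta, beta
        self.units: List[List[Pixel]] = [
            [p for p in obs if (p[0] + p[1]) % 2 == color] for color in (0, 1)
        ]

    def _transition(self, p: Pixel) -> bool:
        """Set pixel p to the label minimizing its local energy; report a change."""
        r, c = p
        neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        field = self.eta * self.obs[p] + self.beta * sum(
            self.labels[q] for q in neighbors if q in self.labels
        )
        new = 1 if field > 0 else -1 if field < 0 else self.labels[p]  # ties keep label
        changed = new != self.labels[p]
        self.labels[p] = new
        return changed

    def run(self, max_rounds: int = 100) -> int:
        """Run rounds of transitions until no label changes; return rounds used."""
        for rnd in range(1, max_rounds + 1):
            changed = 0
            for unit in self.units:  # each unit is a candidate for parallel work
                changed += sum(self._transition(p) for p in unit)
            if changed == 0:
                return rnd
        return max_rounds


class ExtractUDA:
    """Hypothetical final UDA: converts labels back to rows of 0/1 pixels."""

    def __init__(self, height: int, width: int) -> None:
        self.grid = [[0] * width for _ in range(height)]

    def accumulate(self, tup: Tuple[Pixel, int]) -> None:
        (r, c), label = tup
        self.grid[r][c] = 1 if label > 0 else 0

    def terminate(self) -> List[List[int]]:
        return self.grid


def denoise(tuples, height: int, width: int, partitions: int = 2):
    # Data-parallel phase: one UDA instance per partition, then merge.
    udas = [BuildStateUDA() for _ in range(partitions)]
    for i, tup in enumerate(tuples):
        udas[i % partitions].accumulate(tup)
    for other in udas[1:]:
        udas[0].merge(other)
    # State-parallel phase: GIST iterates over the shared state until convergence.
    gist = IsingICMGist(udas[0].terminate())
    rounds = gist.run()
    # Post-processing phase: a final UDA extracts the result.
    extract = ExtractUDA(height, width)
    for item in gist.labels.items():
        extract.accumulate(item)
    return extract.terminate(), rounds
```

For example, `denoise([(r, c, -1 if (r, c) == (2, 2) else 1) for r in range(5) for c in range(5)], 5, 5)` flips the single noisy center pixel back to 1. It converges after two rounds. The checkerboard split is one simple way to make transitions over a dependent state safe to parallelize. It is an illustrative choice, not necessarily the partitioning GIST itself uses.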

Bibliographic details

Source
The VLDB Journal, 2017, Issue 2, pp. 177-201 (25 pages)
Authors
Li Kun; Zhou Xiaofeng; Wang Daisy Zhe; Grant Christan; Dobra Alin; Dudley Christopher
Affiliations

Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA;

Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA;

Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA;

Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA;

Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA;

Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA;

Indexed in: Science Citation Index (SCI)
Original format: PDF
Text language: English
Keywords
In-database analytics; Query-time inference; Batch inference; Data-parallel analytics; Graph-parallel analytics

Similar documents

1. Inference attacks on genomic data based on probabilistic graphical models. Zaobo He, Junxiu Zhou. Big Data Mining and Analytics, 2020, Issue 3. [Journal article]
2. Improving probabilistic inference in graphical models with determinism and cycles. Mohamed-Hamza Ibrahim, Christopher Pal, Gilles Pesant. Machine Learning, 2017, Issue 1. [Journal article]
3. Exact inference for relational graphical models with interpreted functions: Lifted probabilistic inference modulo theories. Rodrigo de Salvo Braz, Ciaran O'Reilly. Conference on Uncertainty in Artificial Intelligence, 2017. [Conference paper]
4. Efficient learning and inference for probabilistic graphical models. Siqi Nie. 2016. [Dissertation]
5. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. Dejan Pecevski, Lars Buesing, Wolfgang Maass. 2011. [Other]
6. Lifted probabilistic inference for asymmetric graphical models. Guy Van den Broeck, Mathias Niepert. 2015. [Other]
