Journal: IEEE Transactions on Cybernetics

A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks

Abstract

Bayesian networks (BNs) have been adopted as the underlying model for representing and inferring uncertain knowledge. Because probabilistic inference underlies many realistic applications, learning a BN from data is a critical subject in machine learning, artificial intelligence, and big data paradigms. Classical methods for learning BNs now need to be extended to data-intensive computing or cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring-and-search algorithm and using MapReduce. First, we adopt the minimum description length (MDL) as the scoring metric and give two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give a corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as a strategy for storing a BN as <key, value> pairs. Further, in view of the dynamic characteristics of the changing data, we introduce the concept of influence degree to measure how well the current BN coincides with new data, and then propose corresponding two-pass MapReduce-based algorithms for incremental learning of BNs. Experimental results show the efficiency, scalability, and effectiveness of our methods.
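
To make the scoring step concrete, the following is a minimal single-machine sketch of map/reduce-style counting followed by a minimum description length (MDL) score for one candidate structure. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the exact MDL formula or key layout, so the sketch uses a common textbook form (negative log-likelihood plus a (log N)/2 penalty per free parameter), and the names `map_record`, `reduce_counts`, `mdl_score`, and `score_structure` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def map_record(record, structure):
    """Map step: for each node, emit ((node, parent values, node value), 1)."""
    for node, parents in structure.items():
        parent_values = tuple(record[p] for p in parents)
        yield (node, parent_values, record[node]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def mdl_score(counts, structure, cardinality, n_samples):
    """Assumed MDL form: -log-likelihood + (log N / 2) * free parameters.

    Lower scores indicate a shorter description of the data.
    """
    # Total count for each (node, parent configuration).
    parent_totals = defaultdict(int)
    for (node, parent_values, _), n in counts.items():
        parent_totals[(node, parent_values)] += n

    log_likelihood = sum(
        n * math.log(n / parent_totals[(node, parent_values)])
        for (node, parent_values, _), n in counts.items()
    )
    # Free parameters: (r_i - 1) per parent configuration of node i.
    n_params = sum(
        (cardinality[node] - 1) * math.prod(cardinality[p] for p in parents)
        for node, parents in structure.items()
    )
    return -log_likelihood + 0.5 * math.log(n_samples) * n_params


def score_structure(records, structure, cardinality):
    """Run the map and reduce steps in-process, then score the structure."""
    pairs = (pair for record in records for pair in map_record(record, structure))
    counts = reduce_counts(pairs)
    return mdl_score(counts, structure, cardinality, len(records))


# Toy data in which B always equals A (hypothetical, for illustration only).
records = [{"A": 0, "B": 0}, {"A": 1, "B": 1}] * 50
cardinality = {"A": 2, "B": 2}
with_edge = {"A": [], "B": ["A"]}   # candidate structure A -> B
without_edge = {"A": [], "B": []}   # candidate structure with no edges

print(score_structure(records, with_edge, cardinality))
print(score_structure(records, without_edge, cardinality))
```

On this toy data the structure with the edge A -> B receives the lower score, because the dependency between A and B outweighs the penalty for the extra parameter. A hill-climbing search would call such a scoring routine repeatedly on neighboring structures; in a real MapReduce deployment the map and reduce functions would run on distributed data partitions instead of an in-memory list.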
