Conference: IEEE International Conference on Smart Data Services

Scalable and Hybrid Ensemble-Based Causality Discovery

Abstract

Causality discovery mines cause-effect relationships among the variables of a system and is widely used in disciplines such as climatology and neuroscience. Many data-driven causality discovery methods have been proposed, e.g., Granger causality, PCMCI, and Dynamic Bayesian Networks. Many of these approaches mine time series data and generate a directed causality graph in which each edge denotes a cause-effect relationship between the two nodes it connects. Our benchmarking of different causality discovery approaches on real-world climate data shows that they often produce quite different causality results from the same input dataset because of differences in their internal learning mechanisms. Meanwhile, the ever-increasing volume of available data in virtually every discipline makes it increasingly difficult for existing causality discovery algorithms to produce results within a reasonable time. To address these two challenges, this paper uses data partitioning and ensemble techniques and proposes a two-phase hybrid causality ensemble framework. The framework first performs a phase 1 data ensemble over the partitioned data and then performs a phase 2 algorithm ensemble over the data ensemble results. To achieve scalability, we further parallelize the ensemble approaches on the Spark big data analytics engine. Our experiments show that the proposed approaches achieve good accuracy through ensembling and high scalability through data parallelization in distributed computing environments.
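The two-phase structure described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation. The abstract does not say how results are combined, so the sketch assumes simple majority voting over edges in both phases. The names `partition`, `vote`, `two_phase_ensemble`, and `lagged_correlation` are hypothetical, and the lag-1 correlation test is only a toy stand-in for real methods such as Granger causality or PCMCI. The sketch runs sequentially; the per-partition calls are independent, which is the property a data-parallel engine such as Spark could exploit.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]             # (cause, effect)
Series = Dict[str, List[float]]    # variable name -> time series
Algorithm = Callable[[Series], Set[Edge]]


def partition(data: Series, n_parts: int) -> List[Series]:
    """Split every series into n_parts contiguous, equal-size chunks.

    Assumption: any tail that does not fill a whole chunk is dropped.
    """
    length = len(next(iter(data.values())))
    size = length // n_parts
    return [
        {name: values[i * size:(i + 1) * size] for name, values in data.items()}
        for i in range(n_parts)
    ]


def vote(edge_sets: Sequence[Set[Edge]], threshold: float) -> Set[Edge]:
    """Keep edges that appear in at least `threshold` of the edge sets."""
    counts = Counter(edge for edges in edge_sets for edge in edges)
    return {e for e, c in counts.items() if c / len(edge_sets) >= threshold}


def two_phase_ensemble(
    data: Series,
    algorithms: Sequence[Algorithm],
    n_parts: int,
    data_threshold: float = 0.5,
    algo_threshold: float = 0.5,
) -> Set[Edge]:
    parts = partition(data, n_parts)
    # Phase 1 (data ensemble): for each algorithm, combine its results
    # across the data partitions.
    phase1 = [vote([algo(p) for p in parts], data_threshold) for algo in algorithms]
    # Phase 2 (algorithm ensemble): combine the phase 1 results.
    return vote(phase1, algo_threshold)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def lagged_correlation(threshold: float) -> Algorithm:
    """Toy stand-in algorithm: edge a -> b if |corr(a[t-1], b[t])| >= threshold."""
    def run(data: Series) -> Set[Edge]:
        return {
            (a, b)
            for a in data for b in data
            if a != b and abs(pearson(data[a][:-1], data[b][1:])) >= threshold
        }
    return run


# Synthetic example: y is x delayed by one time step.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + x[:-1]
edges = two_phase_ensemble(
    {"x": x, "y": y},
    [lagged_correlation(0.8), lagged_correlation(0.6)],
    n_parts=4,
)
print(edges)
```

Voting on edges is only one possible combination rule; weighting by edge strength or by algorithm reliability would fit the same two-phase shape.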