
Comparison of strategies for scalable causal discovery of latent variable models from mixed data


Abstract

Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets comprise both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in these data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Despite recent advances in causal discovery strategies that could each potentially handle one of these challenges individually, no study currently exists that comprehensively compares these approaches in this setting. In this paper, we present a comparative study that addresses this problem by comparing the accuracy and efficiency of different strategies in large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy that quickly eliminates unlikely adjacencies in order to achieve scalability to high-dimensional data. We demonstrate that these methods significantly outperform the state of the art in the field by achieving both accurate edge orientations and tractable running time in simulation experiments on datasets with up to 500 variables. Finally, we demonstrate the usability of the best performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.

Electronic supplementary material: The online version of this article (10.1007/s41060-018-0104-3) contains supplementary material, which is available to authorized users.
