Journal: Complex & Intelligent Systems

Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization



Abstract

We propose and evaluate a method for optimal construction of synthetic treatment and control samples for the purpose of drawing causal inference. The balance optimization subset selection problem, which formulates the minimization of aggregate imbalance in covariate distributions as a way to reduce bias in data, is a new area of study in operations research. We investigate a novel metric, the cross-validated area under the receiver operating characteristic curve (AUC), as a measure of balance between treatment and control groups. The proposed approach provides direct and automatic balancing of covariate distributions. In addition, the AUC-based approach is able to detect subtler distributional differences than existing measures, such as simple empirical mean/variance and count-based metrics. Thus, optimizing AUCs achieves greater balance than the existing methods. Using 5 widely used real data sets and 7 synthetic data sets, we show that optimization of samples using existing methods (Chi-square, mean variance differences, Kolmogorov–Smirnov, and Mahalanobis) results in samples containing imbalance that is detectable using machine learning ensembles. We minimize covariate imbalance by minimizing the absolute distance between the maximum cross-validated AUC over $M$ folds and 0.50, using evolutionary optimization. We demonstrate that particle swarm optimization (PSO) outperforms modified cuckoo swarm (MCS) for a gradient-free, non-linear noisy cost function. To compute AUCs, we use supervised binary classification approaches from the machine learning and credit scoring literature. Using superscore ensembles adds to the classifier-based two-sample testing literature. If the mean cross-validated AUC based on machine learning is 0.50, the two groups are indistinguishable and suitable for causal inference.
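
The balance measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's machine learning ensembles are replaced here by a toy mean-difference linear scorer, the data are simulated Gaussians, and every function name (`cv_aucs`, `imbalance`, `make_sample`, and so on) is hypothetical. The sketch trains a classifier to tell treated from control units, computes the AUC on each of M held-out folds, and reports the absolute distance of the largest fold AUC from 0.50. The PSO search over candidate subsets is not shown; it would treat `imbalance` as the cost function to minimize.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance a treated unit outscores a control unit (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def column_means(rows):
    return [sum(col) / len(col) for col in zip(*rows)]

def fit_mean_difference(X, y):
    """Toy stand-in classifier (an assumption): weights = treated mean minus control mean."""
    mu1 = column_means([x for x, t in zip(X, y) if t == 1])
    mu0 = column_means([x for x, t in zip(X, y) if t == 0])
    return [a - b for a, b in zip(mu1, mu0)]

def stratified_folds(y, m_folds, seed):
    """Split indices into m_folds folds, keeping both groups present in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(m_folds)]
    for label in (0, 1):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for k, i in enumerate(members):
            folds[k % m_folds].append(i)
    return folds

def cv_aucs(X, y, m_folds=5, seed=0):
    """Out-of-fold AUC for each of the m_folds held-out folds."""
    aucs = []
    for test in stratified_folds(y, m_folds, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        w = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
        scores = [sum(wj * xj for wj, xj in zip(w, X[i])) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return aucs

def imbalance(X, y, m_folds=5, seed=0):
    """Cost from the abstract: |max cross-validated AUC over M folds - 0.50|."""
    return abs(max(cv_aucs(X, y, m_folds, seed)) - 0.5)

def make_sample(shift, n=200, d=2, seed=1):
    """Simulated data (an assumption): control ~ N(0, 1), treated ~ N(shift, 1) per covariate."""
    rng = random.Random(seed)
    control = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    treated = [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]
    return control + treated, [0] * n + [1] * n

X_same, y_same = make_sample(shift=0.0)
X_shift, y_shift = make_sample(shift=1.5)
balanced_score = imbalance(X_same, y_same)
shifted_score = imbalance(X_shift, y_shift)
print(f"same distribution:    {balanced_score:.3f}")
print(f"shifted treated mean: {shifted_score:.3f}")
```

When both groups are drawn from the same distribution, the fold AUCs scatter around 0.50 and the cost stays small. A shift in the treated covariates makes the groups separable and pushes the cost toward its maximum of 0.50. A stronger classifier, such as the ensembles the abstract refers to, would be expected to pick up subtler differences than this linear scorer can.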
