Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization

Dhruv Sharma; Christopher Willy; John Bischoff

Abstract

We propose and evaluate a method for optimal construction of synthetic treatment and control samples for the purpose of drawing causal inference. The balance optimization subset selection problem, which formulates minimization of aggregate imbalance in covariate distributions to reduce bias in data, is a new area of study in operations research. We investigate a novel metric, cross-validated area under the receiver operating characteristic curve (AUC), as a measure of balance between treatment and control groups. The proposed approach provides direct and automatic balancing of covariate distributions. In addition, the AUC-based approach is able to detect subtler distributional differences than existing measures, such as simple empirical mean/variance and count-based metrics. Thus, optimizing AUCs achieves a greater balance than the existing methods. Using 5 widely used real data sets and 7 synthetic data sets, we show that optimization of samples using existing methods (Chi-square, mean variance differences, Kolmogorov–Smirnov, and Mahalanobis) results in samples containing imbalance that is detectable using machine learning ensembles. We minimize covariate imbalance by minimizing the absolute distance from 0.50 of the maximum cross-validated AUC across $M$ folds, using evolutionary optimization. We demonstrate that particle swarm optimization (PSO) outperforms modified cuckoo swarm (MCS) for a gradient-free, non-linear noisy cost function. To compute AUCs, we use supervised binary classification approaches from the machine learning and credit scoring literature. Using superscore ensembles adds to the classifier-based two-sample testing literature. If the mean cross-validated AUC based on machine learning is 0.50, the two groups are indistinguishable and suitable for causal inference.
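The balance measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a toy mean-difference scorer where the paper uses machine learning ensembles, and every function name here (`auc`, `fit_mean_difference`, `cv_auc_imbalance`, `make_groups`) is hypothetical. It only shows the shape of the cost: train a classifier to tell treatment from control, compute the AUC on each held-out fold, and report how far the maximum fold AUC lies from 0.50.

```python
import random

def auc(pos_scores, neg_scores):
    """Chance that a random treated unit outscores a random control (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def fit_mean_difference(X, y):
    """Toy stand-in for an ensemble: weights are per-covariate differences of group means."""
    treated = [x for x, t in zip(X, y) if t == 1]
    control = [x for x, t in zip(X, y) if t == 0]
    return [sum(x[j] for x in treated) / len(treated)
            - sum(x[j] for x in control) / len(control)
            for j in range(len(X[0]))]

def cv_auc_imbalance(X, y, folds=5, seed=0):
    """Return (|max fold AUC - 0.5|, list of fold AUCs) for treatment labels y."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_aucs = []
    for k in range(folds):
        test = idx[k::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
        scores = {i: sum(wj * xj for wj, xj in zip(w, X[i])) for i in test}
        pos = [scores[i] for i in test if y[i] == 1]
        neg = [scores[i] for i in test if y[i] == 0]
        if pos and neg:  # skip a fold that lacks one of the groups
            fold_aucs.append(auc(pos, neg))
    return abs(max(fold_aucs) - 0.5), fold_aucs

def make_groups(shift, n=200, seed=1):
    """Synthetic data (assumption): two Gaussian covariates, treated group shifted by `shift`."""
    rng = random.Random(seed)
    control = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    treated = [[rng.gauss(shift, 1), rng.gauss(shift, 1)] for _ in range(n)]
    return control + treated, [0] * n + [1] * n

X_bal, y_bal = make_groups(shift=0.0)
X_imb, y_imb = make_groups(shift=2.0)
balanced_gap, _ = cv_auc_imbalance(X_bal, y_bal)
shifted_gap, _ = cv_auc_imbalance(X_imb, y_imb)
print("balanced gap:", balanced_gap, "shifted gap:", shifted_gap)
```

In a subset-selection setting, a gradient-free optimizer such as PSO would treat the first return value as the cost of a candidate subset. A gap near zero means the classifier cannot separate the two groups.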

Bibliographic details

Source: Complex & Intelligent Systems, 2020, Issue 1, 19 pages
Authors: Dhruv Sharma; Christopher Willy; John Bischoff
Original format: PDF
Keywords: Analytics; Evolutionary computing; Swarm optimization; Machine learning

Similar literature

1. A Machine Learning Framework for Feature Selection in Heart Disease Classification Using Improved Particle Swarm Optimization with Support Vector Machine Classifier [J]. Vijayashree J., Sultana H. Parveen. Programming and Computer Software. 2018, Issue 6.
2. Optimal selection of ensemble classifiers using particle swarm optimization and diversity measures [J]. Hasanpour Hesam, Meibodi Ramak Ghavamizadeh, Navi Keivan. Intelligent decision technologies. 2019, Issue 1.
3. Inertial sensor-based human activity recognition via ensemble extreme learning machines optimized by quantum-behaved particle swarm [J]. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2020, Issue 2aPta1.
4. A New Binary Particle Swarm Optimization for Feature Subset Selection with Support Vector Machine [C]. Amir Rajabi Behjat, Aida Mustapha, Hossein Nezamabadi-Pour. International Conference on Soft Computing and Data Mining. 2014.
5. Optimal Subset Selection for Causal Inference Using Machine Learning and Particle Swarm Optimization [D]. Sharma, Dhruv. 2018.
6. Optimum Feature Selection with Particle Swarm Optimization to Face Recognition System Using Gabor Wavelet Transform and Deep Learning [O]. Sulayman Ahmed, Mondher Frikha, Taha Darwassh Hanawy Hussein. 2021.
7. Data-driven input variable selection for rainfall-runoff modeling using binary-coded particle swarm optimization and Extreme Learning Machines [O]. Taormina R, Chau KW. 2015.
