Mathematical Problems in Engineering: Theory, Methods and Applications

Evaluating the Performance of Feature Selection Methods Using Huge Big Data: A Monte Carlo Simulation Approach



Abstract

In this article, we compare Autometrics with machine learning techniques including the Minimax Concave Penalty (MCP), Elastic Smoothly Clipped Absolute Deviation (E-SCAD), and Adaptive Elastic Net (AEnet). For the simulation experiments, three kinds of scenarios are considered, allowing for multicollinearity, heteroscedasticity, and autocorrelation, with varying sample sizes and varying numbers of covariates. We found that all methods perform better with a large sample size. In the presence of low and moderate multicollinearity and low and moderate autocorrelation, the considered methods retain all relevant variables. However, under low and moderate multicollinearity, all methods except AEnet also keep many irrelevant predictors. In contrast, under low and moderate autocorrelation, Autometrics, along with AEnet, retains fewer irrelevant predictors. Under extreme multicollinearity, AEnet retains more than 93 percent of the correct variables with an outstanding gauge (zero percent). The potency of the remaining techniques, specifically MCP and E-SCAD, tends towards unity as the sample size increases, but these methods capture a large number of irrelevant predictors. Similarly, in the case of high autocorrelation, E-SCAD performs well in selecting relevant variables for a small sample, while in terms of gauge, Autometrics and AEnet perform better and often retain less than 5 percent of the irrelevant variables. In the presence of heteroscedasticity, all techniques often retain all relevant variables but also suffer from overspecification, except AEnet and Autometrics, which avoid the irrelevant predictors and recover the true model precisely.
For an empirical application, we use workers' remittance data for Pakistan, along with its twenty-seven determinants, spanning 1972 to 2020. AEnet selected thirteen relevant covariates of workers' remittance, while E-SCAD and MCP suffered from an overspecification problem. Hence, policymakers and practitioners should focus on the relevant variables selected by AEnet to improve workers' remittance in the case of Pakistan. In this regard, the Pakistan government has devised policies that make it easy to transfer remittances legally and mitigate the cost of transferring remittances from abroad. The AEnet approach can help policymakers arrive at relevant variables in the presence of a huge set of covariates, which in turn produce accurate predictions.
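
The abstract reports results in terms of potency and gauge without defining them. The sketch below assumes the definitions usual in the Autometrics literature: potency is the share of truly relevant variables that a method retains, and gauge is the share of irrelevant variables that it retains. The function names, the variable indices, and the three example selections are hypothetical and serve only as illustration; they are not the authors' code or data.

```python
from typing import Iterable, List, Set, Tuple


def potency_and_gauge(
    selected: Iterable[int], relevant: Set[int], n_candidates: int
) -> Tuple[float, float]:
    """Return (potency, gauge) for one selected model.

    Assumes candidates are indexed 0..n_candidates-1 and that both the
    relevant and the irrelevant sets are non-empty.
    """
    chosen = set(selected)
    irrelevant = set(range(n_candidates)) - relevant
    potency = len(chosen & relevant) / len(relevant)    # relevant variables kept
    gauge = len(chosen & irrelevant) / len(irrelevant)  # irrelevant variables kept
    return potency, gauge


def average_over_replications(
    selections: List[Set[int]], relevant: Set[int], n_candidates: int
) -> Tuple[float, float]:
    """Average potency and gauge across Monte Carlo replications."""
    results = [potency_and_gauge(s, relevant, n_candidates) for s in selections]
    n = len(results)
    avg_potency = sum(p for p, _ in results) / n
    avg_gauge = sum(g for _, g in results) / n
    return avg_potency, avg_gauge


# Hypothetical setup: 10 candidate covariates, the first 4 are truly relevant.
relevant = {0, 1, 2, 3}
# Hypothetical variable sets chosen by some method in three replications.
selections = [{0, 1, 2, 3}, {0, 1, 2, 3, 7}, {0, 1, 3}]

avg_potency, avg_gauge = average_over_replications(selections, relevant, 10)
print(f"potency = {avg_potency:.3f}, gauge = {avg_gauge:.3f}")
```

Under these definitions, a method that recovers the true model exactly has potency 1 and gauge 0, while an overspecified model can reach potency 1 with a gauge well above zero.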
