Correcting Misclassification Bias in Regression Models with Variables Generated via Data Mining

Qiao Mengke; Huang Ke-Wei

Abstract

As a result of advances in data mining, more and more empirical studies in the social sciences apply classification algorithms to construct independent or dependent variables for further analysis via standard regression methods. In the classification phase of these studies, researchers need to subjectively choose a classification performance metric for optimization in the standard procedure. No matter which performance metric is chosen, the constructed variable still includes classification error because those variables cannot be classified perfectly. The misclassification of constructed variables will lead to inconsistent regression coefficient estimates in the following phase, which has been documented as a problem of measurement error in the econometrics literature. The pioneering discussions on the issue of estimation inconsistency because of misclassification in these studies have been provided. Our study attempts to investigate systematically the theoretical foundation of this problem when a newly constructed variable is used as the independent or dependent variable in linear and nonlinear regressions. Our theoretical analysis shows that consistent regression estimators can be recovered in all models studied in this paper. The main implication of our theoretical result is that researchers do not need to tune the classification algorithm to minimize the inconsistency of estimated regression coefficients because the inconsistency can be corrected by theoretical formulas, even when the classification accuracy is poor. Instead, we propose that a classification algorithm should be tuned to minimize the standard error of the focal regression coefficient derived based on the corrected formula. As a result, researchers can derive a consistent and most precise estimator in all models studied in this paper.
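The inconsistency described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator. It assumes the simplest textbook case: a linear regression with a single binary regressor, classification errors that are unrelated to the regression noise, and a labeled validation subsample (such as a held-out test set) from which the share of true positives among predicted positives and among predicted negatives can be estimated. Under those assumptions the naive slope equals the true slope times P(X=1 | W=1) - P(X=1 | W=0), so dividing by that factor recovers the true slope. All names and parameter values (`simulate_and_correct`, the 0.8 sensitivity, the 0.9 specificity) are hypothetical choices for illustration.

```python
import numpy as np

def simulate_and_correct(n=200_000, n_valid=20_000, beta=2.0, seed=0):
    """Simulate a misclassified binary regressor and correct the OLS slope.

    Illustrative assumptions: one binary regressor, misclassification
    unrelated to the regression noise, and a labeled validation subsample.
    """
    rng = np.random.default_rng(seed)

    # True binary variable X with P(X = 1) = 0.3.
    x_true = rng.random(n) < 0.3

    # Classifier output W: sensitivity 0.8, specificity 0.9 (assumed values).
    sens, spec = 0.8, 0.9
    u = rng.random(n)
    w = np.where(x_true, u < sens, u > spec)

    # Outcome generated from the TRUE variable.
    y = 1.0 + beta * x_true + rng.normal(size=n)

    # With one binary regressor, the OLS slope of y on W is a difference in means.
    naive = y[w].mean() - y[~w].mean()

    # Validation subsample: the first n_valid rows, where the truth is observed.
    xv, wv = x_true[:n_valid], w[:n_valid]
    p1_given_w1 = xv[wv].mean()    # estimate of P(X=1 | W=1)
    p1_given_w0 = xv[~wv].mean()   # estimate of P(X=1 | W=0)

    # The naive slope is attenuated by (p1_given_w1 - p1_given_w0); undo it.
    corrected = naive / (p1_given_w1 - p1_given_w0)
    return naive, corrected

naive, corrected = simulate_and_correct()
print(f"true slope: 2.0  naive: {naive:.3f}  corrected: {corrected:.3f}")
```

In this setup the naive slope is biased toward zero, while the corrected slope lands near the true value even though the classifier is far from perfect. That matches the abstract's point that classification error can be corrected for instead of being minimized directly.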

Bibliographic details

Source: Information Systems Research, 2021, Issue 2, pp. 462-480 (19 pages)
Authors: Qiao Mengke; Huang Ke-Wei
Affiliations:

Univ Sci & Technol China Sch Management Int Inst Finance Hefei 230026 Peoples R China;

Natl Univ Singapore Dept Informat Syst & Analyt Singapore 117417 Singapore;

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords: data mining; econometrics; measurement error; misclassification; statistical inference; performance metric

Similar literature

1. Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining [J]. Yang Mochen, Adomavicius Gediminas, Burtch Gordon. Information Systems Research. 2018, Issue 1
2. Bias-Corrected AIC for Selecting Variables in Poisson Regression Models [J]. Ken-ichi Kamo, Hirokazu Yanagihara, Kenichi Satoh. Communications in Statistics. 2013, Issue 10a12
3. Bias-corrected AIC for selecting variables in multinomial logistic regression models [J]. Yanagihara H., Kamo K.-I., Imori S. Linear Algebra and its Applications. 2012, Issue 11
4. A Regression Model to Correct for Intra-Hourly Irradiance Variability Bias in Solar Energy Models [C]. Kristen Bradford, Richard Walker, Dennis Moon. IEEE Photovoltaic Specialists Conference. 2020
5. An ordinal logistic regression model with misclassification of the outcome variable and categorical covariate [D]. Shirkey, Beverly Ann. 2009
6. Bias corrected estimates for logistic regression models for complex surveys with application to the United States' Nationwide Inpatient Sample [O]. Kevin A. Rader, Stuart R. Lipsitz, Garrett M. Fitzmaurice
7. Bias-corrected AIC for selecting variables in Poisson regression models [O]. 2015
