Effort estimation of ETL projects using Forward Stepwise Regression

Abstract

Effort estimation is a key component of planning a software development project. Estimation methods for traditional applications have been researched extensively but, unfortunately, these methods do not apply to Extract, Transform, Load (ETL) projects. Producing a systematic effort estimate for an ETL project is challenging because ETL development does not follow the traditional Software Development Life Cycle (SDLC): traditional application development is requirements-driven, whereas ETL application development is data-driven. This paper describes the development of an effort estimation model for ETL projects and compares it with the most widely used algorithmic effort estimation model, COCOMO II. The model is built with Forward Stepwise Regression on a dataset of 220 industrial projects from five different software houses. After 20 outliers are eliminated from this dataset, the adjusted R² (a goodness-of-fit measure) of our model is 0.87. Training and prediction accuracy are measured using the de facto standard accuracy measures MMRE and PRED(25). On a training dataset of 200 projects, PRED(25) is 81.16% and MMRE is 0.16. The results show that our proposed model provides considerably better estimation accuracy than COCOMO II: on a validation dataset of 58 projects, PRED(25) is 49% for our model compared with 21% for COCOMO II, and MMRE is 0.31 for our model compared with 0.99 for COCOMO II.
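The accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: MMRE is the mean of |actual - estimated| / actual over all projects, PRED(25) is the percentage of projects whose relative error is at most 25%, and adjusted R² is 1 - (1 - R²)(n - 1)/(n - k - 1) for n projects and k predictors. The function names and the effort values below are hypothetical and are not taken from the paper.

```python
def mre(actual, estimated):
    """Magnitude of relative error for a single project."""
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, level=25):
    """Percentage of projects whose relative error is at most level percent."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    hits = sum(1 for err in errors if err <= level / 100)
    return 100.0 * hits / len(errors)


def adjusted_r2(r2, n_projects, n_predictors):
    """Adjusted R^2 computed from plain R^2, sample size, and predictor count."""
    return 1 - (1 - r2) * (n_projects - 1) / (n_projects - n_predictors - 1)


# Hypothetical effort figures (e.g. person-hours), for illustration only.
actual_effort = [100.0, 200.0, 400.0, 800.0]
estimated_effort = [110.0, 150.0, 400.0, 1200.0]

print(f"MMRE     = {mmre(actual_effort, estimated_effort):.4f}")    # 0.2125
print(f"PRED(25) = {pred(actual_effort, estimated_effort):.1f}%")   # 75.0%
print(f"adj. R^2 = {adjusted_r2(0.5, 11, 2):.3f}")                  # 0.375
```

Lower MMRE and higher PRED(25) indicate better estimates, which is the direction of the comparison with COCOMO II reported in the abstract.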

Bibliographic details

Source: International Conference on Emerging Technologies, 2015, pp. 1-6 (6 pages)
Authors: Raza Rasool; Ali Afzal Malik
Original format: PDF
Keywords: COCOMO II; Data Warehousing; ETL; Effort Estimation; Estimation Accuracy; Forward Stepwise Regression; Project Management; Software Cost Estimation

Similar literature

1. Improved estimation of software project effort using multiple additive regression trees [J]. Mahmoud O. Elish. Expert Systems with Applications, 2009, no. 7.
2. Estimation of software project effort with support vector regression [J]. Adriano L.I. Oliveira. Neurocomputing, 2006, nos. 13-15.
3. Prediction of Television Audience Rating Based on Fuzzy Cognitive Maps with Forward Stepwise Regression [J]. Ma Nan, Wang Patrick, He Qin. International Journal of Pattern Recognition and Artificial Intelligence, 2017, no. 7.
4. Effort estimation of ETL projects using Forward Stepwise Regression [C]. Raza Rasool, Ali Afzal Malik. International Conference on Emerging Technologies, 2015.
5. Stepwise forward multiple regression for complex traits in high density genome-wide association studies [D]. Gu, Xiangjun. 2007.
6. Consistent Estimation of Generalized Linear Models with High Dimensional Predictors via Stepwise Regression [O]. Alex Pijyan, Qi Zheng, Hyokyoung G. Hong. 2020.
7. Development of Predictive Model for Radon-222 Estimation in the Atmosphere using Stepwise Regression and Grid Search Based-Random Forest Regression [O]. Omodele Olubi, Ebeneze Oniya, Taoreed Owolabi. 2021.
8. Pilot Willingness to Take Off Into Marginal Weather. Part II. Antecedent Overfitting with Forward Stepwise Logistic Regression. Final report [R]. Knecht, W. 2005.
