International Conference on Emerging Technologies

Effort estimation of ETL projects using Forward Stepwise Regression

Abstract

Effort estimation is a key component of planning a software development project. Much past research has addressed estimation methods for traditional applications, but these methods do not apply to Extract, Transform, Load (ETL) projects. Producing a systematic effort estimate for an ETL project is challenging because ETL development does not follow the traditional Software Development Life Cycle (SDLC): traditional application development is requirements-driven, whereas ETL development is data-driven. This paper describes the development of an effort estimation model for ETL projects and compares it with the most widely used algorithmic effort estimation model, COCOMO II. The model is built with Forward Stepwise Regression on a dataset of 220 industrial projects from five different software houses. After 20 outliers are removed from this dataset, the adjusted R² (goodness of fit) of our model is 0.87. Training and prediction accuracy are measured with the de facto standard accuracy measures MMRE and PRED(25). On a training dataset of 200 projects, PRED(25) is 81.16% and MMRE is 0.16. On a validation dataset of 58 projects, our model achieves a PRED(25) of 49% versus 21% for COCOMO II, and an MMRE of 0.31 versus 0.99 for COCOMO II. These results show that our proposed model provides considerably better estimation accuracy than COCOMO II.
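The abstract names Forward Stepwise Regression, adjusted R², MMRE, and PRED(25) without spelling them out. The Python sketch below illustrates these standard techniques under stated assumptions; it is not the authors' implementation. It assumes ordinary least squares with an intercept and uses adjusted R² as the entry criterion (the paper may instead use an F-test or p-value threshold, as many statistical packages do). For each project, MRE = |actual - predicted| / actual; MMRE is the mean MRE, and PRED(25) is the fraction of projects with MRE ≤ 0.25. The predictor names (`sources`, `transforms`, `targets`, `noise`), the `min_gain` threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))


def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of projects whose relative error is at most `level`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mre = np.abs(actual - predicted) / actual
    return float(np.mean(mre <= level))


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))


def forward_stepwise(X, y, names, min_gain=1e-4):
    """Greedy forward selection (assumed criterion: adjusted R^2).

    Starting from the intercept-only model, repeatedly add the predictor that
    most raises adjusted R^2; stop when the best gain is below `min_gain`.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0  # adjusted R^2 of the intercept-only model is 0
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            _, y_hat = fit_ols(X[:, cols], y)
            scores[j] = adjusted_r2(y, y_hat, len(cols))
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return [names[j] for j in selected], best_score


if __name__ == "__main__":
    # Hypothetical synthetic data, NOT the paper's dataset or predictors:
    # effort depends only on "sources" and "transforms".
    rng = np.random.default_rng(0)
    names = ["sources", "transforms", "targets", "noise"]
    X = rng.uniform(0, 10, size=(60, 4))
    effort = 5 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 1, size=60)

    chosen, adj = forward_stepwise(X, effort, names)
    _, fitted = fit_ols(X[:, [names.index(c) for c in chosen]], effort)
    print("selected:", chosen, "adjusted R^2:", round(adj, 3))
    print("MMRE:", round(mmre(effort, fitted), 3),
          "PRED(25):", pred(effort, fitted))
```

Because the synthetic effort depends only on `transforms` and `sources`, the demo should select those two first. It does not reproduce the paper's figures (adjusted R² of 0.87, PRED(25) of 81.16%), which come from the authors' industrial dataset.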
