Journal: Innovations in Systems and Software Engineering

Investigating structural metrics for understandability prediction of data warehouse multidimensional schemas using machine learning techniques

Abstract

Data warehouse (DW) quality metrics help in evaluating quality attributes and in building classification models that predict whether multidimensional (MD) schemas are understandable or non-understandable, thereby assisting in DW maintenance. To evaluate DW MD schema quality, we earlier proposed a set of metrics based on important aspects of dimension hierarchies and their sharing (such as sharing of a few hierarchy levels within a dimension, and sharing of a few hierarchy levels between dimensions, within and across facts), which may increase the structural complexity of MD schemas and thereby affect their quality. A preliminary empirical validation of these metrics using classical statistical techniques (correlation and linear regression) indicated that some of them are possible understandability indicators. However, machine learning (ML) techniques can better model the complex associations between DW structural metrics and quality attributes. Therefore, this work employs five ML classifiers [J48, partial decision trees (PART), Naïve Bayes, support vector machines (SVM) and logistic regression] to empirically investigate whether accurate prediction models, based on our structural metrics, can be built and used as understandability predictors. The results reveal that four of our metrics are good predictors of the understandability of DW MD schemas. The experimentation further compared the classifiers using five main performance measures: accuracy, precision, sensitivity, specificity and area under the receiver operating characteristic curve. The study confirmed the predictive capability of ML techniques for understandability prediction of DW MD schemas. The results also suggest that the SVM and Naïve Bayes classifiers perform better than the other classifiers included in the study. Further, the commonly used logistic regression technique gave results that were reasonably competitive with the more sophisticated techniques. However, the tree-based (J48) and rule-based (PART) techniques performed significantly worse than the best-performing techniques.
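
The five performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tooling or data. It assumes binary labels (1 = understandable, 0 = non-understandable), a classifier that emits both a hard prediction and a score, and a rank-based estimate of the area under the ROC curve. The function names and the toy labels in the usage note are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


def performance_measures(
    y_true: Sequence[int], y_pred: Sequence[int], scores: Sequence[float]
) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity, specificity and AUC."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "sensitivity": safe_div(tp, tp + fn),  # true positive rate (recall)
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "auc": roc_auc(y_true, scores),
    }
```

For example, with the made-up values `y_true = [1, 1, 1, 0, 0, 0]`, `y_pred = [1, 1, 0, 0, 0, 1]` and `scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]`, calling `performance_measures(y_true, y_pred, scores)` gives a confusion matrix of 2 true positives, 1 false positive, 2 true negatives and 1 false negative. Reporting sensitivity and specificity alongside accuracy matters when the two classes are unevenly represented, since accuracy alone can hide poor performance on the minority class.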