Investigating structural metrics for understandability prediction of data warehouse multidimensional schemas using machine learning techniques

Anjana Gosain; Jaspreeti Singh


Abstract

Data warehouse (DW) quality metrics help in evaluating quality attributes and in building classification models that predict whether multidimensional (MD) schemas are understandable or non-understandable, thereby assisting in DW maintenance. To evaluate DW MD schema quality, we earlier proposed a set of metrics based on some important aspects of dimension hierarchies and their sharing (such as sharing of a few hierarchy levels within a dimension, and sharing of a few hierarchy levels between dimensions, within and across facts), which may add to the structural complexity of MD schemas and thereby affect their quality. The preliminary empirical validation of these metrics using classical statistical techniques (correlation and linear regression) indicated that some of them are possible understandability indicators. However, machine learning (ML) techniques can better model the complex associations between DW structural metrics and quality attributes. Therefore, this work employs five ML classifiers [J48, partial decision trees (PART), Naïve Bayes, support vector machines (SVM) and logistic regression] to empirically investigate whether accurate prediction models can be built from our structural metrics for use as understandability predictors. The results reveal that four of our metrics are good predictors of the understandability of DW MD schemas. The experimentation further involved comparing the classifiers mainly on five performance measures: accuracy, precision, sensitivity, specificity and area under the receiver operating characteristic curve. The study confirmed the predictive capability of ML techniques for understandability prediction of DW MD schemas. The results also suggest that the SVM and Naïve Bayes classifiers perform better than the other classifiers included in the study. Further, the typically used logistic regression technique gave results that were reasonably competitive with the more sophisticated techniques. However, the tree-based (J48) and rule-based (PART) techniques performed significantly worse than the best-performing techniques.
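The five performance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' experimental setup: the function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, with 1 standing for "understandable" and 0 for "non-understandable".

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true classes and classifier scores for 8 schemas.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

On this toy data each of the four threshold-based measures comes out at 0.75 and the AUC at 0.9375. The AUC does not depend on the chosen threshold, which is one reason it is commonly reported alongside the other four measures when comparing classifiers.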


Bibliographic details

Source: Innovations in Systems and Software Engineering, 2018, Issue 1, 22 pages
Authors: Anjana Gosain; Jaspreeti Singh
Author affiliations:
University School of Information Communication & Technology, Guru Gobind Singh Indraprastha University;
University School of Information Communication & Technology, Guru Gobind Singh Indraprastha University
Original format: PDF
Language: English
Chinese Library Classification: Programming languages, algorithmic languages
Keywords: Understandability; Machine learning; Multidimensional schemas; Structural metrics; Empirical validation; Data warehouse quality
Date added to database: 2022-08-20 01:56:29

Similar literature

1. Investigating structural metrics for understandability prediction of data warehouse multidimensional schemas using machine learning techniques [J]. Anjana Gosain, Jaspreeti Singh. Innovations in Systems and Software Engineering. 2018, Issue 1
2. Empirical validation of structural metrics for predicting understandability of conceptual schemas for data warehouse [J]. Manoj Kumar, Anjana Gosain, Yogesh Singh. International Journal of Systems Assurance Engineering and Management. 2014, Issue 3
3. Empirical studies to assess the understandability of data warehouse schemas using structural metrics [J]. Manuel Angel Serrano, Coral Calero, Houari A. Sahraoui. Software Quality Journal. 2008, Issue 1
4. Empirical investigation of metrics for multidimensional model of Data Warehouse using Support Vector Machine [C]. Sabharwal Sangeeta, Nagpal Sushama, Aggarwal Gargi. International Conference on Reliability, Infocom Technologies and Optimization. 2015
5. Sourcing Risk Detection and Prediction with Online Public Data: An Application of Machine Learning Techniques in Supply Chain Risk Management [D]. Sun, Hang. 2019
6. Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques [O]. Bilal Khan, Rashid Naseem, Muhammad Arif Shah. 2021
7. Data Warehouse Schemas using Multidimensional Data Model for Retail [O]. Kheri Arionadi Shobirin, Adi Panca Saputra Iskandar, Ida Bagus Alit Swamardika. 2017
8. Data Warehouse Techniques to Support Global On-Demand Weather Forecast Metrics [R]. Joga, M. C. 2000
