International Journal of Data Analysis Techniques and Strategies

Data quality improvement in data warehouse: a framework


Abstract

Data cleansing is an essential process that, when applied to datasets, eliminates inconsistency and duplication from the data. It also handles null or missing values in an organised and proper manner, thereby enhancing data quality. In this paper, we use the Kullback-Leibler divergence (KL-divergence) technique to eliminate duplication in datasets. Inconsistencies and null or missing values are also handled; this is done by maintaining data marts built on the basis of test data. Accordingly, a framework for efficient data cleansing is proposed, in order to make the data fit for decision-making purposes. A brief comparison of existing data cleansing approaches is also presented. The comparison is based on parameters such as prediction error, bias, mean square error, variance, mean absolute error, root mean square error, and Theil statistics; these parameters are used by the distance sum-based approach (DSA) to accomplish the task. The results obtained demonstrate the feasibility and validity of our method.
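The abstract does not spell out how KL-divergence is applied to duplicate elimination, so the following is only a plausible sketch under one common interpretation: each record is treated as a bag of tokens, the two token-frequency distributions are compared with a (symmetrised, smoothed) KL-divergence, and record pairs whose divergence falls below a threshold are flagged as near-duplicates. The function names and the threshold value are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) over the shared vocabulary; eps smooths zero counts."""
    vocab = set(p) | set(q)
    p_total = sum(p.values()) + eps * len(vocab)
    q_total = sum(q.values()) + eps * len(vocab)
    div = 0.0
    for tok in vocab:
        pi = (p.get(tok, 0) + eps) / p_total
        qi = (q.get(tok, 0) + eps) / q_total
        div += pi * math.log(pi / qi)
    return div

def is_duplicate(rec_a, rec_b, threshold=0.1):
    """Flag two records as near-duplicates (hypothetical threshold)."""
    # Treat each record as a bag of lower-cased tokens.
    p = Counter(rec_a.lower().split())
    q = Counter(rec_b.lower().split())
    # Symmetrise, since KL-divergence itself is not symmetric.
    score = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return score < threshold
```

Identical records yield a divergence of zero, while records with disjoint vocabularies score very high, so a small threshold separates the two cases.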
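The paper's DSA procedure is not reproduced on this page, but the comparison parameters it lists (bias, mean square error, mean absolute error, root mean square error, Theil statistics) are standard error measures. A minimal sketch of how they are computed from an actual and a predicted series, assuming one common form of Theil's U statistic:

```python
import math

def comparison_metrics(actual, predicted):
    """Compute the error measures named in the comparison section."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    bias = sum(errors) / n                      # mean signed error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mse = sum(e * e for e in errors) / n        # mean square error
    rmse = math.sqrt(mse)                       # root mean square error
    # Theil's U (one common form): RMSE normalised by the root mean
    # squares of the two series; 0 means a perfect forecast.
    denom = (math.sqrt(sum(a * a for a in actual) / n)
             + math.sqrt(sum(p * p for p in predicted) / n))
    theil_u = rmse / denom if denom else float("nan")
    return {"bias": bias, "mae": mae, "mse": mse,
            "rmse": rmse, "theil_u": theil_u}
```

A perfect prediction gives zero for every measure, which makes the function easy to sanity-check before using it to compare cleansing approaches.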
