Variegated data swabbing: An improved purge approach for data cleaning


Abstract

Errors and inconsistencies in data cause serious problems in data warehousing. A robust solution is required to extract relevant data for effective and reliable decision making. In this paper, we therefore propose an efficient Variegated Data Swabbing algorithm that improves the quality of raw data by removing errors, inconsistencies, redundancies, and duplicates from structured data. The proposed algorithm takes two data sets from two different data sources and integrates them into a single new data set, removing all duplicate rows along with missing or NaN values. A spell-checker algorithm is applied to detect misspelled words and to provide suggestions for each of them. The proposed system gives better and more efficient results than the existing algorithm in terms of accuracy, execution time, and space.
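The abstract does not give the algorithm itself, so the following is only a minimal sketch of the steps it describes: integrating two sources, dropping duplicate rows and missing or NaN values, and suggesting corrections for misspelled words. The function names (`is_missing`, `swab`, `suggest`), the row-as-dictionary representation, the choice to drop a whole row when any value is missing, and the use of `difflib` similarity matching as the spell checker are all assumptions made for illustration, not the authors' method.

```python
import math
from difflib import get_close_matches


def is_missing(value):
    """Treat None, blank strings, and float NaN as missing (an assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def swab(source_a, source_b, columns):
    """Integrate two row lists, dropping incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in list(source_a) + list(source_b):
        values = tuple(row.get(col) for col in columns)
        if any(is_missing(v) for v in values):
            continue  # assumed policy: drop the whole row
        if values in seen:
            continue  # exact duplicate of an earlier row
        seen.add(values)
        cleaned.append(dict(zip(columns, values)))
    return cleaned


def suggest(word, vocabulary, n=3):
    """Return up to n close matches, or [] if the word is in the vocabulary."""
    if word.lower() in vocabulary:
        return []
    return get_close_matches(word.lower(), vocabulary, n=n, cutoff=0.7)


# Hypothetical sample data from two sources.
source_a = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": float("nan")},
    {"name": "Carol", "city": "Paris"},
]
source_b = [
    {"name": "Alice", "city": "London"},
    {"name": "Dave", "city": "Lundon"},
    {"name": "Eve", "city": None},
]
vocabulary = ["london", "paris", "berlin"]

merged = swab(source_a, source_b, ["name", "city"])
for row in merged:
    hints = suggest(row["city"], vocabulary)
    print(row, "suggestions:", hints)
```

With this sample data, the duplicate "Alice" row and the two rows with missing cities are dropped, and "Lundon" is flagged with "london" as a suggestion. A real implementation would need a proper dictionary and a policy for near-duplicate rows, neither of which the abstract specifies.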
