Tabular Data Anomaly Patterns

Abstract

One essential and challenging task in data science is data cleaning: the process of identifying and eliminating data anomalies. Differences in data types, data domains, data acquisition methods, and the final purposes of data cleaning have led to different approaches to defining data anomalies in the literature. This paper proposes and describes a set of basic data anomalies, expressed as anomaly patterns commonly encountered in tabular data, that are independent of the data domain, the data acquisition technique, and the purpose of data cleaning. This set of anomalies can serve as a basis for developing and enhancing software products that provide general-purpose data cleaning facilities, and for comparing tools that aim to support tabular data cleaning. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns, and introduces Grafterizer, a software framework that implements those data operations.