Tabular Data Anomaly Patterns

Abstract

One essential and challenging task in data science is data cleaning: the process of identifying and eliminating data anomalies. Differences in data types, data domains, data acquisition methods, and the final purposes of data cleaning have led to different approaches to defining data anomalies in the literature. This paper proposes and describes a set of basic data anomalies, in the form of anomaly patterns commonly encountered in tabular data, that are independent of the data domain, the data acquisition technique, and the purpose of data cleaning. This set of anomalies can serve as a basis for developing and enhancing software products that provide general-purpose data cleaning facilities, and as a basis for comparing tools that aim to support tabular data cleaning. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns and introduces Grafterizer, a software framework that implements those data operations.

Bibliographic record

Source: 2017, pp. 25-34 (10 pages)
Conference location: Prague (CZ)
Authors: Dina Sukhobok; Nikolay Nikolov; Dumitru Roman
Original format: PDF
Language: English
Keywords: Cleaning; Data integrity; Data models; Task analysis; Software; Tools; Taxonomy
Indexed: 2022-08-26 13:58:20

Similar literature

1. Discovery of characteristic patterns from tabular structured data including missing values [J]. Shigeaki Sakurai, Kouichirou Mori. International Journal of Business Intelligence and Data Mining. 2010, No. 3
2. A data mining-based framework for the identification of daily electricity usage patterns and anomaly detection in building electricity consumption data [J]. Liu Xue, Ding Yong, Tang Hao. Energy and Buildings. 2021
3. Teleconnection of atmospheric and oceanic climate anomalies with Australian weather patterns: a review of data availability [J]. Jasmine B.D. Jaffrés, Chris Cuff, Cecily Rasmussen. Earth-Science Reviews: The International Geological Journal Bridging the Gap between Research Articles and Textbooks. 2018
4. Tabular Data Anomaly Patterns [C]. Dina Sukhobok, Nikolay Nikolov, Dumitru Roman. International Conference on Big Data Innovations and Applications. 2017
5. Modeling the Age Pattern of Human Mortality: Mathematical and Tabular Representations of the Risk of Death [D]. Sharrow, David J. 2013
6. Ontology patterns for tabular representations of biomedical knowledge on neglected tropical diseases [O]. Filipe Santana, Daniel Schober, Zulma Medeiros
7. Tabular Data Anomaly Patterns [O]. Dina Sukhobok, Nikolay Nikolov, Dumitru Roman. 2017
