FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data

Abstract

Functional dependencies (FDs) and candidate keys are essential for table decomposition, database normalization, and data cleansing. In this paper, we present FDTool, a command-line Python application that discovers minimal FDs in tabular datasets and infers equivalent attribute sets and candidate keys from them. The runtime and memory costs associated with seven published FD discovery algorithms are given with an overview of their theoretical foundations. Previous research establishes that FD_Mine is the most efficient FD discovery algorithm when applied to datasets with many rows (> 100,000 rows) and few columns (< 14 columns). This makes it well suited to mining rules from clinical and demographic datasets, which often consist of long and narrow sets of participant records. The structure of FD_Mine is described and supplemented with a formal proof of the equivalence pruning method used. FDTool is a re-implementation of FD_Mine with additional features that improve performance and automate typical processes in database architecture. The experimental results of applying FDTool to 13 datasets of different dimensions are summarized in terms of the number of FDs checked, the number of FDs found, and the time it takes for the code to terminate. We find that the number of attributes in a dataset has a much greater effect on the runtime and memory costs of FDTool than does row count. The last section explains in detail how the FDTool application can be accessed, executed, and further developed.
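
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what "minimal FD" and "candidate key" mean on a small table. It is not FDTool's code and does not implement FD_Mine or its equivalence pruning; the function names (`holds`, `minimal_fds`, `candidate_keys`) and the toy records are hypothetical, chosen only for illustration. Candidate keys are found here by a direct uniqueness check, which assumes the table has no duplicate rows, whereas the abstract says FDTool infers them from the discovered FDs.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # An FD is violated when one left-hand value maps to two right-hand values.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attrs):
    """Naive level-wise search for minimal FDs X -> a (illustration only)."""
    found = []
    for target in attrs:
        others = [a for a in attrs if a != target]
        minimal_lhs = []
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Skip supersets of a left-hand side that already determines target.
                if any(set(prev) <= set(lhs) for prev in minimal_lhs):
                    continue
                if holds(rows, lhs, target):
                    minimal_lhs.append(lhs)
                    found.append((lhs, target))
    return found


def candidate_keys(rows, attrs):
    """Minimal attribute sets whose projected values are unique per row.

    Assumes the table contains no duplicate rows.
    """
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(prev) <= set(combo) for prev in keys):
                continue
            projected = {tuple(row[a] for a in combo) for row in rows}
            if len(projected) == len(rows):
                keys.append(combo)
    return keys


# Hypothetical toy records, not taken from the paper's datasets.
attrs = ["id", "zip", "city", "age"]
rows = [
    {"id": 1, "zip": "27514", "city": "Chapel Hill", "age": 34},
    {"id": 2, "zip": "27514", "city": "Chapel Hill", "age": 51},
    {"id": 3, "zip": "27701", "city": "Durham", "age": 34},
    {"id": 4, "zip": "27701", "city": "Durham", "age": 47},
]

for lhs, rhs in minimal_fds(rows, attrs):
    print(", ".join(lhs), "->", rhs)
print("candidate keys:", candidate_keys(rows, attrs))
```

In this toy table `zip` and `city` determine each other, which is the kind of attribute-set equivalence the abstract mentions. The search enumerates attribute combinations, so its cost grows exponentially with the number of columns but only linearly with the number of rows, consistent with the abstract's observation that attribute count matters more than row count.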

Bibliographic details

Journal: other
Authors: Matt Buranosky; Elmar Stellnberger; Emily Pfaff; David Diaz-Sanchez; Cavin Ward-Caviness; Sayan Mukherjee; Howard J. Hamilton; Shubhashis Shil
Total pages: 16
Original format: PDF
Keywords: Functional dependencies; Data mining; Electronic health records; Relational database; FDTool; Rule discovery

Similar literature

1. An algorithm to mine general association rules from tabular data [J]. Ayubi S, Muyeba MK, Baraani A. Information Sciences: An International Journal. 2009, No. 20

2. Economics-Driven Data Management: An Application to the Design of Tabular Data Sets [J]. Even Adir, Shankaranarayanan G., Berger Paul D. IEEE Transactions on Knowledge and Data Engineering. 2007, No. 6

3. AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications [J]. Yuanfei Luo, Mengshuo Wang, Hao Zhou. SIGKDD explorations. 2019

4. FD_Mine: discovering functional dependencies in a database using equivalences [C]. Hong Yao, Hamilton, H.J. 2002

5. Reasoning about functional and key dependencies in hierarchically structured data [D]. Hara, Carmem Satie. 2004

6. Compressing Tabular Data via Pairwise Dependencies [O]. Dmitri S. Pavlichin, Amir Ingber, Tsachy Weissman

7. FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data [O]. Matt Buranosky, Elmar Stellnberger, Emily Pfaff. 2019
