
FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data



Abstract

Functional dependencies (FDs) and candidate keys are essential for table decomposition, database normalization, and data cleansing. In this paper, we present FDTool, a command-line Python application that discovers minimal FDs in tabular datasets and infers equivalent attribute sets and candidate keys from them. The runtime and memory costs associated with seven published FD discovery algorithms are given, along with an overview of their theoretical foundations. Previous research establishes that FD_Mine is the most efficient FD discovery algorithm when applied to datasets with many rows (> 100,000 rows) and few columns (< 14 columns). This puts it in a special position to rule-mine clinical and demographic datasets, which often consist of long and narrow sets of participant records. The structure of FD_Mine is described and supplemented with a formal proof of the equivalence pruning method used. FDTool is a re-implementation of FD_Mine with additional features that improve performance and automate typical processes in database architecture. The experimental results of applying FDTool to 13 datasets of different dimensions are summarized in terms of the number of FDs checked, the number of FDs found, and the time it takes for the code to terminate. We find that the number of attributes in a dataset has a much greater effect on the runtime and memory costs of FDTool than does row count. The last section explains in detail how the FDTool application can be accessed, executed, and further developed.
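To make the terms in the abstract concrete, the following is a minimal sketch of what it means to discover minimal FDs and candidate keys in a small table. It is not FDTool's code and does not implement FD_Mine or its equivalence pruning: it is a brute-force, level-wise search written only for illustration. The function names (`holds`, `minimal_fds`, `candidate_keys`) and the sample rows are hypothetical. The sketch also finds candidate keys by checking uniqueness of projected rows, which assumes the table has no duplicate rows, whereas the abstract describes inferring keys from the discovered FDs.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later mismatch violates the dependency.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Brute-force search for minimal FDs with a non-empty left-hand side
    and a single right-hand attribute (illustrative only, exponential in
    the number of attributes)."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal_lhs = []
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Skip supersets of an already-found left-hand side: not minimal.
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue
                if holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
                    found.append((lhs, rhs))
    return found


def candidate_keys(rows, attributes):
    """Minimal attribute sets whose projected values are unique per row
    (assumes the table contains no duplicate rows)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            projected = {tuple(row[a] for a in combo) for row in rows}
            if len(projected) == len(rows):
                keys.append(combo)
    return keys


# Hypothetical sample table, invented for this example.
rows = [
    {"id": 1, "zip": "02139", "city": "Cambridge"},
    {"id": 2, "zip": "02139", "city": "Cambridge"},
    {"id": 3, "zip": "10001", "city": "New York"},
    {"id": 4, "zip": "10002", "city": "New York"},
]
attributes = ["id", "zip", "city"]

fds = minimal_fds(rows, attributes)
keys = candidate_keys(rows, attributes)
print(fds)
print(keys)
```

The number of candidate left-hand sides grows exponentially with the number of attributes, while each check is a single pass over the rows. This is consistent with the abstract's finding that attribute count affects runtime far more than row count.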
