Conference: International Conference on Management of Data

Automatic Discovery of Attributes in Relational Databases


Abstract

In this work we design algorithms for clustering relational columns into attributes, i.e., for identifying strong relationships between columns based on the common properties and characteristics of the values they contain. One example is identifying whether a certain set of columns refers to telephone numbers or social security numbers, or to names of customers or names of nations. Traditional relational database schema languages use very limited primitive data types and simple foreign key constraints to express relationships between columns. Object-oriented schema languages allow the definition of custom data types; still, certain relationships between columns might be unknown at design time, or they might appear only in a particular database instance. Nevertheless, these relationships are an invaluable tool for schema matching and, more generally, for better understanding and working with the data. Here, we introduce data-oriented solutions (we do not consider solutions that assume the existence of any external knowledge) that use statistical measures to identify strong relationships between the values of a set of columns. Interpreting the database as a graph where nodes correspond to database columns and edges correspond to column relationships, we decompose the graph into connected components and cluster sets of columns into attributes. To test the quality of our solution, we also provide a comprehensive experimental evaluation using real and synthetic datasets.
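
The graph view described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical measures the authors use, so the Jaccard similarity, the 0.5 threshold, the function names, and the sample columns below are all assumptions chosen for illustration; only the last step (grouping columns by connected components) follows the abstract directly.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value sets.

    Assumption: a stand-in measure for illustration, not the paper's measure.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_columns(columns, threshold=0.5, similarity=jaccard):
    """Group column names into connected components of a similarity graph.

    columns: dict mapping a column name to an iterable of its values.
    An edge joins two columns whose similarity is at least `threshold`
    (a hypothetical cutoff).
    """
    # Union-find forest: every column starts as its own component.
    parent = {name: name for name in columns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for u, v in combinations(columns, 2):
        if similarity(columns[u], columns[v]) >= threshold:
            parent[find(u)] = find(v)

    # Collect the members of each component.
    groups = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: two phone columns and two nation-name columns.
columns = {
    "customer.phone": ["555-0100", "555-0101", "555-0102"],
    "supplier.phone": ["555-0101", "555-0102", "555-0103"],
    "customer.nation": ["FRANCE", "JAPAN", "PERU"],
    "nation.name": ["FRANCE", "JAPAN", "PERU", "KENYA"],
}
clusters = cluster_columns(columns)
print(clusters)
```

With this sample data the two phone columns fall into one component and the two nation columns into another. Comparing every pair of columns is quadratic in the number of columns, which is acceptable for a sketch but would likely need pruning on a large schema.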
