Automatic Discovery of Attributes in Relational Databases

Abstract

In this work we design algorithms for clustering relational columns into attributes, i.e., for identifying strong relationships between columns based on the common properties and characteristics of the values they contain. For example, identifying whether a certain set of columns refers to telephone numbers versus social security numbers, or names of customers versus names of nations. Traditional relational database schema languages use very limited primitive data types and simple foreign key constraints to express relationships between columns. Object oriented schema languages allow the definition of custom data types; still, certain relationships between columns might be unknown at design time or they might appear only in a particular database instance. Nevertheless, these relationships are an invaluable tool for schema matching, and generally for better understanding and working with the data. Here, we introduce data oriented solutions (we do not consider solutions that assume the existence of any external knowledge) that use statistical measures to identify strong relationships between the values of a set of columns. Interpreting the database as a graph where nodes correspond to database columns and edges correspond to column relationships, we decompose the graph into connected components and cluster sets of columns into attributes. To test the quality of our solution, we also provide a comprehensive experimental evaluation using real and synthetic datasets.
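The graph view described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the statistical measure, so Jaccard similarity over value sets is used here purely as a stand-in, and the function names, the 0.5 threshold, and the sample columns are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value collections (stand-in measure, an assumption)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_columns(columns, threshold=0.5):
    """Group column names into connected components of a similarity graph.

    columns:   dict mapping column name -> iterable of values
    threshold: hypothetical cut-off; two columns get an edge at or above it
    """
    # Union-find over column names; each column starts in its own component.
    parent = {name: name for name in columns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for u, v in combinations(columns, 2):
        if jaccard(columns[u], columns[v]) >= threshold:
            parent[find(u)] = find(v)

    # Collect the members of each component.
    groups = {}
    for name in columns:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: two phone columns and two nation columns.
columns = {
    "customer.phone": ["555-0101", "555-0102", "555-0103"],
    "orders.contact_phone": ["555-0101", "555-0102", "555-0104"],
    "customer.nation": ["France", "Japan", "Peru"],
    "supplier.nation": ["France", "Japan", "Peru", "Chile"],
}
clusters = cluster_columns(columns)
```

Each resulting group plays the role of one discovered attribute. Raising the threshold removes edges and splits the graph into smaller components; value-set overlap alone would not separate columns that share a format but not values, which is one reason a real solution would likely need a richer measure.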

Bibliographic details

Source: International conference on management of data, 2011, pp. 109-120 (12 pages)
Authors: Meihui Zhang; Marios Hadjieleftheriou; Beng Chin Ooi; Cecilia M. Procopiuc; Divesh Srivastava
Original format: PDF
Keywords: Attribute discovery; Schema matching

Similar literature

1. Automatic Discovery of Association Paths in Relational Databases Using Software Visualization [J]. Haider A. Ramadhan. Journal of computer sciences. 2005, No. 3
2. A Novel Normalization Forms for Relational Database Design throughout Matching Related Data Attribute [J]. Youseef Alotaibi, Bashar Ramadan. International Journal of Engineering and Manufacturing (IJEM). 2017, No. 5
3. An attribute or tuple timestamping in bitemporal relational databases [J]. Canan Atay. Turkish Journal of Electrical Engineering and Computer Sciences. 2016, No. 5
4. Automatic Discovery of Attributes in Relational Databases [C]. Meihui Zhang, Marios Hadjieleftheriou, Beng Chin Ooi. International conference on management of data. 2011
5. Semi-Automatic Discovery of Meaningful Ontology from a Relational Database [D]. Witherspoon, David B. 2011
6. Automatic XQuery Generation and Generalized Visualization for an XML Interface to a Relational Database [O]. E. Sally Lee, Dan Suciu, James F. Brinkley. 2005
7. Automatic Discovery of Attributes in Relational Databases [O]. Meihui Zhang, Marios Hadjieleftheriou, Cecilia M. Procopiuc. 2012
8. Attribute Partitioning in a Self-Adaptive Relational Database System [R]. Niamir, B. 1978
