Uncovering the evolution history of data lakes

Abstract

Data accumulating in data lakes can become inaccessible in the long run when its semantics are not available. The heterogeneity of data formats and the sheer volume of the data collections make it infeasible to clean and unify the data manually. Thus, tools for automated data lake analysis are of great interest. In this paper, we target the particular problem of reconstructing the schema evolution history from data lakes. Knowing how the data is structured, and how this structure has evolved over time, enables programmatic access to the lake. By deriving a sequence of schema versions, rather than a single schema, we take into account structural changes over time. Moreover, we address the challenge of detecting inclusion dependencies. This is a prerequisite for mapping between successive schema versions and, in particular, for detecting nontrivial changes such as a property having been moved or copied. We evaluate our approach for detecting inclusion dependencies using the MovieLens dataset, as well as an adaptation of a dataset containing botanical descriptions, to cover specific edge cases.
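To make the notion of an inclusion dependency concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the lake is given as in-memory lists of flat, JSON-like documents per entity and checks only unary dependencies, that is, whether every value of one property also occurs among the values of another. The function names and the toy data are hypothetical and are not taken from the paper or its evaluation.

```python
from itertools import permutations


def value_sets(collections):
    """Collect the set of values seen for each (entity, property) pair."""
    values = {}
    for entity, documents in collections.items():
        for doc in documents:
            for prop, value in doc.items():
                # Assumption: only flat scalar values are compared in this sketch.
                if isinstance(value, (str, int, float, bool)):
                    values.setdefault((entity, prop), set()).add(value)
    return values


def unary_inclusion_dependencies(collections):
    """Return sorted pairs (A, B) where every value of A also occurs in B."""
    values = value_sets(collections)
    return sorted(
        (a, b)
        for a, b in permutations(values, 2)
        if values[a] <= values[b]  # subset test on the observed value sets
    )


# Hypothetical toy data, loosely shaped like a movie-rating lake.
lake = {
    "ratings": [{"movie_id": 1, "stars": 4}, {"movie_id": 2, "stars": 5}],
    "movies": [
        {"id": 1, "title": "A"},
        {"id": 2, "title": "B"},
        {"id": 3, "title": "C"},
    ],
}
inds = unary_inclusion_dependencies(lake)
```

On this toy data the only dependency found is that the values of `ratings.movie_id` are included in those of `movies.id`. This naive pairwise comparison is quadratic in the number of properties, so it only illustrates the definition. A dependency that holds between a property in one schema version and a property in the next is the kind of evidence that could suggest a move or copy.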

Bibliographic record

Source
IEEE International Conference on Big Data | 2017 | pp. 2462-2471 | 10 pages
Authors
Meike Klettke; Hannes Awolin; Uta Störl; Daniel Müller; Stefanie Scherzinger
Original format: PDF
Keywords
Protocols; Lakes; Grippers; NoSQL databases; History; Data mining; Tools
