Detecting unique column combinations on dynamic data



Abstract

The discovery of all unique (and non-unique) column combinations in an unknown dataset is at the core of any data profiling effort. Unique column combinations resemble candidate keys of a relational dataset. Several research approaches have focused on their efficient discovery in a given, static dataset. However, none of these approaches is suitable for applications on dynamic datasets, such as transactional databases, social networks, and scientific applications. In these cases, data profiling techniques should be able to efficiently discover new uniques and non-uniques (and validate old ones) after tuple inserts or deletes, without re-profiling the entire dataset. We present Swan, the first approach to efficiently discover unique and non-unique constraints on dynamic datasets that is independent of the initial dataset size. In particular, Swan makes use of intelligently chosen indices to minimize access to old data. We perform an exhaustive analysis of Swan and compare it with two state-of-the-art techniques for unique discovery: Gordian and Ducc. The results show that Swan significantly outperforms both, as well as their incremental adaptations. For inserts, Swan is more than 63x faster than Gordian and up to 50x faster than Ducc. For deletes, Swan is more than 15x faster than Gordian and up to one order of magnitude faster than Ducc. In fact, Swan even improves on the static case by dividing the dataset into a static part and a set of inserts.
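The idea of validating a unique column combination after inserts and deletes, without rescanning old tuples, can be illustrated with a small sketch. This is not Swan's algorithm, which the abstract does not spell out. It is a minimal illustration that assumes a plain hash index over one column combination. The names `project`, `is_unique`, and `UniqueTracker` are hypothetical and chosen only for this example.

```python
def project(row, columns):
    """Return the values of `row` restricted to `columns`, as a hashable tuple."""
    return tuple(row[c] for c in columns)


def is_unique(rows, columns):
    """Static check: True if no two rows agree on all of `columns`."""
    seen = set()
    for row in rows:
        key = project(row, columns)
        if key in seen:
            return False
        seen.add(key)
    return True


class UniqueTracker:
    """Hypothetical helper: tracks whether one column combination stays
    unique under inserts and deletes, using a hash index instead of
    rescanning the existing rows."""

    def __init__(self, rows, columns):
        self.columns = tuple(columns)
        self.index = {}        # projected value -> number of rows carrying it
        self.duplicates = 0    # number of projected values shared by 2+ rows
        for row in rows:
            self.insert(row)

    def insert(self, row):
        """Add one row; only the index is touched, never the old rows."""
        key = project(row, self.columns)
        count = self.index.get(key, 0) + 1
        self.index[key] = count
        if count == 2:         # this value just became a duplicate
            self.duplicates += 1
        return self.is_unique()

    def delete(self, row):
        """Remove one row; assumes the row was inserted earlier."""
        key = project(row, self.columns)
        count = self.index[key] - 1
        if count == 0:
            del self.index[key]
        else:
            self.index[key] = count
        if count == 1:         # this value just stopped being a duplicate
            self.duplicates -= 1
        return self.is_unique()

    def is_unique(self):
        return self.duplicates == 0


# Usage: (name, city) is unique until a colliding tuple arrives.
rows = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 2, "name": "a", "city": "y"},
]
tracker = UniqueTracker(rows, ("name", "city"))
new_row = {"id": 3, "name": "a", "city": "x"}
after_insert = tracker.insert(new_row)   # combination becomes non-unique
after_delete = tracker.delete(new_row)   # combination is unique again
```

In this sketch the cost of each insert or delete depends only on the index lookup, not on the number of rows already stored, which is the property the abstract describes as independence from the initial dataset size. A real system would also have to decide which combinations to index, which the sketch leaves out.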


Bibliographic record

Source: IEEE international conference on data engineering | 2014 | pp. 1036-1047 | 12 pages
Authors: Abedjan Ziawasch; Quiane-Ruiz Jorge-Arnulfo; Naumann Felix
Original format: PDF

Similar literature


1. A combination of computational fluid dynamics (CFD) and adaptive neuro-fuzzy system (ANFIS) for prediction of the bubble column hydrodynamics [J] . Pourtousi M., Sahu J. N., Ganesan P., Powder Technology: An International Journal on the Science and Technology of Wet and Dry Particulate Systems . 2015

2. In situ Dynamics of O₂, pH, Light, and Photosynthesis in Ikaite Tufa Columns (Ikka Fjord, Greenland)—A Unique Microbial Habitat [J] . Erik C. L. Trampe, Jens E. N. Larsen, Mikkel A. Glaring, Frontiers in Microbiology . 2016, no. 2

3. Unique Combination of 22q11 and 14qter Microdeletion Syndromes Detected Using Oligonucleotide Array-CGH [J] . P. Kuglík, R. Gaillyová, M. Vilémová, Molecular syndromology . 2012, no. 2

4. Detecting unique column combinations on dynamic data [C] . Abedjan Ziawasch, Quiane-Ruiz Jorge-Arnulfo, Naumann Felix, IEEE international conference on data engineering . 2014

5. Comparative thermodynamic and environmental performance of a unique cogeneration power plan using operational data [D] . Domigan, Whitney E. 2010

6. In situ Dynamics of O₂, pH, Light, and Photosynthesis in Ikaite Tufa Columns (Ikka Fjord, Greenland)—A Unique Microbial Habitat [O] . Erik C. L. Trampe, Jens E. N. Larsen, Mikkel A. Glaring

7. The thermodynamic changes that occur upon mixing five models of formamide and three models of water, including the miscibility of these model combinations itself, are studied by performing Monte Carlo computer simulations using an appropriately chosen thermodynamic cycle and the method of thermodynamic integration. The results show that the mixing of these two components is close to the ideal mixing, as both the energy and entropy of mixing turn out to be rather close to the ideal term in the entire composition range. Concerning the energy of mixing, the OPLS/AA-mod model of formamide behaves in a qualitatively different way than the other models considered. Thus, this model results in negative, while the other ones in positive energy of mixing values in combination with all three water models considered. Experimental data support this latter behavior. Although the Helmholtz free energy of mixing always turns out to be negative in the entire composition range, the majority of the model combinations tested either show limited miscibility, or, at least, approach the miscibility limit very closely in certain compositions. Concerning both the miscibility and the energy of mixing of these model combinations, we recommend the use of the combination of the CHARMM formamide and TIP4P water models in simulations of water-formamide mixtures. [O] . Kiss, Bálint, Fábián, Balázs, Idrissi, Abdenacer, 2017

8. Review of Experimental Capabilities and Hydrodynamic Data for Validation of CFD Based Predictions for Slurry Bubble Column Reactors. 2007 AIChE Annual Meeting [R] . Guillen, D. P., Wendt, D. S., Antal, S. P., 2007

