Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters

Abstract

A table placement method is a critical component of big data analytics on distributed systems. It determines how the data values of a two-dimensional table are organized and stored in the underlying cluster. Several table placement methods have been proposed and implemented for Hadoop computing environments. However, no comprehensive and systematic study has been done to understand, compare, and evaluate them. It is therefore highly desirable to gain insight into the basic structure and essential issues of table placement methods in the context of big data processing infrastructures. In this paper, we present such a study. The basic structure of a table placement method consists of three core operations: row reordering, table partitioning, and data packing. All existing placement methods are built from these core operations, with variations determined by three key factors: (1) the size of a horizontal logical subset of a table (the row group size), (2) the function that maps columns to column groups, and (3) the function that packs the columns or column groups of a row group into physical blocks. We have designed and implemented a benchmarking tool that shows how variations in each factor affect the I/O performance of reading a table stored by a table placement method. Based on our results, we suggest actions to optimize table reading performance. Results from large-scale experiments confirm that our findings also hold for production workloads. Finally, we present ORC File as a case study to show the effectiveness of our findings and suggested actions.
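
The three core operations and three factors named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's benchmarking tool or ORC File's implementation; the function `place_table` and its parameters (`row_group_size`, `column_group_of`, `block_of`, `reorder`) are hypothetical names chosen for illustration, and the row group size is measured in rows rather than bytes to keep the example small.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple                      # one table row, values in column order
ColumnGroup = Dict[str, List]    # column name -> values within one row group


def place_table(
    rows: Sequence[Row],
    columns: Sequence[str],
    row_group_size: int,                     # factor 1 (counted in rows here)
    column_group_of: Callable[[str], int],   # factor 2: column -> column group id
    block_of: Callable[[int, int], int],     # factor 3: (row group, column group) -> block id
    reorder: Callable[[Sequence[Row]], List[Row]] = list,
) -> Dict[int, List[Tuple[int, int, ColumnGroup]]]:
    """Return a mapping from physical block id to the pieces stored in it."""
    # Core operation 1: row reordering (the default keeps the input order).
    ordered = reorder(rows)

    blocks: Dict[int, List[Tuple[int, int, ColumnGroup]]] = {}
    # Core operation 2: table partitioning, first horizontally into row groups,
    # then vertically into column groups within each row group.
    for rg_id, start in enumerate(range(0, len(ordered), row_group_size)):
        row_group = ordered[start:start + row_group_size]
        groups: Dict[int, ColumnGroup] = {}
        for col_idx, name in enumerate(columns):
            cg_id = column_group_of(name)
            groups.setdefault(cg_id, {})[name] = [r[col_idx] for r in row_group]
        # Core operation 3: data packing of column groups into physical blocks.
        for cg_id, data in sorted(groups.items()):
            blocks.setdefault(block_of(rg_id, cg_id), []).append((rg_id, cg_id, data))
    return blocks


columns = ["id", "name", "price"]
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]

# One column per column group; all column groups of a row group share one block.
layout = place_table(
    rows,
    columns,
    row_group_size=2,
    column_group_of=columns.index,
    block_of=lambda rg_id, cg_id: rg_id,
)
```

Changing only `block_of` (for example, to give every column group its own block) or only `column_group_of` (for example, to put all columns in one group for a row-oriented layout) yields a different placement from the same three operations, which is the kind of variation the abstract attributes to the three factors.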


Publication details

Source
International Conference on Very Large Data Bases, 2013, pp. 1750-1761 (12 pages)
Authors
Yin Huai; Siyuan Ma; Rubao Lee; Owen O'Malley; Xiaodong Zhang
Original format: PDF