Parallel Bulk Insertion for Large-scale Analytics Applications


Abstract

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this work, we focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step in reducing processing times, especially when new data is generated and processed at regular intervals. We present a parallel approach to bulk data insertion in a system that uses horizontally range-partitioned data, and we evaluate several variants of the insertion operation, including legacy approaches. Our method exploits the parallel processing framework itself to insert data into the system, where the data is stored in a semi-structured format. Our results indicate that a parallel approach to bulk insertion can substantially reduce the recurrent cost of inserting new data into the system.
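
As a rough illustration of the idea in the abstract, the following minimal sketch routes a batch of new records to horizontally range-partitioned storage and then updates each partition independently. It is not the authors' implementation: the function names, the in-memory sorted lists standing in for partitions, and the thread pool standing in for MapReduce tasks are all assumptions made for this example.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from heapq import merge
from operator import itemgetter

KEY = itemgetter(0)  # records are (key, value) pairs; only the key is compared


def route(records, split_points):
    """Map-like step: group new records by the range partition that owns their key.

    Partition i holds keys in [split_points[i-1], split_points[i]).
    """
    buckets = [[] for _ in range(len(split_points) + 1)]
    for record in records:
        buckets[bisect_right(split_points, KEY(record))].append(record)
    return buckets


def insert_into_partition(existing, incoming):
    """Reduce-like step: sort the incoming batch and merge it into one sorted partition."""
    return list(merge(existing, sorted(incoming, key=KEY), key=KEY))


def parallel_bulk_insert(partitions, split_points, records, workers=4):
    """Insert a batch into all partitions, one independent task per partition."""
    buckets = route(records, split_points)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(insert_into_partition, partitions, buckets))
```

Calling `parallel_bulk_insert(partitions, [10, 20], new_records)` with three sorted partitions returns three updated partitions. Because the ranges are disjoint, no task touches another task's partition, which is what makes the per-partition work safe to run in parallel.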


Bibliographic details

Source
4th ACM/SIGOPS workshop on large-scale distributed systems and middleware 2010 | 2010 | pp. 27-31 | 5 pages
Conference location: Zurich (CH)
Authors
Antonio Barbuzzi; Ernst Biersack; Pietro Michiardi; Gennaro Boggia
Original format: PDF
Language: English
Chinese Library Classification: programming; software engineering
Keywords
design; experimentation;


Similar literature


1. Three-level-parallelization support framework for large-scale analytic simulation [J]. Yao Yi-ping, Meng Dong, Zhu Feng. Journal of simulation. 2017, No. 3
2. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations [J]. Teijeiro C., Hammerschmidt T., Drautz R. Computer physics communications. 2016
3. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer [J]. Suplatov Dmitry, Popova Nina, Zhumatiy Sergey. Journal of Bioinformatics and Computational Biology. 2016, No. 2
4. Parallel Bulk Insertion for Large-scale Analytics Applications [C]. Antonio Barbuzzi, Ernst Biersack, Pietro Michiardi. ACM/SIGOPS workshop on large-scale distributed systems and middleware. 2010
5. High frequency thin-film bulk acoustic wave resonators for gas- and bio-analytical applications [D]. Ashley, G. M. 2005
6. Parallel continuous simulated tempering and its applications in large-scale molecular simulations [O]. Tianwu Zang, Linglin Yu, Chong Zhang
7. Parallel Bulk Insertion for Large-scale Analytics Applications [O]. Antonio Barbuzzi, Politecnico Di Bari, Ernst Biersack. 2011
