Parallel Bulk Insertion for Large-scale Analytics Applications

Abstract

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this work, we focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step toward reducing processing times, especially when new data is generated and processed at regular intervals. We present a parallel approach to bulk data insertion in a system that uses horizontally range-partitioned data, and we evaluate several variants of the insertion operation, including legacy approaches. Our method exploits the parallel processing framework itself to insert data, which is stored in a semi-structured format, into the system. Our results indicate that a parallel approach to bulk insertion can substantially reduce the recurring cost of inserting new data into the system.
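
The abstract does not spell out the insertion algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes records are (key, value) pairs, that partitions are defined by a hypothetical list of split points (`BOUNDARIES`), and that a thread pool stands in for the MapReduce framework: a map-like step routes each new record to its range partition, and a reduce-like step merges each batch into its partition in parallel.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical split points (an assumption, not from the paper):
# partition 0 holds keys < 100, partition i holds keys in
# [BOUNDARIES[i-1], BOUNDARIES[i]), and the last holds keys >= 300.
BOUNDARIES = [100, 200, 300]


def partition_of(key):
    """Return the index of the range partition that owns `key`."""
    return bisect_right(BOUNDARIES, key)


def map_phase(records):
    """Group new (key, value) records by destination partition."""
    groups = defaultdict(list)
    for key, value in records:
        groups[partition_of(key)].append((key, value))
    return groups


def reduce_phase(existing, batch):
    """Merge one partition's sorted contents with its batch of new records."""
    return sorted(existing + batch)


def bulk_insert(store, records, workers=4):
    """Insert `records` into `store` (dict: partition index -> sorted list).

    Each touched partition is merged by its own task, so partitions
    are updated independently of one another.
    """
    groups = map_phase(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pid: pool.submit(reduce_phase, store.get(pid, []), batch)
            for pid, batch in groups.items()
        }
        for pid, future in futures.items():
            store[pid] = future.result()
    return store
```

Because range partitions cover disjoint key intervals, the per-partition merges never touch the same data, which is what makes this kind of insertion straightforward to run in parallel.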

Bibliographic details

Source: ACM/SIGOPS workshop on large-scale distributed systems and middleware, 2010, 5 pages
Authors: Antonio Barbuzzi; Ernst Biersack; Pietro Michiardi; Gennaro Boggia
Original format: PDF
Chinese Library Classification: programming; software engineering
Keywords: design; experimentation

Similar literature

1. Three-level-parallelization support framework for large-scale analytic simulation [J]. Yao Yi-ping, Meng Dong, Zhu Feng. Journal of simulation. 2017, No. 3
2. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations [J]. Teijeiro C., Hammerschmidt T., Drautz R. Computer physics communications. 2016
3. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer [J]. Suplatov Dmitry, Popova Nina, Zhumatiy Sergey. Journal of Bioinformatics and Computational Biology. 2016, No. 2
4. Parallel Bulk Insertion for Large-scale Analytics Applications [C]. Antonio Barbuzzi, Ernst Biersack, Pietro Michiardi. 4th ACM/SIGOPS workshop on large-scale distributed systems and middleware 2010. 2010
5. High frequency thin-film bulk acoustic wave resonators for gas- and bio-analytical applications [D]. Ashley, G. M. 2005
6. Parallel continuous simulated tempering and its applications in large-scale molecular simulations [O]. Tianwu Zang, Linglin Yu, Chong Zhang
7. Parallel Bulk Insertion for Large-scale Analytics Applications [O]. Antonio Barbuzzi, Politecnico Di Bari, Ernst Biersack. 2011
