Conference proceedings: 4th ACM/SIGOPS workshop on large-scale distributed systems and middleware (2010)

Parallel Bulk Insertion for Large-scale Analytics Applications

Abstract

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this work, we focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step for reducing processing times, especially when new data is generated and processed at regular time intervals. We present a parallel approach to bulk data insertion in a system that uses horizontally range-partitioned data, and we evaluate several variants of the insertion operation, including legacy approaches. Our method uses the parallel processing framework itself to insert the data, which the system stores in a semi-structured format. Our results indicate that a parallel approach to bulk insertion can substantially reduce the recurring cost of inserting new data into the system.
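The abstract does not spell out the insertion algorithm, so the following is only a minimal sketch, under our own assumptions, of how a MapReduce-style job could bulk-insert a batch into horizontally range-partitioned data: map tasks route each new record to the partition whose key range contains it, and one reduce task per affected partition sorts its records and merges them into that partition. In-process threads stand in for MapReduce workers, partitions are modeled as sorted Python lists, and the names `split_points`, `route`, `map_chunk`, `reduce_partition`, and `parallel_bulk_insert` are hypothetical, not the authors' API.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def route(key, split_points):
    # Partition i owns keys in [split_points[i-1], split_points[i]);
    # bisect_right returns that index for a sorted list of split points.
    return bisect_right(split_points, key)


def map_chunk(chunk, split_points):
    # Map task: tag every new (key, value) record with its destination partition.
    buckets = defaultdict(list)
    for key, value in chunk:
        buckets[route(key, split_points)].append((key, value))
    return buckets


def reduce_partition(existing, incoming):
    # Reduce task: sort the records routed to one partition and merge them
    # into that partition's already-sorted contents in a single pass.
    return list(merge(existing, sorted(incoming)))


def parallel_bulk_insert(partitions, batch, split_points, workers=4):
    # Hypothetical driver: `partitions` is a list of sorted record lists,
    # one per key range, so len(partitions) == len(split_points) + 1.
    chunks = [batch[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: route the input chunks in parallel.
        mapped = list(pool.map(lambda c: map_chunk(c, split_points), chunks))
        # Shuffle: collect each partition's records from all map tasks.
        shuffled = defaultdict(list)
        for buckets in mapped:
            for pid, records in buckets.items():
                shuffled[pid].extend(records)
        # Reduce phase: one task per partition that actually received data.
        futures = {pid: pool.submit(reduce_partition, partitions[pid], recs)
                   for pid, recs in shuffled.items()}
        # Partitions whose key range received no new records are left untouched.
        return [futures[pid].result() if pid in futures else part
                for pid, part in enumerate(partitions)]


if __name__ == "__main__":
    split_points = ["g", "p"]  # three ranges: < "g", ["g", "p"), >= "p"
    partitions = [[("apple", "1")], [("kiwi", "2")], []]
    batch = [("banana", "3"), ("zebra", "4"), ("mango", "5")]
    print(parallel_bulk_insert(partitions, batch, split_points))
    # [[('apple', '1'), ('banana', '3')], [('kiwi', '2'), ('mango', '5')], [('zebra', '4')]]
```

Merging the sorted incoming records into each partition, rather than re-sorting the whole partition, keeps each rewrite linear in the partition's size, and only the ranges touched by the batch are rewritten. The abstract does not say whether the paper's variants work this way.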