Venue: ACM/SIGOPS workshop on large-scale distributed systems and middleware

Parallel Bulk Insertion for Large-scale Analytics Applications

Abstract

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this work, we focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step in reducing processing times, especially when new data is generated and processed at regular intervals. We present a parallel approach to bulk data insertion in a system that uses horizontally range-partitioned data, and we evaluate several variants of the insertion operation, including legacy approaches. Our method exploits the parallel processing framework itself to insert data, which is stored in a semi-structured format, into the system. Our results indicate that a parallel approach to bulk insertion can substantially reduce the recurrent cost of inserting new data into the system.
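The abstract does not give the insertion algorithm, so the following is only a minimal sketch of the general idea of parallel bulk insertion into horizontally range-partitioned data; it is not the authors' method. It assumes sorted in-memory partitions, hypothetical split points (`BOUNDS`), and a thread pool standing in for the tasks of a MapReduce job. All function names are illustrative.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

# Hypothetical split points (an assumption, not from the paper):
# partition 0 holds keys < 100, partition 1 holds [100, 200),
# partition 2 holds [200, 300), partition 3 holds keys >= 300.
BOUNDS = [100, 200, 300]


def partition_of(key, bounds=BOUNDS):
    """Return the index of the range partition that owns `key`."""
    return bisect_right(bounds, key)


def map_phase(records, bounds=BOUNDS):
    """Route each (key, value) record to the partition that owns its key."""
    buckets = [[] for _ in range(len(bounds) + 1)]
    for key, value in records:
        buckets[partition_of(key, bounds)].append((key, value))
    return buckets


def insert_into_partition(existing, batch):
    """Sort one batch and merge it into an already sorted partition."""
    return list(merge(existing, sorted(batch)))


def bulk_insert(partitions, records, bounds=BOUNDS):
    """Insert `records` into all partitions, one worker task per partition."""
    buckets = map_phase(records, bounds)
    # Threads stand in for the parallel tasks of a real framework.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(insert_into_partition, partitions, buckets))
```

Because each new record belongs to exactly one key range, the per-partition merges touch disjoint data and can run independently, which is what makes the insertion step parallelizable in this sketch.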