Conference: IEEE International Congress on Big Data

A memory capacity model for high performing data-filtering applications in Samza framework


Abstract

Data quality is essential in the big data paradigm, because poor data can have serious consequences when large volumes of data are processed. While spotting poor data is trivial in small-scale, offline use cases, detecting and fixing data inconsistencies is challenging in large-scale, online (real-time or near-real-time) big data contexts. One example of such a scenario is spotting and fixing poor data with Apache Samza, a stream processing framework that has been increasingly adopted at LinkedIn to process near-real-time data. To optimize the deployment of Samza processing and reduce business cost, we propose a memory capacity model for Apache Samza that allows denser deployments of high-performing data-filtering applications built on Samza. The model can be used to provision just enough memory to applications by tightening the bounds on their memory allocations. We apply the model to LinkedIn's real production use cases, where it significantly increases deployment density and saves business costs. We share the key lessons learned in this paper.
