Conference: IEEE International Congress on Big Data

A memory capacity model for high performing data-filtering applications in Samza framework

Abstract

Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale, offline use cases, it is challenging to detect and fix data inconsistencies in large-scale, online (real-time or near-real-time) big data contexts. One such scenario is spotting and fixing poor data using Apache Samza, a stream processing framework that has been increasingly adopted to process near-real-time data at LinkedIn. To optimize the deployment of Samza processing and reduce business cost, we propose a memory capacity model for Apache Samza that allows denser deployments of high-performing data-filtering applications built on Samza. The model can be used to provision just enough memory to applications by tightening the bounds on memory allocations. We apply our memory capacity model to LinkedIn's real production use cases, which significantly increases deployment density and saves business costs. We share the key lessons learned in this paper.