
A memory capacity model for high performing data-filtering applications in Samza framework


Abstract

Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale, offline use cases, it is challenging to detect and fix data inconsistency in a large-scale, online (real-time or near-real-time) big data context. One such scenario is spotting and fixing poor data using Apache Samza, a stream processing framework that has been increasingly adopted to process near-real-time data at LinkedIn. To optimize the deployment of Samza processing and reduce business cost, in this work we propose a memory capacity model for Apache Samza that allows denser deployments of high-performing data-filtering applications built on Samza. The model can be used to provision just enough memory to applications by tightening the bounds on their memory allocations. We apply our memory capacity model to LinkedIn's real production use cases, which significantly increases deployment density and saves business costs. We share key learnings in this paper.


Bibliographic details

Source: IEEE International Congress on Big Data | 2015 | pp. 2600-2605 | 6 pages
Authors: Feng Tao; Zhuang Zhenyun; Pan Yi; Ramachandra Haricharan
Original format: PDF
Keywords: Apache Samza; capacity model; data filtering; performance

