Conference: International Conference on Very Large Data Bases

Providing Streaming Joins as a Service at Facebook

Abstract

Stream processing applications reduce the latency of batch data pipelines and enable engineers to quickly identify production issues. Often, a service logs data to distinct streams even when the entries relate to the same real-world event (e.g., a search on Facebook's search bar). Furthermore, related events can be logged on the server side with different delays, causing one stream to lag significantly behind the other in terms of logged event times for a given log entry. To stitch this information together with low latency, we need to be able to join two different streams, where each stream may have its own characteristics regarding the degree to which its data is out of order. Doing so in a streaming fashion is challenging because a join operator consumes a lot of memory, especially with significant data volumes. This paper describes an end-to-end streaming join service that addresses these challenges through a streaming join operator that uses an adaptive stream synchronization algorithm, which is able to handle the different event-time distributions we observe in real-world streams. This synchronization scheme paces the parsing of new data and reduces the overall operator memory footprint while still providing high accuracy. We have integrated this into a streaming SQL system and have successfully reduced the latency of several batch pipelines using this approach.
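
The abstract does not spell out the synchronization algorithm, so the following is only a minimal Python sketch of the general idea it describes: pacing reads so that neither stream runs far ahead of the other in event time, which keeps the join buffers small. It is not the paper's adaptive algorithm. The names `Event`, `paced_join`, and `window` are hypothetical, and the sketch assumes each input is roughly ordered by event time; events that arrive later than the eviction horizon would miss their matches, which is the memory-versus-accuracy tradeoff the abstract alludes to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Event(NamedTuple):
    event_time: int  # hypothetical unit, e.g. seconds
    key: str         # join key shared by both streams
    payload: str


def paced_join(
    left: Iterable[Event], right: Iterable[Event], window: int
) -> Iterator[Tuple[Event, Event]]:
    """Join events with equal keys whose event times differ by <= window.

    Illustrative only: reads from whichever side is behind in event time,
    and evicts buffered events that can no longer match (assuming the
    inputs are roughly time-ordered).
    """
    iters = [iter(left), iter(right)]
    buffers: List[Dict[str, List[Event]]] = [defaultdict(list), defaultdict(list)]
    latest = [float("-inf"), float("-inf")]  # max event time seen per side
    done = [False, False]

    while not all(done):
        # Pacing: consume from the stream that lags in event time.
        if done[0]:
            side = 1
        elif done[1]:
            side = 0
        else:
            side = 0 if latest[0] <= latest[1] else 1

        event = next(iters[side], None)
        if event is None:
            done[side] = True
            continue
        latest[side] = max(latest[side], event.event_time)

        # Probe the other side's buffer for matches inside the window.
        for match in buffers[1 - side].get(event.key, []):
            if abs(match.event_time - event.event_time) <= window:
                yield (event, match) if side == 0 else (match, event)
        buffers[side][event.key].append(event)

        # Evict events older than the slower stream's progress minus the window.
        # A full scan per event is simple but slow; a real operator would index by time.
        horizon = min(latest) - window
        for buf in buffers:
            for key in list(buf):
                buf[key] = [e for e in buf[key] if e.event_time >= horizon]
                if not buf[key]:
                    del buf[key]
```

Because the reader always advances the lagging side, the buffers hold roughly one window's worth of events per stream rather than the full gap between the two streams. An adaptive scheme such as the one the abstract mentions would presumably tune this pacing from the observed event-time distributions instead of using a fixed `window`.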
