Conference: International conference on very large data bases

Providing Streaming Joins as a Service at Facebook

Abstract

Stream processing applications reduce the latency of batch data pipelines and enable engineers to quickly identify production issues. A service often logs data to distinct streams, even when the entries relate to the same real-world event (e.g., a search on Facebook's search bar). Furthermore, the logging of related events can appear on the server side with different delays, causing one stream to be significantly behind the other in terms of logged event times for a given log entry. To stitch this information together with low latency, we need to join two different streams, each of which may have its own characteristics regarding the degree to which its data is out of order. Doing so in a streaming fashion is challenging because a join operator consumes a lot of memory, especially with significant data volumes. This paper describes an end-to-end streaming join service that addresses these challenges through a streaming join operator with an adaptive stream synchronization algorithm, which handles the different event-time distributions we observe in real-world streams. This synchronization scheme paces the parsing of new data and reduces the overall operator memory footprint while still providing high accuracy. We have integrated this into a streaming SQL system and have successfully reduced the latency of several batch pipelines using this approach.
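
The abstract does not spell out the synchronization algorithm, so the following is only a minimal sketch of the general idea of pacing a two-stream join by event time; it is not the paper's adaptive method. It assumes each stream yields `(event_time, key, payload)` tuples in ascending event-time order, and the function name `paced_window_join` and the `window` parameter are hypothetical. The operator always reads from whichever stream is behind in event time, which keeps the two buffers small, and it evicts buffered events that can no longer match.

```python
from collections import defaultdict


def paced_window_join(left, right, window):
    """Join two streams on key when event times differ by at most `window`.

    Assumption for this sketch: each input is an iterable of
    (event_time, key, payload) tuples sorted by event_time.
    """
    streams = [iter(left), iter(right)]
    # One buffer per side: key -> list of (event_time, payload).
    buffers = [defaultdict(list), defaultdict(list)]
    last_time = [float("-inf"), float("-inf")]
    done = [False, False]
    results = []

    while not all(done):
        # Pacing: read from the stream that is behind in event time.
        if done[0]:
            side = 1
        elif done[1]:
            side = 0
        else:
            side = 0 if last_time[0] <= last_time[1] else 1

        event = next(streams[side], None)
        if event is None:
            done[side] = True
            continue

        t, key, payload = event
        last_time[side] = max(last_time[side], t)
        other = 1 - side

        # Probe the opposite buffer for matches inside the time window.
        for other_t, other_payload in buffers[other].get(key, []):
            if abs(t - other_t) <= window:
                if side == 0:
                    results.append((key, payload, other_payload))
                else:
                    results.append((key, other_payload, payload))

        buffers[side][key].append((t, payload))

        # Evict events older than the slower stream's progress minus the
        # window; with sorted inputs they cannot match any future event.
        horizon = min(last_time) - window
        for buf in buffers:
            for k in list(buf):
                buf[k] = [(bt, p) for bt, p in buf[k] if bt >= horizon]
                if not buf[k]:
                    del buf[k]

    return results


searches = [(1, "q1", "search"), (5, "q2", "search"), (20, "q3", "search")]
clicks = [(2, "q1", "click"), (21, "q3", "click"), (30, "q2", "click")]
print(paced_window_join(searches, clicks, window=5))
```

In this toy run the `q2` events are 25 time units apart, so they fall outside the window and are not joined. Real streams are out of order, which is where a fixed rule like this one loses matches; handling that with bounded memory is the problem the abstract says the adaptive scheme addresses.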
