Conference: International Conference on Very Large Data Bases

Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data

Abstract

Turn's online advertising campaigns produce petabytes of data. This data comprises trillions of events (impressions, clicks, etc.) spanning multiple years. In addition to a timestamp, each event includes hundreds of fields describing the user's attributes, the campaign's attributes, attributes of where the ad was served, and so on. Advertisers need advanced analytics to monitor the performance of their running campaigns and to optimize future ones. This involves slicing and dicing the data across tens of dimensions over arbitrary time ranges. Many of these queries power the web portal's reports and dashboards, so to feel interactive they must return within tens of milliseconds. At Turn's scale of operations, no existing system was able to deliver this performance in a cost-effective manner. Kodiak, a distributed analytical data platform for web-scale high-dimensional data, was built to serve this need. It relies on pre-computation to materialize thousands of views that serve these advanced queries. The views are partitioned and replicated across Kodiak's storage nodes for scalability and reliability, and the system maintains them as new events arrive. At query time, the system automatically selects the most suitable view to serve each query. Kodiak has been used in production for over a year. It hosts 2490 views over more than three petabytes of raw data and serves over 200K queries daily. Its median and 99th-percentile query latencies are 8 ms and 252 ms, respectively. Our experiments show that its query latency is three orders of magnitude lower than that of leading big data platforms in head-to-head comparisons using Turn's query workload. Moreover, Kodiak uses four orders of magnitude fewer resources to run the same workload.
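The abstract does not say how Kodiak decides which view is "most suitable". As a minimal sketch of the general idea only, the Python below assumes that a view can answer a query when it contains every dimension the query needs, and that the cheapest such view is the one with the fewest rows. The `View` class, the `select_view` function, the view names, and the row counts are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class View:
    """Hypothetical materialized view: rows pre-aggregated by a set of dimensions."""
    name: str
    dimensions: FrozenSet[str]
    row_count: int  # assumed proxy for how expensive the view is to scan


def select_view(views: Iterable[View], query_dims: Iterable[str]) -> Optional[View]:
    """Return the smallest view that contains every dimension the query needs.

    Returns None when no view covers the query (assumed fallback: raw data).
    """
    needed = frozenset(query_dims)
    candidates = [v for v in views if needed <= v.dimensions]
    return min(candidates, key=lambda v: v.row_count, default=None)


# Illustrative catalog; names and sizes are made up for this example.
views = [
    View("by_campaign_day", frozenset({"campaign", "day"}), 1_000),
    View("by_campaign_country_day", frozenset({"campaign", "country", "day"}), 50_000),
    View(
        "by_campaign_country_device_day",
        frozenset({"campaign", "country", "device", "day"}),
        900_000,
    ),
]

best = select_view(views, ["campaign", "country"])
print(best.name if best else "no covering view")
```

Under these assumptions, a query on `campaign` and `country` skips the smallest view, which lacks `country`, and picks the smaller of the two views that do cover it. A production selector would likely also weigh time-range coverage, partitioning, and view freshness, none of which are modeled here.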