Journal: IEEE Transactions on Parallel and Distributed Systems

Continuously Distinct Sampling over Centralized and Distributed High Speed Data Streams


Abstract

Distinct sampling is fundamental for computing statistics that depend on the set of distinct keys (e.g., user IDs) in a large, high-speed data stream such as a sequence of key-update pairs; one example is the age and gender distribution of the distinct users accessing a particular website. The major shortcoming of existing methods is their high computational cost, which is incurred by determining whether each incoming key in the data stream is currently in the set of sampled keys and by keeping track of the sampled keys' update aggregations. To address this challenge, we develop a new method, random projection and eviction (RPE), that uses a list of buckets to continuously sample distinct keys and their update aggregations. RPE processes each key-update pair with small and nearly constant time complexity, O(1). Besides centralized data streams, we also develop a novel method, DRPE, to deal with distributed data streams consisting of key-update pairs observed at multiple distributed sites. We conduct extensive experiments on real-world datasets, and the results demonstrate that RPE and DRPE reduce the memory, computational, and message costs of state-of-the-art methods by several times.
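
The abstract does not describe how RPE's buckets work, so the following is not RPE. As a point of reference only, here is a minimal Python sketch of a classic level-based (hash-threshold) distinct sampler, which shows the per-pair eligibility check and the per-key aggregation bookkeeping that the abstract identifies as the costly steps. The names `DistinctSampler`, `capacity`, `update`, and `estimate_distinct` are hypothetical, and summing is assumed as the update aggregation.

```python
import hashlib


class DistinctSampler:
    """Generic level-based distinct sampler (an illustrative baseline, not RPE).

    A key is kept only if the low `level` bits of its hash are zero, so each
    distinct key is sampled with probability 2**-level regardless of how
    often it appears in the stream.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.level = 0
        self.sample = {}  # sampled key -> aggregated (summed) updates

    @staticmethod
    def _hash64(key):
        # Deterministic 64-bit hash of the key's string form.
        digest = hashlib.blake2b(str(key).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _eligible(self, key):
        mask = (1 << self.level) - 1
        return self._hash64(key) & mask == 0

    def update(self, key, value):
        """Process one key-update pair from the stream."""
        if not self._eligible(key):
            return
        # Membership check plus aggregation for an eligible key.
        self.sample[key] = self.sample.get(key, 0) + value
        # Over capacity: raise the level and evict keys that no longer qualify.
        while len(self.sample) > self.capacity:
            self.level += 1
            self.sample = {k: v for k, v in self.sample.items()
                           if self._eligible(k)}

    def estimate_distinct(self):
        """Estimate the number of distinct keys seen so far."""
        return len(self.sample) * (1 << self.level)


if __name__ == "__main__":
    sampler = DistinctSampler(capacity=32)
    for _ in range(3):                 # every user appears three times
        for i in range(1000):
            sampler.update(f"user{i}", 1)
    print(len(sampler.sample), sampler.level, sampler.estimate_distinct())
```

Because a key that is eligible at the current level was also eligible at every lower level, each surviving key has been in the sample since its first arrival, so its aggregated value is exact. The eviction step rescans the whole sample, which is the kind of overhead a near-constant-time method would aim to avoid.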

Bibliographic record

  • Author affiliations

    Xi An Jiao Tong Univ, MOE Key Lab Intelligent Networks & Network Secur, POB 1088,28 Xianning West Rd, Xian 710049, Shaanxi, Peoples R China;

    Xi An Jiao Tong Univ, MOE Key Lab Intelligent Networks & Network Secur, POB 1088,28 Xianning West Rd, Xian 710049, Shaanxi, Peoples R China;

    Xi An Jiao Tong Univ, MOE Key Lab Intelligent Networks & Network Secur, POB 1088,28 Xianning West Rd, Xian 710049, Shaanxi, Peoples R China;

    Xi An Jiao Tong Univ, MOE Key Lab Intelligent Networks & Network Secur, POB 1088,28 Xianning West Rd, Xian 710049, Shaanxi, Peoples R China;

    Xi An Jiao Tong Univ, MOE Key Lab Intelligent Networks & Network Secur, POB 1088,28 Xianning West Rd, Xian 710049, Shaanxi, Peoples R China|Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    Data stream; distinct sampling; sketch;
