Boosting distinct random sampling for basic counting on the union of distributed streams

Xu Bojian

Abstract

We revisit the classic basic counting problem in the distributed streaming model. For the problem of maintaining an (ε, δ)-estimate, we make the following new contributions: (1) For a bit stream of size n, where each bit is 1 with probability at least γ, we exponentially reduce the average total processing time from the best prior work's Θ(n log(1/δ)) to O((1/(γ ε^2)) (log^2 n) log(1/δ)), thus providing the first sublinear-time streaming algorithm for this problem. (2) In addition to an overall much faster processing speed, our method provides a new tradeoff: a lower accuracy demand (a larger value of ε) yields a faster processing speed, whereas the best prior work's processing time is Θ(n log(1/δ)) in every case and for any ε. (3) The worst-case total time cost of our method matches the best prior work's Θ(n log(1/δ)), which is necessary but rarely occurs in our method. (4) The space usage overhead of our method is a lower-order term compared with the best prior work's space usage, occurs only O(log n) times during the stream processing, and is too small to be detected by the OS in practice. We further validate these theoretical results with experiments on both real-world and synthetic data, showing that our method is faster than the best prior work by a factor of several to several hundred, depending on the stream size and accuracy demands, without any detectable space usage overhead. Our method is based on a faster sampling technique that we design to boost the sampling procedure of the best prior work, and we believe this technique can be of independent interest. © 2015 Elsevier B.V. All rights reserved.
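The abstract does not spell out the boosted sampling procedure, so the following is only a generic illustration of why sampling need not touch every stream element; it is not the paper's algorithm. It is a minimal Python sketch, assuming plain Bernoulli sampling with probability p over a single in-memory bit list (the paper instead treats coordinated distinct sampling over the union of distributed streams). The names `skip_sample` and `estimate_ones` are hypothetical. Rather than flipping a coin per position, the sketch jumps ahead by geometrically distributed gaps, so the expected work is proportional to n·p instead of n.

```python
import math
import random


def skip_sample(n, p, rng):
    """Return sorted indices in [0, n), each kept independently with probability p.

    Hypothetical helper: jumps between kept indices using geometric gaps,
    so the expected number of loop iterations is about n * p, not n.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return list(range(n))
    log_q = math.log1p(-p)  # log(1 - p), accurate for small p
    picked = []
    i = -1
    while True:
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        # floor(log(u) / log(1 - p)) is Geometric(p) on {0, 1, 2, ...}
        i += 1 + int(math.log(u) / log_q)
        if i >= n:
            return picked
        picked.append(i)


def estimate_ones(bits, p, rng):
    """Estimate the number of 1-bits by reading only the sampled positions."""
    hits = sum(bits[i] for i in skip_sample(len(bits), p, rng))
    return hits / p  # scale up by the inverse sampling probability


if __name__ == "__main__":
    rng = random.Random(2015)
    stream = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
    print("true count:", sum(stream))
    print("estimate  :", estimate_ones(stream, 0.05, rng))
```

A smaller p means fewer positions are read and a noisier estimate, which loosely mirrors the speed-versus-accuracy tradeoff described in contribution (2), though the paper's own bounds come from its specific technique rather than from this sketch.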

Bibliographic details

Source: Theoretical Computer Science, 2015, 20 pages
Author: Xu Bojian
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Basic counting; Data stream; Distributed streams; Coordinated adaptive sampling; Distinct sampling; Direct sampling
